DRIVERS OF CHANGE: Spark Installation to Mac

Saturday, January 30, 2016

Spark Installation to Mac

Last week, during my complex networks winter school, I installed Spark on my Mac. It is a very easy process, but I ran into an error that I want to share.

First, to introduce Spark: it is an engine for large-scale data processing, such as big data workloads. You can code in R, SQL, Python, Scala or Java. Our reason for installing Spark was to try GraphX. GraphX is an embedded graph processing framework built on top of Apache Spark.
The main property of GraphX is that data tables and graphs can be used interchangeably. In other words, at any point, a table can be treated as the vertices of a graph.
There are two collections: the vertex collection and the edge collection. Map and reduce operations are carried out in stages. The drawback is memory usage. It keeps a history of previous computations so that results that were already computed do not have to be recomputed.

OK. So, for the installation:
Download the Spark source.
Ensure JAVA_HOME is set.

To set it, export the JAVA_HOME variable in your shell before building.
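As a minimal sketch of this step, the snippet below keeps an existing JAVA_HOME and otherwise asks macOS for the default JDK path. The function name ensure_java_home is a hypothetical helper, not part of Spark, and the fallback assumes the standard macOS tool /usr/libexec/java_home is present.

```shell
# Hypothetical helper (not part of Spark): make sure JAVA_HOME points at a JDK.
ensure_java_home() {
  # Keep an existing value untouched.
  if [ -n "${JAVA_HOME:-}" ]; then
    return 0
  fi
  # /usr/libexec/java_home is the macOS tool that prints the default JDK path.
  if [ -x /usr/libexec/java_home ]; then
    JAVA_HOME="$(/usr/libexec/java_home 2>/dev/null)" || return 1
    export JAVA_HOME
    return 0
  fi
  return 1
}

if ensure_java_home; then
  echo "JAVA_HOME is $JAVA_HOME"
else
  echo "JAVA_HOME is not set and no JDK was found" >&2
fi
```

To make the setting permanent, the export line would normally go into your shell profile instead of being typed each time.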

Then build it by executing make-distribution.sh.
During the build, I ran into this error.

There are some proposed solutions on Stack Overflow and elsewhere.

http://stackoverflow.com/questions/35022961/how-do-i-use-mvn-dependencyget-to-use-https-repo1-maven-org

https://coderwall.com/p/zr6bga/stopping-maven-from-trying-to-access-its-central-repository

http://maven.apache.org/ref/3.0.4/maven-model-builder/super-pom.html

Each suggests modifying pom.xml to change the repository URL from repo.maven to repo1.maven. But the repository I was referencing was already repo1.maven.

So, I changed the repo1 URL to https://repo.maven.apache.org/maven2.

Then my professor suggested that I build it over a reliable internet connection, and voilà! It worked. Since the build takes quite a long time, keep in mind that an unstable connection can cause problems.
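If a reliable connection is not available, one possible workaround is to rerun the build a few times, since Maven keeps the dependencies it has already downloaded in its local repository and only fetches what is still missing. The retry function below is a hypothetical helper written for this post, not something shipped with Spark or Maven.

```shell
# Hypothetical helper: run a command up to $1 times until it succeeds.
retry() {
  attempts="$1"
  shift
  n=1
  while [ "$n" -le "$attempts" ]; do
    # "$@" is the command to run, with its arguments preserved.
    if "$@"; then
      return 0
    fi
    echo "attempt $n of $attempts failed" >&2
    n=$((n + 1))
  done
  return 1
}

# Usage (run from the Spark source directory):
#   retry 3 ./make-distribution.sh
```

This only helps with temporary download failures; a real build error will simply fail on every attempt.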

After getting the build success message,

ensure that localhost is listed in the conf/slaves file.
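A small sketch of that check is below. It appends localhost only when no line of the file is exactly localhost. The helper name ensure_localhost_worker is hypothetical, and the sketch assumes the file, if it already exists, ends with a newline.

```shell
# Hypothetical helper: make sure a slaves file lists localhost as a worker host.
ensure_localhost_worker() {
  slaves_file="$1"
  # -x matches whole lines only, -q suppresses output.
  # A missing file counts as "not found", so it is created by the append.
  if ! grep -qx 'localhost' "$slaves_file" 2>/dev/null; then
    echo 'localhost' >> "$slaves_file"
  fi
}

# Usage (run from the Spark directory):
#   ensure_localhost_worker conf/slaves
```

Running it twice is harmless, because the second call finds the line and does nothing.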

Lastly, start your cluster by executing ./start-all.sh, which is located in the sbin folder of the Spark directory.

The system will start the master and worker.

The web interface should then be available at http://localhost:8080
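The master can take a few seconds to come up, so a quick way to confirm it from the terminal is to poll that address with curl. The wait_for_ui function is a hypothetical helper for illustration, and the sketch assumes curl is installed.

```shell
# Hypothetical helper: poll a URL until it responds or the attempts run out.
wait_for_ui() {
  ui_url="$1"
  ui_tries="$2"
  i=1
  while [ "$i" -le "$ui_tries" ]; do
    # -s hides progress output, -f makes HTTP errors return a non-zero status,
    # --max-time stops a single request from hanging forever.
    if curl -sf --max-time 5 -o /dev/null "$ui_url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage, after running start-all.sh:
#   wait_for_ui http://localhost:8080 10 && echo "master UI is up"
```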