# Installation instructions

1. Download Spark, prebuilt for Hadoop 2.6.

2. Try to run an interactive shell.

   Scala API:

   ```
   $YOUR_SPARK_PATH/bin/spark-shell
   ```

   Python API:

   ```
   $YOUR_SPARK_PATH/bin/pyspark
   ```

   You may need to install Java or Python first.

3. Once it has loaded, you will see the Scala console:

   ```
   scala>
   ```

   Type `sc` and press Enter; you should see:

   ```
   scala> sc
   res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@1f60824e
   ```
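
As a quick sanity check that the context actually works, you can run a small computation. The sketch below is for the pyspark shell (started with `bin/pyspark` above), where the context is likewise available as `sc`; the numbers are only an illustration.

```python
# In the pyspark shell, `sc` is already defined.
# Distribute the numbers 1..100 and sum them on the executors.
data = sc.parallelize(range(1, 101))
print(data.sum())  # expected output: 5050
```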

# Start Spark locally

1. Start Spark locally.

   1.1. Start the master:

   ```
   $YOUR_SPARK_PATH/sbin/start-master.sh
   ```

   1.2. After a couple of minutes, http://localhost:8080 will be available. This is the Spark web UI.

   1.3. Start workers (as many as you like, e.g. 2 or 3):

   ```
   $YOUR_SPARK_PATH/bin/spark-class org.apache.spark.deploy.worker.Worker $SPARK_MASTER_URL &
   ```

   where `$SPARK_MASTER_URL` is shown in the [web UI](http://localhost:8080) (typically of the form `spark://<hostname>:7077`).

2. Submit a test job:

   ```
   $YOUR_SPARK_PATH/bin/spark-submit --master $SPARK_MASTER_URL job.py $SPARK_MASTER_URL
   ```
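
`job.py` itself is not included in this section; below is a minimal sketch of what such a test job might look like, assuming it takes the master URL as its first command-line argument (matching the `spark-submit` line above). The app name and the sum-of-squares computation are just placeholders.

```python
import sys

from pyspark import SparkConf, SparkContext

if __name__ == "__main__":
    # The master URL is passed as the first argument by the spark-submit
    # line above, e.g. spark://<hostname>:7077 as shown in the web UI.
    master_url = sys.argv[1]

    conf = SparkConf().setAppName("test-job").setMaster(master_url)
    sc = SparkContext(conf=conf)

    # A trivial distributed computation: sum of squares of 1..1000.
    result = sc.parallelize(range(1, 1001)).map(lambda x: x * x).sum()
    print("sum of squares of 1..1000 =", result)

    sc.stop()
```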

# Exercises

#1 Learn about RDD

Let's go through the official tutorial
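
As a taste of what the tutorial covers, here is a small sketch of the core RDD operations in Python: transformations such as `flatMap` and `filter` are lazy, while actions such as `collect` and `count` trigger the actual computation. The sample lines are made up for illustration.

```python
# Runs in the pyspark shell, where `sc` is already defined.
lines = sc.parallelize([
    "spark makes distributed computation simple",
    "an rdd is a resilient distributed dataset",
])

# Transformations are lazy: nothing is computed yet.
words = lines.flatMap(lambda line: line.split())
long_words = words.filter(lambda w: len(w) > 4)

# Actions trigger execution on the workers.
print(long_words.collect())  # e.g. ['spark', 'makes', 'distributed', ...]
print(words.count())
```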

#2 More challenges

Here

#3 Set up a cluster on EC2

Here

# References

- Designing Data-Intensive Applications (Martin Kleppmann)
- Slides