data-workshop
Installation instructions

  1. Download Spark, prebuilt for Hadoop 2.6.

  2. Try to run one of the shells:

Scala API

$YOUR_SPARK_PATH/bin/spark-shell

Python API

$YOUR_SPARK_PATH/bin/pyspark

You may need to install Java first (and Python, if you want the Python API).

  3. Once the shell has loaded you will see the Scala console:
scala>

Type 'sc' and press Enter; you should see:

scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@1f60824e
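
If you launched pyspark instead, the same sanity check works there. A tiny computation (a minimal sketch, assuming the default shell setup) confirms the context is actually usable:

>>> sc.parallelize([1, 2, 3, 4]).sum()
10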

Start Spark locally

  1. Start the master:

$YOUR_SPARK_PATH/sbin/start-master.sh

  2. After a couple of minutes, http://localhost:8080 will be available. It is the Spark web UI.

  3. Start workers (as many as you like, e.g. 2 or 3):

$YOUR_SPARK_PATH/bin/spark-class org.apache.spark.deploy.worker.Worker $SPARK_MASTER_URL &

where $SPARK_MASTER_URL is shown in the [web ui](http://localhost:8080), typically in the form spark://<hostname>:7077.

  4. Submit a test job:

$YOUR_SPARK_PATH/bin/spark-submit --master $SPARK_MASTER_URL job.py $SPARK_MASTER_URL
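
The actual job.py in this repository may differ; as a sketch, a minimal test job in this style takes the master URL as its first command-line argument and runs a trivial distributed computation:

# job.py - a minimal sketch; the real job.py in this repository may differ
import sys
from pyspark import SparkContext

if __name__ == "__main__":
    master_url = sys.argv[1]  # master URL passed on the command line
    sc = SparkContext(master_url, "TestJob")
    # trivial distributed computation: count the even numbers in 0..999
    evens = sc.parallelize(range(1000)).filter(lambda x: x % 2 == 0).count()
    print("Even numbers: %d" % evens)
    sc.stop()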

Exercises

#1 Learn about RDD

Let's go through the official Spark tutorial
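
In the spirit of that tutorial, this is the kind of RDD pipeline it walks through (a sketch for the pyspark shell; README.md is just a stand-in for any local text file):

lines = sc.textFile("README.md")                 # RDD of lines
words = lines.flatMap(lambda l: l.split())       # RDD of words
counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
print(counts.take(5))                            # a few (word, count) pairs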

#2 More challenges

Here

#3 Set up a cluster on EC2

Here

References

Designing Data-Intensive Applications

slides
