PyData Meetup 11/11/2014
Slides and Python notebook of the talk on PySpark.
To get started, download Spark, set the environment variables below, and start the notebook.
export SPARK_HOME="<path_to_spark>";
export PYTHONPATH="$SPARK_HOME/python/:$PYTHONPATH";
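If the notebook is started from a shell where these variables were not exported, roughly the same setup can be done from inside Python before importing pyspark. The following is a minimal sketch, not part of the talk material: the helper names `spark_python_paths` and `add_spark_to_path` are hypothetical, and it assumes the Spark distribution keeps its Python sources under `python/` and bundles Py4J as a zip under `python/lib/`.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the paths PySpark is assumed to need on sys.path."""
    python_dir = os.path.join(spark_home, "python")
    # Assumption: Py4J is bundled as python/lib/py4j-*-src.zip.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_path(spark_home=None):
    """Prepend the Spark Python paths to sys.path; return the Spark home used."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set")
    for path in spark_python_paths(spark_home):
        if path not in sys.path:
            sys.path.insert(0, path)
    return spark_home
```

Calling `add_spark_to_path()` in the first notebook cell, followed by `import pyspark`, should then behave like the exports above, provided `SPARK_HOME` is set or a path is passed explicitly.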
The data is a small subsample of the original "Amazon movie reviews" dataset: http://snap.stanford.edu/data/web-Movies.html
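As a rough illustration of how such reviews could be read, the sketch below parses records made of `key: value` lines separated by blank lines, which is the layout the original SNAP dataset is understood to use (keys such as `product/productId` and `review/score`). Whether the subsample here keeps that layout is an assumption, the function names `parse_review` and `parse_reviews` are hypothetical, and the sample values are made-up placeholders rather than real reviews.

```python
def parse_review(record):
    """Parse one block of 'key: value' lines into a dict of strings."""
    review = {}
    for line in record.strip().splitlines():
        # Split on the first ': ' only, so colons inside the value survive.
        key, sep, value = line.partition(": ")
        if sep:
            review[key] = value
    return review


def parse_reviews(text):
    """Split text into blank-line-separated records and parse each one."""
    return [parse_review(block) for block in text.split("\n\n") if block.strip()]


# Placeholder records for illustration only (not taken from the dataset).
sample = (
    "product/productId: B000000000\n"
    "review/score: 4.0\n"
    "review/summary: Example summary\n"
    "\n"
    "product/productId: B000000001\n"
    "review/score: 2.0\n"
    "review/summary: Another example: with a colon\n"
)

reviews = parse_reviews(sample)
scores = [float(r["review/score"]) for r in reviews]
```

In a PySpark job, `parse_review` could be mapped over an RDD whose elements are whole records; how the records are split across elements depends on how the file is loaded.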