PyData Berlin Meetup

PyData Meetup 11/11/2014

Slides and Python notebook for the talk on PySpark.

To get started, download Spark, set the environment variables, and start the notebook.

export SPARK_HOME="<path_to_spark>"
export PYTHONPATH="$SPARK_HOME/python/:$PYTHONPATH"
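
The same setup can also be done from inside Python before the notebook imports PySpark. This is a minimal sketch and not part of the original talk material: the helper names `spark_python_path` and `add_spark_to_sys_path` are hypothetical, and the code only mirrors the two `export` lines by resolving `$SPARK_HOME/python` and putting it on `sys.path`. Depending on the Spark release, the py4j archive shipped under `$SPARK_HOME/python/lib` may also need to be on the path.

```python
import os
import sys


def spark_python_path(spark_home):
    """Return the directory that the PYTHONPATH export points at."""
    return os.path.join(spark_home, "python")


def add_spark_to_sys_path(environ=None):
    """Prepend $SPARK_HOME/python to sys.path.

    `environ` defaults to os.environ; passing a dict makes the helper
    easy to try out without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    spark_home = environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set; export it first")
    path = spark_python_path(spark_home)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path
```

Calling `add_spark_to_sys_path()` in the first notebook cell should then make `import pyspark` resolvable, provided `SPARK_HOME` points at a real Spark installation.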

The data is a small sub-sample of the original "Amazon movie reviews" dataset: http://snap.stanford.edu/data/web-Movies.html
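
For orientation, here is a rough sketch of how records in the original dataset's text layout could be read in plain Python. It assumes the layout described on the SNAP page, where each review is a run of `key: value` lines (for example `product/productId` and `review/score`) and reviews are separated by blank lines. The sub-sample in this repository may be stored differently, and `parse_reviews` is a hypothetical helper, not something taken from the notebook.

```python
def parse_reviews(lines):
    """Yield one dict per review from 'key: value' lines.

    Assumes blank lines separate reviews; lines without a colon are skipped.
    """
    record = {}
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current review, if there is one.
            if record:
                yield record
                record = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    # Emit the last review when the input does not end with a blank line.
    if record:
        yield record


# Made-up placeholder values, only to show the shape of the input.
sample = [
    "product/productId: P1",
    "review/score: 5.0",
    "",
    "product/productId: P2",
    "review/score: 3.0",
]
reviews = list(parse_reviews(sample))
```

Because the function is a generator over lines, it can be fed an open file object directly without loading the whole file into memory.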