DataScienceCourse

This notebook shows how to implement k-means clustering in Spark.

This example requires a Spark installation at the location $SPARK_HOME, and it uses sklearn to generate a random dataset. Start PySpark with the notebook as follows:

PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook $SPARK_HOME/bin/pyspark
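
Below is a minimal sketch of the workflow the notebook describes: generate a random dataset with sklearn, load it into a Spark DataFrame, and fit k-means with Spark MLlib. The column name, cluster count, and blob parameters are assumptions for illustration, not values taken from the notebook itself.

```python
# Sketch: k-means clustering in Spark on a random sklearn dataset.
# Parameters (k=3, 1000 samples, 2 features) are assumed for illustration.
from pyspark.sql import SparkSession
from pyspark.ml.clustering import KMeans
from pyspark.ml.linalg import Vectors
from sklearn.datasets import make_blobs

spark = SparkSession.builder.appName("kmeans-example").getOrCreate()

# Generate a random 2-D dataset with three Gaussian blobs.
X, _ = make_blobs(n_samples=1000, centers=3, n_features=2, random_state=42)

# Convert the NumPy array into a Spark DataFrame with a single vector column.
df = spark.createDataFrame(
    [(Vectors.dense(row),) for row in X.tolist()],
    ["features"],
)

# Fit k-means with k=3 and print the learned cluster centres.
kmeans = KMeans(k=3, seed=1, featuresCol="features")
model = kmeans.fit(df)
for centre in model.clusterCenters():
    print(centre)
```

When run inside the PySpark-driven notebook started with the command above, the `spark` session already exists and the `SparkSession.builder.getOrCreate()` call simply returns it.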
