This notebook shows how to implement k-means clustering in Spark.
This example requires a Spark installation at the location $SPARK_HOME and uses scikit-learn (sklearn) to create a random dataset. Start PySpark with the notebook as follows:
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook $SPARK_HOME/bin/pyspark