K-Means-Clustering

A library for K Means Clustering

Clustering

K means clustering is an unsupervised machine learning algorithm. In unsupervised learning, there is no ground truth or labels in the dataset. The clusters or classes are inferred from similar features or structure in the data.

K Means Algorithm

K means is an iterative algorithm to find k optimal centroids to split the data into clusters. Ideally, each cluster contains high correlation among the data points.

The algorithm works by a 2 step iterative process obtained from the Expectation-Maximization (EM) algorithm.

Assign data points to the closest centroid (least euclidean distance)
Recompute the mean of the centroid by averaging over the data points that belong to that centroid.

Initializing the centroids

There are multiple ways of initializing the centroids. In the code, 2 methods are implemented.

Initialize the centroids to k randomly selected data points
Mean-Splitting

The first method assigns k centroids randomly from the dataset. This method disregards the spatial structure of the data while initializing the centroids. The algorithm may get stuck in a local optimum and may not converge to a global optimum. Therefore, different initializations may lead to better results.

The second method accounts for the spatial structure of the data while initializing the centroids. In this method, k means is performed iteratively as we increase k from 1 to k.

For k=1, compute the mean of all the points
For k=2, perturb the mean randomly

$https://latex.codecogs.com/svg.image?\mu_1^2 = \mu_1$

$\mu_2^2 = \mu_1 \epsilon$

Name		Name	Last commit message	Last commit date
Latest commit History 55 Commits
Media		Media
README.md		README.md
example.py		example.py
kmeans.py		kmeans.py
plot.py		plot.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

K-Means-Clustering

Clustering

K Means Algorithm

Initializing the centroids

Gifs

About

Releases

Packages

Languages

haarisr/K-Means-Clustering

Folders and files

Latest commit

History

Repository files navigation

K-Means-Clustering

Clustering

K Means Algorithm

Initializing the centroids

Gifs

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages