A full data analytics project that reviews big data analytic methods and clusters household income using the K-means algorithm.
Preprocessing:
- [1] Analyzing the data.
- [2] Detecting duplicate rows and removing them.
- [3] Detecting outlier values and removing them.
- [4] Visualizing the data with scatter and box plots.
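
The preprocessing steps can be sketched in base R as follows. This is a minimal illustration, not the project's actual script: the `households` data frame, its `income` and `spending` columns, and the 1.5 * IQR outlier rule are all assumptions made for the example.

```r
# Hypothetical data: column names and values are invented for illustration.
set.seed(42)
n <- 200
households <- data.frame(
  income   = round(rlnorm(n, meanlog = 10.8, sdlog = 0.4)),
  spending = round(rlnorm(n, meanlog = 10.2, sdlog = 0.4))
)
# Inject five duplicate rows and one extreme value so each step has work to do.
households <- rbind(
  households,
  households[1:5, ],
  data.frame(income = 5e6, spending = 4e4)
)

# [1] Analyze the data: structure and summary statistics.
str(households)
summary(households)

# [2] Detect duplicate rows, then remove them.
n_duplicates <- sum(duplicated(households))
households <- households[!duplicated(households), ]

# [3] Detect outliers with the 1.5 * IQR rule (an assumed rule), then remove them.
is_outlier <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75)))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}
outlier_rows <- is_outlier(households$income) | is_outlier(households$spending)
cleaned <- households[!outlier_rows, ]

# [4] Visualize with a box plot and a scatter plot.
boxplot(cleaned$income, main = "Household income", ylab = "Income")
plot(cleaned$income, cleaned$spending, xlab = "Income", ylab = "Spending")
```

The 1.5 * IQR fence is only one common choice; a different threshold or a domain-specific cutoff may suit real income data better.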
Advanced Analytics Methods:
- [1] Applying the K-means algorithm to the data.
- [2] Determining the best value of K using the elbow plot.
- [3] Scaling the data by Log10 to improve the clustering result.
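
A minimal sketch of these three steps using only base R's `kmeans` is shown below. The synthetic `households` data, the range of K values tried, and the choice of K = 3 are assumptions for illustration; with real data, K should be read from the elbow plot.

```r
# Hypothetical data: three invented income groups, for illustration only.
set.seed(123)
households <- data.frame(
  income   = c(rlnorm(100, 10.2, 0.25), rlnorm(100, 11.0, 0.25), rlnorm(100, 11.8, 0.25)),
  spending = c(rlnorm(100,  9.8, 0.25), rlnorm(100, 10.5, 0.25), rlnorm(100, 11.2, 0.25))
)

# Scale by log10 so that large incomes do not dominate the distance metric.
scaled <- log10(households)

# Elbow plot: total within-cluster sum of squares for K = 1..10.
k_values <- 1:10
wss <- sapply(k_values, function(k) {
  kmeans(scaled, centers = k, nstart = 25)$tot.withinss
})
plot(k_values, wss, type = "b",
     xlab = "Number of clusters K",
     ylab = "Total within-cluster sum of squares")

# Fit K-means with the chosen K (assumed to be 3 here).
chosen_k <- 3
fit <- kmeans(scaled, centers = chosen_k, nstart = 25)
households$cluster <- factor(fit$cluster)

# Cluster sizes, and cluster centers converted back to the original scale.
table(households$cluster)
10^fit$centers
```

The "elbow" is the value of K after which adding clusters reduces the within-cluster sum of squares only marginally. The `factoextra` package also provides `fviz_nbclust(scaled, kmeans, method = "wss")`, which draws a similar plot.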
- [1] Install the R programming language.
- [2] Install RStudio.
- [3] Install the needed packages:
install.packages("dplyr")
install.packages("data.table")
install.packages("Hmisc")
install.packages("ggpubr")
install.packages("factoextra")
install.packages("ClusterR")
install.packages("cluster")
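
As an optional alternative, the sketch below installs only the packages that are not already present. The helper name `install_missing` is hypothetical and not part of the project.

```r
# Packages taken from the list above.
needed <- c("dplyr", "data.table", "Hmisc", "ggpubr",
            "factoextra", "ClusterR", "cluster")

# Hypothetical helper: install only the packages that cannot be loaded.
install_missing <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!is_installed]
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)  # names of the packages that were requested
}

# Uncomment to run; this needs network access and a configured CRAN mirror.
# install_missing(needed)
```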