KNN is a supervised algorithm that assigns a label to a query point based on the labels of its k nearest training examples. The intuition is that a point most likely belongs to the class of the training samples most similar to it. In this project, we implemented the KNN algorithm from scratch; the main goal was to show how KNN classifies data.
- Dataset
The dataset contains 400 points, each described by x and y coordinates and a class label; in the plots, each class is drawn in a different color.
Our data is shown in the diagram below:
- Load & split train and test data
With Pandas, we read the CSV file and load the dataset as a DataFrame. Training and evaluating a KNN model requires separate train and test sets: the model predicts labels for the test set, and comparing those predictions against the true labels lets us evaluate our code. For the split we used Scikit-learn's train_test_split function.
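A minimal sketch of this step follows; since the project's CSV is not included here, the code first writes a synthetic stand-in file, and the file name `data.csv` and column names `x`, `y`, `label` are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project's CSV: 400 labeled 2-D points.
rng = np.random.default_rng(0)
pts = rng.normal(size=(400, 2)) + np.repeat([[0, 0], [4, 4]], 200, axis=0)
pd.DataFrame({"x": pts[:, 0], "y": pts[:, 1],
              "label": np.repeat([0, 1], 200)}).to_csv("data.csv", index=False)

# Load the dataset as a DataFrame, as in the project.
df = pd.read_csv("data.csv")
X = df[["x", "y"]].to_numpy()
y = df["label"].to_numpy()

# Hold out 20% of the points for evaluating the classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```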
- distance_calculator()
In machine learning, each sample is represented as a vector, and KNN searches for the training vectors closest to a query point. The most straightforward way to measure that closeness is the Euclidean distance.

- K_nearest_neighbour_classifier()
Given the distances to every training point, the k nearest neighbors are simply the k training points with the smallest distances. This function finds the most frequent label among those neighbors and assigns it to the test point.

- accuracy_calculator()
Because KNN is supervised, we know the true test labels: accuracy is the number of correctly classified test points divided by the total number of test points.

- data_plot()
The plotter shows the train and test data with their class labels in a 2D coordinate diagram. It accepts either the true (supervised) labels or the predicted labels as input.
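The helper functions above might look like the following sketch; the names follow the text, but the exact implementation details are assumptions:

```python
import numpy as np
from collections import Counter

def distance_calculator(a, b):
    """Euclidean distance between two points."""
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def K_nearest_neighbour_classifier(X_train, y_train, point, k):
    """Label a point by majority vote among its k nearest training points."""
    dists = [distance_calculator(x, point) for x in X_train]
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)     # label frequencies
    return votes.most_common(1)[0][0]

def accuracy_calculator(y_true, y_pred):
    """Fraction of correctly classified points."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Note that in this sketch voting ties are broken by `Counter`'s insertion order; a more careful implementation might break ties by total distance.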
KNN's hyperparameter k is commonly chosen with an elbow chart: for each k from 3 to 20, you compute the loss (here, the error rate on the test set) and then pick the value that works best for your scenario.
For our dataset, K=13 has the least loss.
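The elbow search over k can be sketched as follows; synthetic data stands in for the project's dataset, so the best k found here need not match the K=13 reported above:

```python
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project's 400-point, 2-class dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2)) + np.repeat([[0, 0], [4, 4]], 200, axis=0)
y = np.repeat([0, 1], 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

def knn_predict(point, k):
    """Majority vote among the k nearest training points."""
    dists = np.sqrt(((X_train - point) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Error rate (loss) for each candidate k, as plotted in an elbow chart.
losses = {}
for k in range(3, 21):
    preds = np.array([knn_predict(p, k) for p in X_test])
    losses[k] = np.mean(preds != y_test)

best_k = min(losses, key=losses.get)  # the k with the least loss
```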
To train on the dataset, we have a function that combines the previously mentioned functions. It takes k as input to set the number of neighbors, and by the end of training it has predicted a label for every test data point. To measure how many predictions are correct, we call the accuracy calculator: our KNN predicts over 88 percent of the labels correctly.
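As a cross-check (an addition, not part of the project code), scikit-learn's built-in KNeighborsClassifier can be run with k=13 on a similar split; the synthetic dataset below is an assumption standing in for the project's CSV:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic 2-class stand-in for the project's 400-point dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2)) + np.repeat([[0, 0], [4, 4]], 200, axis=0)
y = np.repeat([0, 1], 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit KNN with the k chosen from the elbow chart and score it.
clf = KNeighborsClassifier(n_neighbors=13).fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
```

Comparing a hand-written KNN against the library version on the same split is a quick way to catch bugs in the manual implementation.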