This repository contains an implementation of a K-Nearest Neighbors (KNN) model for two different tasks: digit classification and abalone age prediction.
The dataset used for digit classification is the well-known MNIST dataset, which contains 70,000 grayscale images of handwritten digits, each 28x28 pixels in size. The dataset is split into 60,000 training images and 10,000 test images.
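For reference, one common way to load MNIST and reproduce this split is scikit-learn's OpenML interface. This is only a sketch (it assumes internet access and the "mnist_784" OpenML name); the notebook may obtain the data differently:

```python
# Sketch: loading MNIST via scikit-learn's OpenML interface (assumption, not
# necessarily how the notebook loads it).
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X, y = mnist.data, mnist.target          # X: (70000, 784) pixel arrays, y: digit labels

# Conventional 60,000 / 10,000 train/test split
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]
```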
The dataset used for abalone age prediction is the Abalone dataset, which contains physical measurements of abalones (dimensions and weights) together with their age. The goal is to predict the age from these physical measurements.
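A minimal sketch of loading the Abalone data with pandas is shown below. The UCI source URL, the column names, and the rings-to-age conversion are assumptions for illustration and may not match what the notebook does:

```python
# Sketch: loading the Abalone dataset (URL, column names, and the
# rings + 1.5 age convention are assumptions).
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"
columns = ["Sex", "Length", "Diameter", "Height", "WholeWeight",
           "ShuckedWeight", "VisceraWeight", "ShellWeight", "Rings"]
abalone = pd.read_csv(url, header=None, names=columns)

# Age is commonly derived from the ring count (rings + 1.5 years)
abalone["Age"] = abalone["Rings"] + 1.5
X = pd.get_dummies(abalone.drop(columns=["Rings", "Age"]))  # one-hot encode the Sex column
y = abalone["Age"]
```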
Implementation
The KNN model is implemented using Python and the scikit-learn library. The code for digit classification is in the K_nearest_neighbors_(DIGITS_CLASSIFICATION 0-9).ipynb file, and the code for abalone age prediction is in the K_nearest_neighbors_(ABALONE_AGE_PREDICTION).ipynb file.
K-Nearest Neighbors (KNN) is a simple but effective machine learning algorithm used for classification and regression tasks. For classification, KNN computes the distance between a new data point and every training point, then assigns the new point to the class that is most common among its K nearest neighbors. For regression, it predicts the target value of a new point as the average of the target values of its K nearest neighbors.
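As a concrete illustration of both modes, the snippet below applies scikit-learn's KNeighborsClassifier and KNeighborsRegressor to a tiny made-up one-dimensional dataset (the data here is purely illustrative and not part of this repository):

```python
# Toy example of KNN classification (majority vote) and regression (averaging).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])

# Classification: the query 1.5 is closest to 1.0, 2.0, and 0.0, all labelled "low"
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, ["low", "low", "low", "high", "high", "high"])
print(clf.predict([[1.5]]))   # -> ['low']

# Regression: the prediction is the mean of the 3 nearest targets (0.0, 1.0, 2.0)
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_train, [0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
print(reg.predict([[1.5]]))   # -> [1.0]
```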
The KNN model has a few important hyperparameters that can be tuned for better performance. The most important is K, the number of nearest neighbors to consider. A higher value of K tends to produce smoother decision boundaries, while a lower value of K produces more complex decision boundaries that fit the training data more closely and can overfit. Other hyperparameters include the distance metric used to compare data points and the weighting scheme used to assign importance to the nearest neighbors.
In this repository, we use scikit-learn's implementation of the KNN model, which allows us to easily tune these hyperparameters and evaluate the model's performance.
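One typical way to tune these hyperparameters with scikit-learn is a cross-validated grid search. The sketch below uses scikit-learn's small bundled digits dataset as a stand-in; the exact parameter grid, data, and evaluation used in the notebooks may differ:

```python
# Sketch of hyperparameter tuning with GridSearchCV (grid values are assumptions).
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)      # small digits dataset bundled with scikit-learn

param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9, 11],   # K
    "weights": ["uniform", "distance"],   # weighting scheme for the neighbors
    "metric": ["euclidean", "manhattan"], # distance metric
}

search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```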
The code requires the following Python libraries:
scikit-learn
numpy
pandas