Iris Dataset Analysis This project demonstrates the application of basic machine learning techniques on the classic Iris dataset. The Iris dataset contains 150 samples of iris flowers, categorized into three species: Setosa, Versicolor, and Virginica. Each sample has four features: sepal length, sepal width, petal length, and petal width.
Table of Contents Overview Installation Usage Models Implemented Results Contributing License Overview This Python script uses the Iris dataset to demonstrate the following:
Data Preprocessing using pandas and scikit-learn. Model training using various classifiers: Logistic Regression K-Nearest Neighbors (KNN) Support Vector Machine (SVM) Performance evaluation with metrics like accuracy score and confusion matrix. Visualization using matplotlib and seaborn. Installation To run this project, make sure you have the required dependencies. You can install them using the following command:
bash Copy code pip install numpy pandas scikit-learn matplotlib seaborn Usage To execute the analysis, run the script:
bash Copy code python iris_dataset.py Ensure that the Iris.csv file is in the same directory or provide the correct path to the dataset within the script.
Models Implemented The script includes the implementation of the following machine learning algorithms:
Logistic Regression: A simple linear classifier used for binary or multiclass classification. K-Nearest Neighbors (KNN): A non-parametric method that predicts the class of a data point based on the majority class of its nearest neighbors. Support Vector Machine (SVM): A powerful classifier that finds the hyperplane that best separates different classes. Results The performance of each model is evaluated using:
Accuracy Score: Measures the percentage of correct predictions. Confusion Matrix: Shows the distribution of correct and incorrect predictions for each class. Contributing Feel free to open issues or submit pull requests for improvements or additional features.