This project predicts the risk of cardiac disease using machine learning. It uses the Heart Attack dataset from the UCI Machine Learning Repository. The project covers data preprocessing, exploratory data analysis, and the training of several machine learning models, which are then compared and evaluated using a range of metrics.
The Heart Attack dataset is obtained from the UCI Machine Learning Repository. It contains information about patients who have had a heart attack within the past two years, and it includes 14 attributes and 303 observations.
- Data preprocessing
- Exploratory Data Analysis
- Machine Learning Algorithms
- Model Evaluation using Basic Metrics
  - Accuracy, Precision, Recall, F1 Score or Classification Report
  - Confusion Matrix
- Hyperparameter Tuning using Grid Search
- Model Evaluation using Advanced Metrics
  - Basic Metrics or Classification Report
  - Confusion Matrix
  - k-Fold Cross Validation
  - ROC Curve
  - Precision-Recall Curve (PRC)
  - Learning Curve
- Feature Importance
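The evaluation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification` (303 rows, 13 features), and the choice of Random Forest, the parameter grid, and the 80/20 split are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    average_precision_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for the real dataset (assumption): 303 rows, 13 features, binary target.
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Basic metrics: classification report and confusion matrix on the held-out split.
baseline = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = baseline.predict(X_test)
print(classification_report(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Hyperparameter tuning with grid search (the grid values are illustrative only).
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)
best_model = search.best_estimator_
print("Best parameters:", search.best_params_)

# Advanced metrics: k-fold cross-validation plus ROC and precision-recall summaries.
cv_scores = cross_val_score(best_model, X_train, y_train, cv=5)
y_score = best_model.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, y_score)
avg_precision = average_precision_score(y_test, y_score)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(f"ROC AUC: {roc_auc:.3f}, average precision: {avg_precision:.3f}")
```

To try this on the real data, replace the synthetic `X` and `y` with the features and target loaded from the dataset file.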
Supervised Learning Algorithms for Classification:
- Logistic Regression
- K-Nearest Neighbors
- Support Vector Machine
- Decision Tree
- Random Forest
- AdaBoost
- Gradient Boosting
- Naive Bayes
- Ensemble (experiment)
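One way to compare the classifiers listed above is to score each with the same cross-validation setup. The sketch below is an assumption-laden illustration rather than the notebook's code: it uses synthetic data, default hyperparameters, and a soft-voting `VotingClassifier` as a guess at what the ensemble experiment might look like.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (assumption): 303 rows, 13 features.
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=42)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
}

# Hypothetical ensemble: soft voting over three of the models above.
models["Ensemble (voting)"] = VotingClassifier(
    estimators=[
        ("lr", models["Logistic Regression"]),
        ("rf", models["Random Forest"]),
        ("nb", models["Naive Bayes"]),
    ],
    voting="soft",
)

# Mean 5-fold cross-validated accuracy per model.
results = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print the models from best to worst mean accuracy.
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {score:.3f}")
```

Scores on synthetic data say nothing about the real dataset; the point is only the shape of the comparison loop.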
Random Forest performed best among the models compared, with an accuracy of 95%.
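For a tree-based model such as Random Forest, feature importance can be read from the fitted estimator. The following is a small sketch under stated assumptions: the data is synthetic, and the `feature_N` column names are placeholders, not the dataset's real attribute names.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real dataset (assumption): 303 rows, 13 features.
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=42)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

# Impurity-based importances, sorted from most to least important.
importances = pd.Series(model.feature_importances_, index=feature_names).sort_values(
    ascending=False
)
print(importances.head())
```

Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` is a reasonable cross-check.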
To run this project, you need Python 3 and the following libraries:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn (imported as sklearn)
The code for this project is provided as a Jupyter notebook. The notebook is well commented and easy to follow.
Kruhti S B
- LinkedIn: https://www.linkedin.com/in/kruthi-s-b-358956222/
This project is licensed under the MIT License - see the LICENSE.md file for details.