This project explores how Python-based machine learning and data science libraries can be used to build a predictive model for heart disease from medical attributes.
Given clinical parameters about a patient, can we predict whether or not they have heart disease?
The original data comes from the Cleveland dataset in the UCI Machine Learning Repository and is also available on Kaggle.
The project aims to achieve a minimum of 95% accuracy in predicting heart disease during the proof of concept.
- age: Age of the individual.
- sex: Sex of the individual (1 = male, 0 = female).
- cp: Chest-pain type (0 = typical angina, 1 = atypical angina, 2 = non-anginal pain, 3 = asymptomatic).
- trestbps: Resting blood pressure (mmHg); values above 130-140 are typically a cause for concern.
- chol: Serum Cholesterol (mg/dl).
- fbs: Whether fasting blood sugar exceeds 120 mg/dl (1 = true, 0 = false); a level above 120 mg/dl may indicate diabetes.
- restecg: Resting ECG (0 = normal, 1 = having ST-T wave abnormality, 2 = left ventricular hypertrophy).
- thalach: Max heart rate achieved.
- exang: Exercise-induced angina (1 = yes, 0 = no).
- oldpeak: ST depression induced by exercise relative to rest.
- slope: Slope of the peak exercise ST segment (0 = upsloping, 1 = flat, 2 = downsloping).
- ca: Number of major vessels (0–3) colored by fluoroscopy.
- thal: Thallium stress test result (1, 3 = normal, 6 = fixed defect, 7 = reversible defect).
- target: Indicates whether the individual has heart disease (1 = yes, 0 = no).
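As a rough sketch of how the attributes above map onto a feature matrix and a label vector, the snippet below checks that a DataFrame has the expected columns and separates the features from `target`. The helper name `split_features_target` and the two demo rows are hypothetical and made up for illustration; they are not records from the Cleveland data. With the real file, the DataFrame would instead come from `pd.read_csv`.

```python
import pandas as pd

# Column names as listed above; "target" is the label.
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]
TARGET = "target"


def split_features_target(df: pd.DataFrame):
    """Return (X, y) after checking that every expected column is present."""
    missing = [c for c in FEATURES + [TARGET] if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df[FEATURES], df[TARGET]


# Two made-up rows for illustration only (not real patient records).
demo = pd.DataFrame(
    [
        [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 6, 1],
        [57, 0, 0, 120, 354, 0, 1, 163, 1, 0.6, 2, 0, 3, 0],
    ],
    columns=FEATURES + [TARGET],
)

X, y = split_features_target(demo)
print(X.shape, y.tolist())
```

Failing early on a missing column is a design choice here: it surfaces a renamed or dropped attribute before any model is trained.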
The data is split into training and test sets, and several machine learning models are trained on it to build the predictive model. These models include:
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Random Forest Classifier
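The split-and-fit workflow for the three models could look roughly like the following scikit-learn sketch. It runs on synthetic data from `make_classification` as a stand-in for the heart-disease table, so the scores it prints say nothing about the real dataset. The 80/20 split, `random_state=42`, and the hyperparameters are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 300 samples, 13 features, binary target.
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

# Hold out 20% of the rows for testing; stratify keeps class balance similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test set

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Keeping the models in a dictionary makes it easy to add or swap a classifier without touching the training loop.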
Evaluation metrics will be used to assess the performance of each model, including accuracy, precision, recall, and F1-score.
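To make the four metrics concrete, here is a minimal sketch that computes them with `sklearn.metrics` on a small set of hand-written labels. The `y_true` and `y_pred` lists are invented for illustration and are not model output from this project.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels: 1 = heart disease, 0 = no heart disease.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

# Here TP = 3, FP = 1, FN = 2, TN = 4.
accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

In a screening context, recall is often watched closely because a false negative means a patient with heart disease is missed, which accuracy alone does not reveal.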