This project predicts wine quality from chemical properties using machine learning. It explores two approaches:
- Regression: Predicts wine quality score (0-10).
- Classification: Categorizes wine quality as low, medium, or high.
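As a rough illustration of how the classification target could be derived from the 0-10 score, the sketch below bins scores into three ordered classes. The cut points in `BINS` and the helper name `to_quality_class` are assumptions made for this example; the project's actual boundaries are not stated here.

```python
import pandas as pd

# Hypothetical cut points: the real class boundaries are not given in this README.
# With pd.cut's default right-closed intervals this yields
# (-1, 4] -> low, (4, 6] -> medium, (6, 10] -> high.
BINS = [-1, 4, 6, 10]
LABELS = ["low", "medium", "high"]


def to_quality_class(scores: pd.Series) -> pd.Series:
    """Map integer quality scores (0-10) to ordered categories."""
    return pd.cut(scores, bins=BINS, labels=LABELS)


scores = pd.Series([3, 5, 6, 7, 8], name="quality")
classes = to_quality_class(scores)
print(classes.tolist())
```

The regression approach would use the raw score directly, so only the classification path needs a step like this.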
The Wine Quality Dataset from Kaggle was used.
The dataset includes chemical properties like:
- Alcohol percentage
- pH
- Acidity levels
- Sulfur dioxide content
- The dataset contains no missing values.
- Balanced the dataset by adding samples for underrepresented classes.
- Normalized skewed distributions with log transformations.
- Derived Metrics:
  - Alcohol-Sugar Interaction: alcohol × residual sugar
  - Total Acidity: fixed acidity + volatile acidity + citric acid
  - Total Sulfur: free sulfur dioxide + total sulfur dioxide
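The preprocessing steps listed above can be sketched with pandas roughly as follows. The toy rows, the column names, the `quality_class` label column, the choice of which columns receive a log transform, and the use of plain random oversampling are all assumptions for illustration; the project may have used different columns or a different balancing method.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the Kaggle data; column names are assumed.
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 12.0],
    "residual sugar": [1.9, 2.6, 2.3, 6.1],
    "fixed acidity": [7.4, 7.8, 11.2, 6.0],
    "volatile acidity": [0.70, 0.88, 0.28, 0.31],
    "citric acid": [0.00, 0.00, 0.56, 0.47],
    "free sulfur dioxide": [11.0, 25.0, 17.0, 34.0],
    "total sulfur dioxide": [34.0, 67.0, 60.0, 102.0],
    "quality_class": ["medium", "medium", "medium", "high"],
})

# Derived metrics, computed here from the raw (untransformed) columns.
df["alcohol_sugar_interaction"] = df["alcohol"] * df["residual sugar"]
df["total_acidity"] = df["fixed acidity"] + df["volatile acidity"] + df["citric acid"]
df["total_sulfur"] = df["free sulfur dioxide"] + df["total sulfur dioxide"]

# log1p(x) = log(1 + x) stays finite at zero. Which columns were
# transformed in the project is an assumption.
for col in ["residual sugar", "total sulfur dioxide"]:
    df[col] = np.log1p(df[col])

# Random oversampling (assumed method): keep every original row and add
# duplicates of minority-class rows until each class matches the largest one.
target = df["quality_class"].value_counts().max()
parts = []
for _, group in df.groupby("quality_class"):
    if len(group) < target:
        extra = group.sample(n=target - len(group), replace=True, random_state=0)
        group = pd.concat([group, extra])
    parts.append(group)
balanced = pd.concat(parts, ignore_index=True)

print(balanced["quality_class"].value_counts())
```

In a real pipeline, oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.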
- Random Forest: handles complex relationships and reduces overfitting.
- Hyperparameters: Tuned using `GridSearchCV`.
  - `max_depth=10`
  - `min_samples_split=2`
  - `min_samples_leaf=1`
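A grid search over these three hyperparameters might look like the sketch below. It assumes a scikit-learn `RandomForestClassifier`, uses synthetic data from `make_classification` in place of the wine data, and picks an illustrative grid, fold count, and scoring metric; none of these settings are confirmed by the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic 3-class data standing in for the wine features (assumption).
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=6, n_classes=3, random_state=0
)

# Illustrative grid; it includes the values reported above but the
# project's actual search space is not documented here.
param_grid = {
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    scoring="f1_macro",  # assumed metric
    cv=3,                # assumed number of folds
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 4))
```

`GridSearchCV` evaluates every combination in the grid, so the cost grows multiplicatively with each added value; that is workable for a small grid like this one.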
- XGBoost: optimized for large datasets with gradient-boosted trees.
- Hyperparameters: Tuned using `Optuna`.
  - `subsample=0.7449`
  - `learning_rate=0.0309`
  - `max_depth=11`
- Classification (Random Forest):
- Accuracy: 85.62%
- F1 Score: 0.70807
- Regression (XGBoost):
- R² Score: 0.6028
- Mean Squared Error (MSE): 0.3350
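For reference, the four reported metrics can be computed with scikit-learn as sketched below on small made-up label and score vectors. The macro averaging used for the F1 score is an assumption, since the averaging mode behind the reported value is not stated.

```python
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error, r2_score

# Made-up classification labels, for illustration only.
y_true_cls = ["low", "medium", "medium", "high", "high", "medium"]
y_pred_cls = ["low", "medium", "high", "high", "medium", "medium"]

accuracy = accuracy_score(y_true_cls, y_pred_cls)
# Macro averaging weights every class equally (assumed averaging mode).
macro_f1 = f1_score(y_true_cls, y_pred_cls, average="macro")

# Made-up regression targets and predictions, for illustration only.
y_true_reg = [5, 6, 7, 5]
y_pred_reg = [5.5, 6.0, 6.0, 5.5]

r2 = r2_score(y_true_reg, y_pred_reg)
mse = mean_squared_error(y_true_reg, y_pred_reg)

print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f}")
print(f"r2={r2:.4f} mse={mse:.4f}")
```

Accuracy can look strong on imbalanced classes even when minority classes are predicted poorly, which is one reason to report an F1 score alongside it.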
Key predictors included Alcohol and Total Acidity, as identified through feature importance analysis.
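One common way to run such an analysis is to read the impurity-based `feature_importances_` attribute of a fitted tree ensemble. The sketch below assumes a `RandomForestRegressor` and a synthetic target built to depend on two of three features; it does not reproduce the project's data, model, or ranking.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic features; names mirror the README but the values are made up.
X = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.0, 200),
    "total_acidity": rng.normal(9.0, 1.5, 200),
    "pH": rng.normal(3.3, 0.15, 200),
})
# Synthetic target that depends on alcohol and total_acidity only.
y = 0.8 * X["alcohol"] - 0.3 * X["total_acidity"] + rng.normal(0, 0.2, 200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across features.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor high-cardinality features, so permutation importance is a useful cross-check when the ranking matters.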
git clone https://github.com/ahmetbekir22/wine-quality-prediction
cd wine-quality-prediction