This project demonstrates the development and analysis of several regression models that predict fuel consumption from car attributes in the Auto MPG dataset. It aims to answer one question: what determines how far a car will go?
A full, detailed report of the process, written in Polish, is available in report.pdf.
The project includes data preprocessing, exploratory data analysis, and the implementation of a linear regression model using the following features:
Table 1. Feature coefficient values for LSM-based linear regression
The script performs the following steps:
- Reads car data from a file.
- Converts MPG (Miles per Gallon) to liters per 100 km.
- Normalises selected numerical features.
- One-hot encodes the categorical origin variable into three indicator columns: Europe, USA, and Asia.
- Draws scatter plots and histograms to explore relationships between features and their distributions.
- Draws correlation heatmaps to show how the features relate to one another.
- Splits the data into training and testing sets.
- Trains a linear regression model and evaluates its performance using mean squared error (MSE).
- Compares the accuracy of four regression methods: least squares (LSM), Lasso, Ridge, and ElasticNet.
- Outputs the parameter values for the optimal LSM-based linear regression model.
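The preprocessing steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the column names, the origin coding (1 = USA, 2 = Europe, 3 = Asia), the choice of z-score normalisation, the set of normalised columns, and the helper names `load_auto_mpg`, `mpg_to_l_per_100km` and `preprocess` are all assumptions. The sample rows are made up for demonstration and are not taken from the dataset.

```python
import pandas as pd

# Assumed column layout of auto-mpg.data (not confirmed by this README).
COLUMNS = ["mpg", "cylinders", "displacement", "horsepower", "weight",
           "acceleration", "model_year", "origin", "car_name"]
# Assumed coding of the origin column.
ORIGIN_LABELS = {1: "USA", 2: "Europe", 3: "Asia"}


def load_auto_mpg(path="auto-mpg.data"):
    """Read the whitespace-separated file, treating '?' as a missing value."""
    return pd.read_csv(path, sep=r"\s+", names=COLUMNS, na_values="?")


def mpg_to_l_per_100km(mpg):
    """Convert US miles per gallon to liters per 100 km."""
    # 1 US gallon = 3.785 L and 100 km = 62.137 miles,
    # so L/100 km = 3.785 * 62.137 / mpg, which is about 235.215 / mpg.
    return 235.215 / mpg


def preprocess(df, numeric_cols=("displacement", "horsepower",
                                 "weight", "acceleration")):
    """Add the target column, normalise numeric features, one-hot encode origin."""
    out = df.dropna().copy()
    out["l_per_100km"] = mpg_to_l_per_100km(out["mpg"])

    # Z-score normalisation (an assumed choice of scaling).
    cols = list(numeric_cols)
    numeric = out[cols].astype(float)
    out[cols] = (numeric - numeric.mean()) / numeric.std()

    # One-hot encode origin into one indicator column per region.
    dummies = pd.get_dummies(out["origin"].map(ORIGIN_LABELS), dtype=float)

    # Drop the raw origin code and the original mpg column.
    return pd.concat([out.drop(columns=["origin", "mpg"]), dummies], axis=1)


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "mpg": [18.0, 26.0, 32.0, 24.0],
    "cylinders": [8, 4, 4, 4],
    "displacement": [307.0, 97.0, 85.0, 113.0],
    "horsepower": [130.0, 46.0, 65.0, 95.0],
    "weight": [3504.0, 1835.0, 2020.0, 2372.0],
    "acceleration": [12.0, 20.5, 19.0, 15.0],
    "model_year": [70, 70, 80, 70],
    "origin": [1, 2, 3, 3],
    "car_name": ["car a", "car b", "car c", "car d"],
})
result = preprocess(sample)
print(result.head())
```

With the real file in the working directory, `preprocess(load_auto_mpg())` would apply the same steps to the full dataset.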
Table 2. Mean Squared Error, intercept and coefficients comparison for different regression types
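A comparison of the kind summarised in Table 2 could be produced along these lines. This is a sketch under stated assumptions rather than the project's actual code: the regularisation strengths (`alpha`, `l1_ratio`), the 80/20 split, the helper name `compare_regressors`, and the synthetic data are all illustrative, so the printed numbers do not correspond to the report's results.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def compare_regressors(X, y, test_size=0.2, random_state=0):
    """Fit four linear models and collect test MSE, intercept and coefficients."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # Hyperparameters below are assumed values, not the project's settings.
    models = {
        "LSM": LinearRegression(),
        "Lasso": Lasso(alpha=0.1),
        "Ridge": Ridge(alpha=1.0),
        "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        mse = mean_squared_error(y_test, model.predict(X_test))
        results[name] = {
            "mse": mse,
            "intercept": model.intercept_,
            "coef": model.coef_,
        }
    return results


# Synthetic data standing in for the preprocessed Auto MPG features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 10.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

results = compare_regressors(X, y)
for name, r in results.items():
    print(f"{name:<10} MSE={r['mse']:.4f} "
          f"intercept={r['intercept']:.3f} coef={np.round(r['coef'], 3)}")
```

Evaluating every model on the same held-out split keeps the MSE values directly comparable across methods.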
To run this project, you need Python installed along with the following libraries:
- NumPy
- pandas
- matplotlib
- seaborn
- scikit-learn
You can install these packages using pip:
pip install numpy pandas matplotlib seaborn scikit-learn
To use this project:
- Ensure you have the auto-mpg.data file in your working directory. This file contains the dataset used for modeling.
- Run the script via the command line:
python main.py