This project uses machine learning to predict whether the batting team wins or loses the second-innings run chase in IPL matches, based on historical data from the 2008 to 2023 IPL seasons.
The data comes from a public Kaggle dataset titled "IPL 2008 to 2023 dataset", contributed by user Sri tata. The dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. We acknowledge Sri tata and, based on the contributor's note, recognize cricsheet.com as a potential source of the raw data.
Two machine learning models were developed for this project:
- Random Forest Classifier: Achieved an accuracy of 99.87%.
- Logistic Regression: Achieved an accuracy of 80.38%.
Both models estimate the batting team's win/loss probability during the second-innings run chase. After analysis, we selected Logistic Regression as the final model because its probability percentages are easier to interpret and more engaging for users, especially cricket fans. The Random Forest model, while more accurate, produced more extreme predictions.
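As a rough illustration of how a logistic model turns a chase state into a win percentage, here is a minimal sketch using only the Python standard library. It is not the project's trained model: the feature set (current run rate, required run rate, wickets in hand), the weights, the bias, and the helper names `chase_features` and `win_probability` are all hypothetical placeholders chosen for this example.

```python
import math

# Hypothetical coefficients, NOT fitted values from the project.
WEIGHTS = {"crr": 0.35, "rrr": -0.45, "wickets_left": 0.25}
BIAS = -0.5


def chase_features(target, score, wickets_lost, balls_bowled, total_balls=120):
    """Derive an assumed feature set from the current state of the chase."""
    balls_left = total_balls - balls_bowled
    if balls_left <= 0:
        raise ValueError("the innings is over; no balls left to chase with")
    runs_left = target - score
    # Run rates are expressed in runs per over (6 balls).
    crr = score * 6 / balls_bowled if balls_bowled else 0.0
    rrr = runs_left * 6 / balls_left
    return {"crr": crr, "rrr": rrr, "wickets_left": 10 - wickets_lost}


def sigmoid(z):
    """Numerically stable logistic function mapping a score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def win_probability(features):
    """Weighted sum of the features passed through the logistic function."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return sigmoid(z)


# Made-up chase of 180: (balls bowled, score, wickets lost) after some overs.
for balls, score, wickets in [(30, 40, 1), (60, 90, 3), (90, 130, 5)]:
    p = win_probability(chase_features(180, score, wickets, balls))
    print(f"over {balls // 6}: win {p:.0%}, loss {1 - p:.0%}")
```

Because the output is a smooth function of the inputs, small changes in the match situation move the percentage gradually, which fits the observation that the Logistic Regression figures were less extreme than those of the Random Forest.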
Fig. 1: Illustration of the end-to-end algorithm pipeline used in the project.
Fig. 2: Visualization of the win/loss probabilities for the chasing team on an over-by-over basis.
Fig. 3: Over-by-over Random Forest model predictions for the chase scenario.
Below are images representing the results and functioning of the Random Forest Algorithm:
Below are images representing the results and functioning of the Logistic Regression Algorithm:
Although the Random Forest model achieved higher accuracy, the Logistic Regression model was chosen as the final model due to its ability to present win/loss probabilities in a more interpretable manner, making it more engaging for cricket fans.
A video demonstration of the project has been attached.
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.