Machine learning models with a built-in platform for reading and analyzing tennis results.
The project is part of my BSc Thesis, which compares popular ML algorithms.
Many thanks to Jeff Sackmann's repository, which collects all ATP match data in a neat CSV format.
- Python:
  - scikit-learn
  - scikit-optimize
  - pandas
  - numpy
  - xgboost
  - joblib
  - flask
  - flask-cors
- Node.js and npm
- Clone this repository:
git clone https://github.com/Ziusz/tennis-predictor.git
- Install python dependencies via pip:
pip install -r requirements.txt
- Initialize Jeff Sackmann's repository:
git submodule update --init --recursive
- Run the data preprocessing script:
python scripts/preprocess_data.py
- Run the model training script (it might take several hours ¯_(ツ)_/¯):
python scripts/train_models.py
- If you want to use the web interface, install all frontend dependencies via npm:
cd frontend
npm install
The CLI has two commands:
- evaluate, to get stats and info about a trained model:
python scripts/cli.py evaluate <model_name>
# example:
python scripts/cli.py evaluate logistic_regression
- predict, to get the probability of player1 winning, as calculated by each model:
python scripts/cli.py predict <p1_hand> <p1_height> <p1_age> <p1_rank> <p1_rank_points> <p2_hand> <p2_height> <p2_age> <p2_rank> <p2_rank_points> <surface> <tourney_level> <tourney_round>
# example:
python scripts/cli.py predict L 180 25.5 14 3500 R 198 21 44 1250 Grass G R32
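Because predict takes thirteen positional arguments, it is easy to misplace one. The sketch below assembles the argument list from two player dictionaries in the order shown above. The helper name build_predict_command and the dictionary keys are illustrative assumptions, not part of the project.

```python
import sys

# Order of the per-player arguments, as documented for the predict command.
PLAYER_FIELDS = ("hand", "height", "age", "rank", "rank_points")


def build_predict_command(player1, player2, surface, tourney_level,
                          tourney_round, script="scripts/cli.py"):
    """Return the argv list for `cli.py predict` (hypothetical helper)."""
    args = [sys.executable, script, "predict"]
    for player in (player1, player2):
        args.extend(str(player[field]) for field in PLAYER_FIELDS)
    args.extend([surface, tourney_level, tourney_round])
    return args


command = build_predict_command(
    {"hand": "L", "height": 180, "age": 25.5, "rank": 14, "rank_points": 3500},
    {"hand": "R", "height": 198, "age": 21, "rank": 44, "rank_points": 1250},
    surface="Grass",
    tourney_level="G",
    tourney_round="R32",
)
print(" ".join(command[1:]))
```

To run the command from the repository root, pass the list to subprocess.run(command, check=True).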
There is also an optional API for sending HTTP requests to the models.
To start the API, run the API script: python api.py
There are 3 endpoints:
- GET: /players, which returns JSON with information about all players
- GET: /evaluate/<model_name>, which returns JSON with the metrics of the chosen model
- POST: /predict, which returns JSON with the probability of player1 winning, as calculated by each model
Example request body:
{
  "player1": { "hand": "R", "height": 194, "age": 25, "rank": 11, "rank_points": 4000 },
  "player2": { "hand": "L", "height": 178, "age": 19, "rank": 150, "rank_points": 35 },
  "surface": "Hard",
  "tourney_level": "A",
  "round": "QF"
}
API runs on port 7771 by default.
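As a minimal sketch of calling the /predict endpoint from Python, the snippet below builds the POST request with the standard library only. It assumes the API is hosted locally at http://localhost:7771 and accepts a JSON body with a Content-Type of application/json. The helper name build_predict_request is illustrative.

```python
import json
import urllib.request

API_URL = "http://localhost:7771"  # assumption: API hosted locally on the default port


def build_predict_request(player1, player2, surface, tourney_level,
                          round_name, base_url=API_URL):
    """Build (but do not send) a POST request for /predict (hypothetical helper)."""
    body = {
        "player1": player1,
        "player2": player2,
        "surface": surface,
        "tourney_level": tourney_level,
        "round": round_name,
    }
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(
    {"hand": "R", "height": 194, "age": 25, "rank": 11, "rank_points": 4000},
    {"hand": "L", "height": 178, "age": 19, "rank": 150, "rank_points": 35},
    surface="Hard",
    tourney_level="A",
    round_name="QF",
)

# Sending the request requires the API to be running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```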
The frontend is written in Vue.js, so it requires the dependencies installed via npm. If you haven't installed them yet, go back to the last step of the installation instructions.
It requires a running API to work.
To start the frontend, run the serve script:
cd frontend
npm run serve
The frontend runs on port 8080 by default.
If you host the site locally, you can visit the URL http://localhost:8080.
Model | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
---|---|---|---|---|
Logistic regression | 64.81% | 64.90% | 64.51% | 64.70% |
SVM | 51.04% | 50.55% | 95.81% | 66.18% |
Random Forest | 70.78% | 70.97% | 70.34% | 70.65% |
KNN | 65.95% | 66.41% | 64.54% | 65.46% |
Gradient Boosting | 67.31% | 67.31% | 67.33% | 67.32% |
XGBoost | 66.87% | 66.87% | 66.87% | 66.87% |
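Random Forest is currently the best model, with the highest accuracy (70.78%) and F1-score (70.65%).
SVM has a very high recall (95.81%) but near-chance precision (50.55%) and accuracy (51.04%), which suggests it predicts the positive class for almost every match and produces many false positives.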
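To illustrate how very high recall can coexist with near-chance accuracy, the sketch below computes the four metrics from a confusion matrix. The counts are hypothetical, chosen only to land near the SVM row on a balanced set of 2000 matches. They are not the project's actual confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: the classifier labels almost every match as positive,
# so it finds nearly all true positives but also mislabels most negatives.
metrics = classification_metrics(tp=958, fp=937, fn=42, tn=63)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these counts only 63 of 1000 negatives are classified correctly, so recall stays high while precision and accuracy fall to roughly one half.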
Logistic regression:
Hyperparameter | Value |
---|---|
C | 10 |
max_iter | 1203 |
solver | saga |
SVM:
Hyperparameter | Value |
---|---|
C | 0.12842116071378784 |
coef0 | 0.006382660838799905 |
gamma | auto |
kernel | poly |
max_iter | 825 |
Random Forest:
Hyperparameter | Value |
---|---|
criterion | entropy |
max_depth | 15 |
max_features | log2 |
min_samples_leaf | 7 |
min_samples_split | 4 |
n_estimators | 1000 |
KNN:
Hyperparameter | Value |
---|---|
algorithm | brute |
leaf_size | 20 |
n_neighbors | 98 |
p | 1 |
weights | uniform |
Gradient Boosting:
Hyperparameter | Value |
---|---|
learning_rate | 0.03589451606571555 |
loss | log_loss |
max_depth | 4 |
max_features | log2 |
n_estimators | 936 |
subsample | 0.8033808791659816 |
XGBoost:
Hyperparameter | Value |
---|---|
alpha | 10 |
colsample_bytree | 0.6758151134813088 |
eval_metric | rmsle |
grow_policy | lossguide |
lambda | 0.4652332574590495 |
learning_rate | 0.16017846170784672 |
max_depth | 3 |
n_estimators | 316 |
subsample | 0.8 |
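The hyperparameter names listed above correspond to constructor arguments of the respective estimators. As a rough sketch, three of the scikit-learn models could be instantiated with the listed values as follows. This is an illustration only and not necessarily how scripts/train_models.py builds them. The dictionary keys are assumed names.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Unfitted estimators configured with the values from the tables above.
models = {
    "logistic_regression": LogisticRegression(
        C=10, max_iter=1203, solver="saga"
    ),
    "random_forest": RandomForestClassifier(
        criterion="entropy",
        max_depth=15,
        max_features="log2",
        min_samples_leaf=7,
        min_samples_split=4,
        n_estimators=1000,
    ),
    "knn": KNeighborsClassifier(
        algorithm="brute", leaf_size=20, n_neighbors=98, p=1, weights="uniform"
    ),
}

for name, model in models.items():
    print(name, type(model).__name__)
```

Each estimator still has to be fitted on the preprocessed data with model.fit(X, y) before it can make predictions.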
The project is built with:
- Jeff Sackmann's ATP data repository
- Scikit-Learn
- Scikit-Optimize
- XGBoost for Python
- Flask
- Vue.js
- Pandas
- NumPy
- Joblib
- Flask-Cors
- Seaborn
- Matplotlib
Inspired by: Ahlem Jouidi's Notebook.
Tennis Predictor by Ziusz is licensed under CC BY-NC-SA 4.0.
Attribution of this project and Jeff Sackmann's repository is required.
Only noncommercial use of this project is permitted.
Adaptations must be shared under the same terms.