Tennis Predictor

Machine learning models with a built-in platform for reading and analyzing tennis results.
The project is part of my BSc thesis, which compares popular ML algorithms.
Many thanks to Jeff Sackmann's repository, which collects all ATP match data in a neat CSV format.

Requirements

Python:
- scikit-learn
- scikit-optimize
- pandas
- numpy
- xgboost
- joblib
- flask
- flask-cors
Node.js and npm (only needed for the web interface)

Installation

Clone this repository: git clone https://github.com/Ziusz/tennis-predictor.git
Install Python dependencies via pip: pip install -r requirements.txt
Fetch Jeff Sackmann's repository, which is included as a Git submodule: git submodule update --init --recursive
Run the data preprocessing script: python scripts/preprocess_data.py
Run the model training script (it might take several hours ¯_(ツ)_/¯): python scripts/train_models.py
If you want to use the web interface, install all frontend dependencies via npm:

cd frontend
npm install

Usage

Command-line Interface

The CLI has two commands:

evaluate, which prints statistics and information about a trained model:

python scripts/cli.py evaluate <model_name>
# example:
python scripts/cli.py evaluate logistic_regression

predict, which prints the probability of player1 winning as calculated by each model:

python scripts/cli.py predict <p1_hand> <p1_height> <p1_age> <p1_rank> <p1_rank_points> <p2_hand> <p2_height> <p2_age> <p2_rank> <p2_rank_points> <surface> <tourney_level> <tourney_round>
# example:
python scripts/cli.py predict L 180 25.5 14 3500 R 198 21 44 1250 Grass G R32
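
Since predict takes thirteen positional arguments, it is easy to swap two of them. The following is a minimal sketch of a wrapper that builds the argument list from named fields in the order shown above. The helper names build_predict_args and run_predict are hypothetical and not part of this repository. The sketch also assumes it is run from the repository root.

```python
import subprocess
import sys

# Order of the per-player fields, as in the CLI signature above.
PLAYER_FIELDS = ("hand", "height", "age", "rank", "rank_points")


def build_predict_args(player1, player2, surface, tourney_level, tourney_round):
    """Return the argument list for `scripts/cli.py predict` (hypothetical helper)."""
    args = ["scripts/cli.py", "predict"]
    for player in (player1, player2):
        args += [str(player[field]) for field in PLAYER_FIELDS]
    args += [surface, tourney_level, tourney_round]
    return args


def run_predict(player1, player2, surface, tourney_level, tourney_round):
    """Run the CLI with the current interpreter and return whatever it prints."""
    cmd = [sys.executable] + build_predict_args(
        player1, player2, surface, tourney_level, tourney_round
    )
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Same values as the example above.
p1 = {"hand": "L", "height": 180, "age": 25.5, "rank": 14, "rank_points": 3500}
p2 = {"hand": "R", "height": 198, "age": 21, "rank": 44, "rank_points": 1250}
example_args = build_predict_args(p1, p2, "Grass", "G", "R32")
print("python " + " ".join(example_args))
```

Calling run_predict(p1, p2, "Grass", "G", "R32") would then execute the same command as the example above, provided the models have already been trained.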

API

There is also an optional API that lets you query the models over HTTP.
To start the API, run the api script: python api.py
There are 3 endpoints:

GET: /players, which returns JSON with information about all players
GET: /evaluate/<model_name>, which returns JSON with the metrics of the chosen model
POST: /predict, which returns JSON with the probability of player1 winning as calculated by each model

Example body structure of the request:

 {
   "player1": {
     "hand": "R",
     "height": 194,
     "age": 25,
     "rank": 11,
     "rank_points": 4000
   },
   "player2": {
     "hand": "L",
     "height": 178,
     "age": 19,
     "rank": 150,
     "rank_points": 35
   },
   "surface": "Hard",
   "tourney_level": "A",
   "round": "QF"
 }

API runs on port 7771 by default.
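
As a rough illustration, the /predict endpoint could be called from Python using only the standard library. This is a sketch under assumptions: the host localhost, the Content-Type header, and the helper names build_predict_request and predict are not taken from this repository, and the structure of the JSON response is not documented here, so the sketch returns it as-is.

```python
import json
import urllib.request

# Default port stated above; the host is an assumption for a local setup.
API_URL = "http://localhost:7771"


def build_predict_request(body, base_url=API_URL):
    """Build (but do not send) a POST request for the /predict endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed to be expected
        method="POST",
    )


def predict(body, base_url=API_URL):
    """Send the request and return the decoded JSON response unchanged."""
    with urllib.request.urlopen(build_predict_request(body, base_url)) as resp:
        return json.load(resp)


# Same body as the example above.
body = {
    "player1": {"hand": "R", "height": 194, "age": 25, "rank": 11, "rank_points": 4000},
    "player2": {"hand": "L", "height": 178, "age": 19, "rank": 150, "rank_points": 35},
    "surface": "Hard",
    "tourney_level": "A",
    "round": "QF",
}
request = build_predict_request(body)
print(request.get_method(), request.full_url)
```

With the API running, predict(body) sends the request. The block itself only builds the request, so it can be run without a server.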

Web Interface

This is a frontend written in Vue.js, so it requires the dependencies installed by npm. If you haven't installed them yet, go back to the last step of the installation instructions.
It requires a running API to work. To start the frontend, run the serve script:

cd frontend
npm run serve

The frontend runs on port 8080 by default.
If you host the site locally, you can visit http://localhost:8080.

Evaluation of models

Current metrics

Model	Accuracy (%)	Precision (%)	Recall (%)	F1-score (%)
Logistic regression	64.81	64.90	64.51	64.70
SVM	51.04	50.55	95.81	66.18
Random Forest	70.78	70.97	70.34	70.65
KNN	65.95	66.41	64.54	65.46
Gradient Boosting	67.31	67.31	67.33	67.32
XGBoost	66.87	66.87	66.87	66.87

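Random Forest is currently the best model, with the highest accuracy (70.78%), precision (70.97%) and F1-score (70.65%).
SVM has a very high recall (95.81%) but accuracy (51.04%) and precision (50.55%) of only about 50%, which suggests that it assigns almost every match to the positive class and therefore produces many false positives.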
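
The F1-score is the harmonic mean of precision and recall, so the last column can be recomputed from the two before it. The sketch below does this for the values in the table above. The function name f1_score is chosen here for illustration and is not taken from the project code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the table above.
reported = {
    "Logistic regression": (64.90, 64.51, 64.70),
    "SVM": (50.55, 95.81, 66.18),
    "Random Forest": (70.97, 70.34, 70.65),
    "KNN": (66.41, 64.54, 65.46),
    "Gradient Boosting": (67.31, 67.33, 67.32),
    "XGBoost": (66.87, 66.87, 66.87),
}

# Recompute F1 for every model, as a percentage.
recomputed = {
    model: 100 * f1_score(precision / 100, recall / 100)
    for model, (precision, recall, _) in reported.items()
}

for model, (_, _, f1_reported) in reported.items():
    print(f"{model}: recomputed F1 = {recomputed[model]:.2f}%, reported = {f1_reported:.2f}%")
```

This also shows why SVM's F1-score looks competitive despite its low accuracy: the very high recall pulls the harmonic mean up even though precision is close to 50%.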

Current hyperparameters selected by Bayes Search

Logistic regression

Hyperparameter	Value
C	10
max_iter	1203
solver	saga

SVM

Hyperparameter	Value
C	0.12842116071378784
coef0	0.006382660838799905
gamma	auto
kernel	poly
max_iter	825

Random Forest

Hyperparameter	Value
criterion	entropy
max_depth	15
max_features	log2
min_samples_leaf	7
min_samples_split	4
n_estimators	1000

KNN

Hyperparameter	Value
algorithm	brute
leaf_size	20
n_neighbors	98
p	1
weights	uniform

Gradient Boosting

Hyperparameter	Value
learning_rate	0.03589451606571555
loss	log_loss
max_depth	4
max_features	log2
n_estimators	936
subsample	0.8033808791659816

XGBoost

Hyperparameter	Value
alpha	10
colsample_bytree	0.6758151134813088
eval_metric	rmsle
grow_policy	lossguide
lambda	0.4652332574590495
learning_rate	0.16017846170784672
max_depth	3
n_estimators	316
subsample	0.8

Acknowledgments

Project is built with:

Inspired by: Ahlem Jouidi's Notebook.

License

Tennis Predictor by Ziusz is licensed under CC BY-NC-SA 4.0.
Attribution of this project and Jeff Sackmann's repository is required.
Only noncommercial use of this project is permitted.
Adaptations must be shared under the same terms.
