This competition is for those who entered the Data Science Melbourne 2015 Datathon.
You will already have the data for all games up to the semi-finals and finals. The task is to use this historical data to rank-order the punters by their profit over the final 3 games of the tournament (which is why we didn't give you this data).
We provide the list of Account_IDs to make predictions for, along with some limited features for the final 3 games that you may make use of.
The objective is to determine if betting is just guessing, or if past performance can be indicative of future performance. We expect this to be very hard, and will be impressed if anyone can come up with an algorithm that is better than a random number generator!
We are treating this as a binary classification problem - did the account make a profit or not. The evaluation metric is the AUC.
An AUC of 0.5 is random guessing and 1 is a perfect solution.
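To make the metric concrete, here is a minimal sketch of how an AUC can be computed from predicted scores and true labels. It uses the rank-sum identity: AUC is the probability that a randomly chosen profitable account gets a higher score than a randomly chosen unprofitable one, with ties counted as half. The function name `auc` and the toy data are illustrative assumptions, not part of the competition code.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 if the account made a profit, 0 otherwise.
    scores: predicted values; only their ordering matters.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores and give each its average 1-based rank.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy data (made up for illustration).
print(auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))    # 1.0: perfect ordering
print(auc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5]))    # 0.5: no information
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75: one pair out of four misordered
```

Because only the ordering of the scores matters, a submission can contain any monotone transformation of a predicted profit and still receive the same AUC.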
- basic features
- random forest
- weighted profit formula
- new features (BL ratio, cancel ratio etc.)
- average profit formula
- new features (difference between L and B)
- xgboost
- new feature (invest amount)
- blended models
- X New benchmark (past history by game)
- X Log transformation
- X K-means (transactional features & customized imputation)
- X Feature selection
- X Multi-rounds
- X New Calculation
- X Event Counts / Bag of Event
- O Subset modeling
- O Invest weighted calculation
- X Meta features
- xgboost (gbm, rf)
- h2o (gbm, rf, nb, glm, dl)
- sofia (svm, glm)
- tsne cluster
- k means cluster
- fm
- knn
- O New customers 0.43/-5
- New Feature
- Meta bagged modeling
- O Separate models (new/existing customers)
- Past value (cumsum)
- Factorization Machines (http://www.csie.ntu.edu.tw/~r01922136/libffm/)
- Regression + Classification
- python lasagne
- https://github.com/Gzsiceberg/kaggle-avito
- entropy based features
- Bad features: win_hist / DL metafeatures
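
The notes above list "blended models" without saying how the blend was done. One common choice for a rank-based metric such as AUC is to average each model's ranks rather than its raw scores, so that models with different output scales (for example a probability and a predicted profit) contribute equally. The sketch below assumes that approach; `average_ranks`, `rank_blend` and the sample scores are hypothetical names and values, not taken from the original solution.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_blend(model_scores, weights=None):
    """Blend models by a weighted average of their normalised ranks.

    model_scores: list of score lists, one per model, all in the same
                  account order.
    weights:      optional per-model weights (assumed equal if omitted).
    """
    if not model_scores:
        raise ValueError("need at least one model")
    n = len(model_scores[0])
    if any(len(scores) != n for scores in model_scores):
        raise ValueError("all models must score the same accounts")
    if weights is None:
        weights = [1.0] * len(model_scores)
    total = float(sum(weights))

    blended = [0.0] * n
    for w, scores in zip(weights, model_scores):
        for i, r in enumerate(average_ranks(scores)):
            blended[i] += (w / total) * (r / n)  # rank scaled into (0, 1]
    return blended


# Hypothetical outputs from two models on four accounts.
model_a = [0.2, 0.9, 0.5, 0.4]      # e.g. a predicted probability of profit
model_b = [10.0, 30.0, 40.0, 20.0]  # e.g. a predicted profit amount
print(rank_blend([model_a, model_b]))  # [0.25, 0.875, 0.875, 0.5]
```

Averaging ranks discards how confident each model is and keeps only its ordering, which is all that AUC rewards.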