Data Fusion 2022 Contest

8th place solution for the Data Fusion 2022 Contest.

Rank      Public  Private
Matching  6       8

Technologies used

Python, Jupyter, NumPy, pandas, scikit-learn

Problem solving

  1. Feature engineering: before analyzing the transactional data, we build features from all of the available data. These features expose the data along additional dimensions (such as time of day and day of the week) and serve as inputs for training the machine learning models.

  2. Training:

    • CatBoostRanker with the YetiRank loss, trained for 9000 iterations.
    • An ensemble of 2 CatBoost models with different parameters.
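
As an illustration of the feature-engineering step above, here is a minimal sketch of time-based features built with pandas. The column names (`user_id`, `transaction_dttm`, `transaction_amt`), the sample rows, and the chosen aggregates are assumptions made for the example and are not taken from the actual solution.

```python
import pandas as pd

# Hypothetical sample of transactions; column names and values are assumed.
transactions = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2"],
    "transaction_dttm": pd.to_datetime([
        "2021-03-01 09:15:00",  # Monday
        "2021-03-01 22:40:00",  # Monday
        "2021-03-06 14:05:00",  # Saturday
        "2021-03-07 03:30:00",  # Sunday
    ]),
    "transaction_amt": [-120.0, -45.5, -300.0, 1000.0],
})

# Row-level time features: hour of day and day of week (Monday = 0).
transactions["hour"] = transactions["transaction_dttm"].dt.hour
transactions["day_of_week"] = transactions["transaction_dttm"].dt.dayofweek
transactions["is_weekend"] = transactions["day_of_week"] >= 5

# Per-user aggregates that could later be fed to a model.
user_features = transactions.groupby("user_id").agg(
    n_transactions=("transaction_amt", "size"),
    mean_amount=("transaction_amt", "mean"),
    mean_hour=("hour", "mean"),
    weekend_share=("is_weekend", "mean"),
)
print(user_features)
```

The same pattern would apply to the clickstream data, with one feature table per data source joined on the user identifier.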

Data

  1. General data for all tasks in tabular .csv format: transactions.zip, clickstream.zip, and the target variable train_matching.csv
  2. Common accompanying data for all tasks in tabular .csv format: mcc_codes.csv, click_categories.csv and currency_rk.csv
  3. Baselines and example solutions for the container-based Matching task: the random solution sample_submission.zip, and baseline_catboost.zip, an example solution built on the CatBoost library with GPU support