Kaszanas/SC2DatasetPreparator

DatasetPreparator

This project contains various scripts that can assist in the process of preparing datasets. For a broad overview of the tools, please refer to the Detailed Tools Description.

The tools in this repository were used to create SC2ReSet: StarCraft II Esport Replaypack Set and, subsequently, SC2EGSet: StarCraft II Esport Game State Dataset. For citation information, see Cite Us!.

Installation

Note

To run this project there are some prerequisites that you need to have installed on your system:

  • Docker
  • make

Our preferred way of distributing the toolset is through DockerHub. We use the Docker image to provide a fully reproducible environment for our scripts.

To pull the image from DockerHub, run the following command:

docker pull kaszanas/datasetpreparator:latest

If you wish to clone the repository and build the Docker image yourself, run the following command:

make docker_build

After building the image, refer to the Command Line Arguments Usage section for how to run the scripts, and to the Detailed Tools Description for a full description of each script.

Command Line Arguments Usage

When using Docker, you have to pass the arguments through the docker run command and mount the input/output directory. Below is an example of how to run the directory_flattener script using Docker. For ease of use, we have prepared an example directory structure in the processing directory. The command below uses it to flatten the directory structure:

docker run \
  -v "./processing:/app/processing" \
  datasetpreparator:latest \
  python3 directory_flattener.py \
  --input_path /app/processing/directory_flattener/input \
  --output_path /app/processing/directory_flattener/output
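If you prefer to launch the container from a Python script, the same invocation can be assembled as an argument list. This is only a minimal sketch: the helper name `build_flattener_command` is hypothetical and not part of this repository, and the image name, script name, and flags are copied from the command above. It resolves the host directory to an absolute path before mounting it, which may be more portable across Docker versions than a relative path.

```python
import shlex
import subprocess
from pathlib import Path


def build_flattener_command(
    processing_dir: str = "./processing",
    image: str = "datasetpreparator:latest",
) -> list[str]:
    """Assemble the docker run invocation shown above as an argument list.

    Hypothetical helper for illustration; it only mirrors the documented command.
    """
    # Resolve to an absolute path so the bind mount does not depend on
    # how the Docker CLI interprets relative paths.
    host_dir = Path(processing_dir).resolve()
    return [
        "docker", "run",
        "-v", f"{host_dir}:/app/processing",
        image,
        "python3", "directory_flattener.py",
        "--input_path", "/app/processing/directory_flattener/input",
        "--output_path", "/app/processing/directory_flattener/output",
    ]


if __name__ == "__main__":
    command = build_flattener_command()
    # Print the command in a shell-safe form for inspection.
    print(shlex.join(command))
    # Uncomment to actually run the container (requires Docker and the image):
    # subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues when the host path contains spaces.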

SC2EGSet Dataset Reproduction Steps

Note

Instructions below are for reproducing the result of the SC2EGSet dataset. If you wish to use the tools in this repository separately for your own dataset, please refer to the Detailed Tools Description.

Using Docker

We provide a release image containing all of the scripts. To see the usage of these scripts, please refer to their respective README.md files, as described in Detailed Tools Description.

The following steps were used to prepare the SC2EGSet dataset:

  1. Build the Docker image for the DatasetPreparator using the provided makefile command: make docker_build. This will load all of the dependencies, such as SC2InfoExtractorGo.
  2. Place the input replaypacks into the ./processing/directory_flattener/input directory.
  3. Run the command make sc2reset_sc2egset to process the replaypacks and create the dataset. The output will be placed in the ./processing/sc2egset_replaypack_processor/output directory.
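The three steps above can be sketched as a small Python driver. This is an illustration under stated assumptions, not a script shipped with the repository: the function names `stage_replaypacks` and `reproduce` are hypothetical, the script is assumed to run from the repository root, and because the form of the replaypacks (archives or directories) is not specified here, the sketch copies both. Only the two make targets and the two paths come from the steps above.

```python
import shutil
import subprocess
from pathlib import Path

# Paths taken from the reproduction steps, relative to the repository root.
INPUT_DIR = Path("processing/directory_flattener/input")
OUTPUT_DIR = Path("processing/sc2egset_replaypack_processor/output")


def stage_replaypacks(source_dir: Path, input_dir: Path = INPUT_DIR) -> list[Path]:
    """Copy every entry of source_dir into input_dir (step 2).

    Hypothetical helper: files and directories are both copied because the
    replaypack format is not assumed.
    """
    input_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for entry in sorted(source_dir.iterdir()):
        target = input_dir / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, target)
        staged.append(target)
    return staged


def reproduce(source_dir: Path) -> Path:
    """Run steps 1 to 3 in order and return the expected output directory."""
    subprocess.run(["make", "docker_build"], check=True)       # step 1
    stage_replaypacks(source_dir)                               # step 2
    subprocess.run(["make", "sc2reset_sc2egset"], check=True)  # step 3
    return OUTPUT_DIR
```

With `check=True`, a failing make target raises `subprocess.CalledProcessError`, so the later steps do not run against an incomplete build.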

Detailed Tools Description

Each script's usage is described in its respective README.md file. Detailed descriptions of the available tools are listed below.

CLI Usage; Generic Scripts

  1. Directory Packager (dir_packager): README
  2. Directory Flattener (directory_flattener): README
  3. File Renamer (file_renamer): README
  4. JSON Merger (json_merger): README
  5. Processed Mapping Copier (processed_mapping_copier): README

CLI Usage; StarCraft 2 Specific Scripts

  1. SC2 Map Downloader (sc2_map_downloader): README
  2. SC2EGSet Replaypack Processor (sc2egset_replaypack_processor): README
  3. SC2ReSet Replaypack Downloader (sc2reset_replaypack_downloader): README

Contributing and Reporting Issues

If you want to report a bug, request a feature, or open any other issue, please do so in the issue tracker.

Please see CONTRIBUTING.md for detailed development instructions and contribution guidelines.

Cite Us!

This Repository

@software{Białecki_2022_6366039,
  author    = {Białecki, Andrzej and
               Białecki, Piotr and
               Krupiński, Leszek},
  title     = {{Kaszanas/SC2DatasetPreparator: 1.2.0
               SC2DatasetPreparator Release}},
  month     = {jun},
  year      = {2022},
  publisher = {Zenodo},
  version   = {1.2.0},
  doi       = {10.5281/zenodo.5296664},
  url       = {https://doi.org/10.5281/zenodo.5296664}
}

@article{Bialecki2023_SC2EGSet,
  author   = {Bia{\l}ecki, Andrzej
              and Jakubowska, Natalia
              and Dobrowolski, Pawe{\l}
              and Bia{\l}ecki, Piotr
              and Krupi{\'{n}}ski, Leszek
              and Szczap, Andrzej
              and Bia{\l}ecki, Robert
              and Gajewski, Jan},
  title    = {SC2EGSet: StarCraft II Esport Replay and Game-state Dataset},
  journal  = {Scientific Data},
  year     = {2023},
  month    = {Sep},
  day      = {08},
  volume   = {10},
  number   = {1},
  pages    = {600},
  issn     = {2052-4463},
  doi      = {10.1038/s41597-023-02510-7},
  url      = {https://doi.org/10.1038/s41597-023-02510-7}
}
