Streamlit app to create subsets of the MTG-Jamendo dataset
- Clone this repo:
    git clone https://github.com/seunggookim/manymusic-mtg-jamendo.git
- Get mtg-jamendo-dataset as a submodule (run from inside the cloned directory):
    git submodule init && git submodule update
- Create a virtual environment:
    python3 -m venv venv
- Activate the virtual environment:
    - Mac/Linux:
        source venv/bin/activate
    - Windows (Git Bash; in cmd.exe run venv\Scripts\activate instead):
        source venv/Scripts/activate
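To confirm that the virtual environment is actually active before installing anything, you can compare sys.prefix with sys.base_prefix: inside a venv created by python3 -m venv they differ. This is a small sketch; the helper name in_virtualenv is hypothetical and not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active" if in_virtualenv() else "venv NOT active")
```

Run it with the same python that will later run pip and streamlit.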
- Install the Python dependencies:
    pip install -r requirements.txt
  Note: running anything other than the Streamlit annotation app also requires the packages in the additional requirements file requirements-dev.txt:
    pip install -r requirements-dev.txt
- Copy the data into data/. The required files are:
    - mtg-jamendo-predictions-algos.pk
    - mtg-jamendo-predictions-av.pk
    - mtg-jamendo-predictions.tsv
    - the timewisepredictions/ directory
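Before starting the app, a quick existence check on data/ can catch a missing file early. This sketch only tests that the files and the directory listed above exist; it does not validate their contents, and the function name missing_data is hypothetical.

```python
from pathlib import Path

# Names taken from the list of required files above.
REQUIRED_FILES = [
    "mtg-jamendo-predictions-algos.pk",
    "mtg-jamendo-predictions-av.pk",
    "mtg-jamendo-predictions.tsv",
]
REQUIRED_DIRS = ["timewisepredictions"]


def missing_data(data_dir="data"):
    """Return the required entries that are absent from data_dir."""
    root = Path(data_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    absent = missing_data()
    if absent:
        print("Missing from data/:", ", ".join(absent))
    else:
        print("All required data files found.")
```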
- Start the app:
    streamlit run manymusic-viz.py
  The Streamlit app generates the JSON file data/clean_tids.json, which lists the candidate MTG-Jamendo track ids for the ManyMusic dataset. These ids are randomly sampled from a pool of valid ids built by several filtering stages, whose thresholds the user can adjust.
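The idea of "filter with user thresholds, then sample at random" can be illustrated with a minimal sketch. It is not the app's actual code: the per-track score layout, the stage names filter_a and filter_b, the toy track ids and the function sample_clean_tids are all hypothetical, and the real filtering stages in manymusic-viz.py may work differently.

```python
import json
import random


def sample_clean_tids(scores, thresholds, n, seed=0):
    """Filter tracks by per-stage thresholds, then randomly sample n ids.

    scores:     {track_id: {stage_name: score}}   (hypothetical layout)
    thresholds: {stage_name: minimum score}       (hypothetical stage names)
    A track enters the pool only if it passes every stage; a missing score
    counts as a failure.
    """
    pool = sorted(
        tid
        for tid, stage_scores in scores.items()
        if all(stage_scores.get(stage, float("-inf")) >= minimum
               for stage, minimum in thresholds.items())
    )
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(pool, min(n, len(pool)))


if __name__ == "__main__":
    toy_scores = {
        "track_0000001": {"filter_a": 0.9, "filter_b": 0.8},
        "track_0000002": {"filter_a": 0.2, "filter_b": 0.9},
        "track_0000003": {"filter_a": 0.7, "filter_b": 0.6},
    }
    tids = sample_clean_tids(toy_scores, {"filter_a": 0.5, "filter_b": 0.5}, n=2)
    print(json.dumps(tids))
```

Raising a threshold shrinks the pool before sampling, so the final sample can never contain a track that failed any stage.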
- Run the clustering script:
    python clustering.py
  It generates a dictionary of track ids (tids) sampled by clustering the tracks within each genre.
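The general idea of clustering-based sampling per genre can be sketched with plain k-means on track embeddings, keeping the real track nearest to each centroid. This is an illustration under assumptions, not clustering.py itself: the input layout ({genre: {track_id: embedding}}), the choice of k-means, the representative-selection rule and the function names are all hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns the final centroids."""
    rng = random.Random(seed)
    # Initialise the centroids with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def sample_by_genre(tracks, k_per_genre, seed=0):
    """Return {genre: [track_id, ...]} with up to k_per_genre ids per genre.

    tracks: {genre: {track_id: embedding}}, embeddings being equal-length
    lists of floats (hypothetical input layout).
    """
    result = {}
    for genre, emb in tracks.items():
        ids = sorted(emb)
        k = min(k_per_genre, len(ids))
        if k == 0:
            result[genre] = []
            continue
        centroids = kmeans([emb[t] for t in ids], k, seed=seed)
        picked = []
        for c in centroids:
            # Representative = the real track closest to this centroid.
            best = min(ids, key=lambda t: math.dist(emb[t], c))
            if best not in picked:
                picked.append(best)
        result[genre] = picked
    return result


if __name__ == "__main__":
    toy = {
        "rock": {"t1": [0.0, 0.0], "t2": [0.2, 0.0], "t3": [10.0, 10.0], "t4": [10.0, 10.4]},
        "jazz": {"t5": [1.0, 1.0]},
    }
    print(sample_by_genre(toy, k_per_genre=2))
```

Taking the track nearest each centroid spreads the selection across different regions of the embedding space instead of drawing uniformly from each genre.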
- Run the post-processing script:
    python postprocess.py
  It generates a TSV file that combines several of the output JSON files. Optionally, the resulting dataset can be split into equally sized chunks.
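A minimal sketch of the combine-and-split step, assuming each output JSON has already been loaded as a list of track ids keyed by a source name. combine_sources, split_into_chunks, write_tsv_chunks, the two-column layout (track_id, source) and the output file names are all hypothetical; postprocess.py may use different columns and chunking rules.

```python
import csv


def combine_sources(sources):
    """Merge {source_name: [track_id, ...]} into sorted (track_id, source) rows.

    A track listed by several sources is kept once, under the first source seen.
    """
    first_source = {}
    for name, tids in sources.items():
        for tid in tids:
            first_source.setdefault(tid, name)
    return sorted(first_source.items())


def split_into_chunks(rows, n_chunks):
    """Split rows into n_chunks consecutive parts whose sizes differ by at most one."""
    q, r = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = q + (1 if i < r else 0)  # the first r chunks get one extra row
        chunks.append(rows[start:start + size])
        start += size
    return chunks


def write_tsv_chunks(rows, n_chunks, prefix="dataset_chunk"):
    """Write each chunk to <prefix>_<i>.tsv with a header row; return the paths."""
    paths = []
    for i, chunk in enumerate(split_into_chunks(rows, n_chunks)):
        path = f"{prefix}_{i}.tsv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f, delimiter="\t")
            writer.writerow(["track_id", "source"])
            writer.writerows(chunk)
        paths.append(path)
    return paths
```

Chunk sizes differ by at most one row: for example, 10 rows split into 3 chunks gives sizes 4, 3 and 3.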
To run the Streamlit annotation app:
- Go to the cloned directory and activate the virtual environment (venv):
    - Mac/Linux:
        source venv/bin/activate
    - Windows (Git Bash):
        source venv/Scripts/activate
- Run the script:
    streamlit run manymusic-annotator.py