Dataset links #3

Open
JunwenBai opened this issue Dec 13, 2018 · 1 comment
@JunwenBai

Hi,
Do you have links to the datasets you use? I am new to these datasets, but the paper is very interesting and I want to reproduce the results. However, it is not easy to find the datasets or to work out how you convert the downloaded data into the formatted dataset. Although the dataset structure is described in the README, it is still not clear to me what a formatted dataset should look like. Would you mind elaborating on that a little, for example by adding a sample dataset to the repo?
Thanks!

@f90 f90 added the question label Jan 20, 2019
@f90
Owner

f90 commented Jan 20, 2019

Hey,
the datasets are indeed a problem... I will explain:
DSD100 is relatively easy to access here:
https://sigsep.github.io/datasets/dsd100.html

MedleyDB can be downloaded after registration here:
https://medleydb.weebly.com/downloads.html

The CCMixter download is a bit hidden on this website:
https://members.loria.fr/ALiutkus/kam/

iKala was recently retracted: you can no longer sign a usage license, and the download is no longer available. This is very unfortunate, since my license doesn't allow me to redistribute the dataset to you for replication. You should still get similar results if you simply exclude it from the experiment (more specifically, from the unsupervised part of the data used in the semi-supervised setting, as well as from the validation and test data).

For more detail, see Training.py, which shows how the datasets are loaded and handled internally. That should give you an idea of how to get started with different configurations, especially since you will probably need to remove iKala from the code there. This part should be easy: simply remove the iKala object by deleting the line

 ikala = Datasets.getIKala("iKala.xml")

and remove ikala from the list in the line that iterates over the unsupervised datasets:

for ds in [mdb, ccm, ikala]:
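
To illustrate the change, here is a minimal, self-contained sketch of the same idea. It does not use the repository's code: the track lists are placeholders standing in for whatever the real dataset loaders return, and `select_unsupervised` is a hypothetical helper, not a function from Training.py.

```python
# Hypothetical helper (not part of the repository): filter out datasets
# by name before they are merged into the unsupervised pool.
def select_unsupervised(datasets, exclude=("ikala",)):
    """Return the (name, data) pairs whose name is not excluded, in order."""
    return [(name, data) for name, data in datasets if name not in exclude]


# Placeholder track lists; in the real code these would be the objects
# returned by the repository's dataset loaders (assumption).
mdb = ["mdb_track_1", "mdb_track_2"]
ccm = ["ccm_track_1"]
ikala = ["ikala_track_1"]  # no longer obtainable, so it gets excluded

unsupervised = []
for name, data in select_unsupervised([("mdb", mdb), ("ccm", ccm), ("ikala", ikala)]):
    unsupervised.extend(data)

print(unsupervised)
```

In the actual file, the equivalent edit is probably just shortening the list in the loop to the datasets you have, but check how the loop variable is used there before changing it.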

Hope that helps?
