Closes #425 | Add Dataloader MaXM #553

akhdanfadh · 2024-03-29T04:57:24Z

Closes #425

The homepage does not specify any subsets, but each language has two files: (1) regular QA and (2) yes/no QA. I assumed each file should be its own subset (open to discussion). Thus, the configs look like this: maxm_regular_source, maxm_yesno_seacrowd_imqa, etc. When testing, pass maxm_<subset> to the --subset_id parameter.
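A minimal sketch of the proposed naming scheme. It assumes a hypothetical stand-in dataclass (ConfigSketch) in place of the real SEACrowdConfig, plus the hypothetical constants _SUBSETS and _SEACROWD_SCHEMA. Each subset gets one source config and one seacrowd_imqa config. The value passed to --subset_id is the config name without its schema suffix.

```python
from dataclasses import dataclass


# Hypothetical stand-in for SEACrowdConfig; the real class has more fields.
# Only the config name, schema and subset id matter for this illustration.
@dataclass
class ConfigSketch:
    name: str
    schema: str
    subset_id: str


_DATASETNAME = "maxm"
# Assumption: one subset per file type (regular QA, yes/no QA), as proposed in this PR.
_SUBSETS = ["regular", "yesno"]
_SEACROWD_SCHEMA = "seacrowd_imqa"


def build_configs(dataset_name, subsets, seacrowd_schema):
    """Return one source config and one SEACrowd config per subset."""
    configs = []
    for subset in subsets:
        subset_id = f"{dataset_name}_{subset}"  # what --subset_id expects
        configs.append(ConfigSketch(f"{subset_id}_source", "source", subset_id))
        configs.append(ConfigSketch(f"{subset_id}_{seacrowd_schema}", seacrowd_schema, subset_id))
    return configs


BUILDER_CONFIGS = build_configs(_DATASETNAME, _SUBSETS, _SEACROWD_SCHEMA)

for config in BUILDER_CONFIGS:
    print(config.name, "->", config.subset_id)
```

With two subsets this yields four configs (maxm_regular_source, maxm_regular_seacrowd_imqa, maxm_yesno_source, maxm_yesno_seacrowd_imqa). The only two subset IDs are maxm_regular and maxm_yesno.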

Checkbox

Confirm that this PR is linked to the dataset issue.
Create the dataloader script seacrowd/sea_datasets/{my_dataset}/{my_dataset}.py (please use only lowercase letters and underscores for the dataset folder name, as mentioned in the dataset issue) and its __init__.py within the {my_dataset} folder.
Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _LOCAL, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _SEACROWD_VERSION variables.
Implement _info(), _split_generators() and _generate_examples() in the dataloader script.
Make sure that the BUILDER_CONFIGS class attribute is a list with at least one SEACrowdConfig for the source schema and one for a seacrowd schema.
Confirm that the dataloader script works with the datasets.load_dataset function.
Confirm that your dataloader script passes the test suite run with python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py or python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py --subset_id {subset_name_without_source_or_seacrowd_suffix}.
If my dataset is local, I have provided the output of the unit tests in the PR (please copy and paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.

akhdanfadh added 2 commits March 29, 2024 10:36

init commit

42b5b92

init commit

ae424af

akhdanfadh requested review from holylovenia, SamuelCahyawijaya, sabilmakbar, jamesjaya, yongzx, gentaiscool, ljvmiranda921, jensan-1, danjohnvelasco, MJonibek and tellarin as code owners March 29, 2024 04:57

akhdanfadh closed this Mar 29, 2024

akhdanfadh deleted the maxm branch March 29, 2024 05:09
