forked from SEACrowd/seacrowd-datahub
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Remove CONTRIBUTING.md, update PR Message Template, and add bash to initialize dataset (SEACrowd#468)
* add bash to initialize dataset
* delete CONTRIBUTING.md since it's duplicated with DATALOADER.md
* update the docs slightly on suggesting new dataloader contributors to use template
* fix few wordings
* Add info on required vars '_LOCAL'
* Add checklist on __init__.py
* fix wording on 2nd checklist regarding 'my_dataset' that should've been a var instead of static val
* fix wordings on first section of PR msg
* add newline separator for better readability
* add info on some to-dos
1 parent 2131edb, commit 6f0a2e0
Showing 5 changed files with 42 additions and 222 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,17 @@ | ||
-Please name your PR after the issue it closes. You can use the following line: "Closes #ISSUE-NUMBER" where you replace the ISSUE-NUMBER with the one corresponding to your dataset. | ||
+Please name your PR title and the first line of PR message after the issue it will close. You can use the following examples: | ||
+ | ||
+**Title**: Closes #{ISSUE_NUMBER} | Add/Update Dataloader {DATALOADER_NAME} | ||
+ | ||
+**First line PR Message**: Closes #{ISSUE_NUMBER} | ||
+ | ||
+where you replace the {ISSUE_NUMBER} with the one corresponding to your dataset. | ||
  | ||
 ### Checkbox | ||
 - [ ] Confirm that this PR is linked to the dataset issue. | ||
-- [ ] Create the dataloader script `seacrowd/sea_datasets/my_dataset/my_dataset.py` (please use only lowercase and underscore for dataset naming). | ||
-- [ ] Provide values for the `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_SEACROWD_VERSION` variables. | ||
+- [ ] Create the dataloader script `seacrowd/sea_datasets/{my_dataset}/{my_dataset}.py` (please use only lowercase and underscore for dataset folder naming, as mentioned in dataset issue) and its `__init__.py` within `{my_dataset}` folder. | ||
+- [ ] Provide values for the `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_LOCAL`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_SEACROWD_VERSION` variables. | ||
 - [ ] Implement `_info()`, `_split_generators()` and `_generate_examples()` in dataloader script. | ||
 - [ ] Make sure that the `BUILDER_CONFIGS` class attribute is a list with at least one `SEACrowdConfig` for the source schema and one for a seacrowd schema. | ||
 - [ ] Confirm dataloader script works with `datasets.load_dataset` function. | ||
-- [ ] Confirm that your dataloader script passes the test suite run with `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py`. | ||
+- [ ] Confirm that your dataloader script passes the test suite run with `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py` or `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py --subset_id {subset_name_without_source_or_seacrowd_suffix}`. | ||
 - [ ] If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files. |
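Several items in the updated checklist are mechanical enough to verify before opening a PR. The following is a minimal sketch of such a pre-check, not part of the commit: the helper names (`is_valid_dataset_name`, `missing_required_vars`, `TITLE_RE`) are hypothetical, the title pattern assumes "Add/Update" means either word, and allowing digits in dataset names is an assumption the checklist does not state. Only the list of required variable names is taken from the checklist itself.

```python
import ast
import re
from typing import List

# Variable names copied from the updated checklist.
REQUIRED_VARS = [
    "_CITATION", "_DATASETNAME", "_DESCRIPTION", "_HOMEPAGE", "_LICENSE",
    "_LOCAL", "_URLs", "_SUPPORTED_TASKS", "_SOURCE_VERSION", "_SEACROWD_VERSION",
]

# Hypothetical pattern for the suggested title format
# "Closes #{ISSUE_NUMBER} | Add/Update Dataloader {DATALOADER_NAME}".
# Assumption: the contributor writes either "Add" or "Update".
TITLE_RE = re.compile(r"^Closes #(\d+) \| (Add|Update) Dataloader (\S.*)$")


def is_valid_dataset_name(name: str) -> bool:
    """Lowercase letters and underscores; digits are allowed as an assumption."""
    return re.fullmatch(r"[a-z0-9_]+", name) is not None


def missing_required_vars(source: str) -> List[str]:
    """Return the required names that are never assigned at module level."""
    assigned = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name):
                assigned.add(target.id)
    return [name for name in REQUIRED_VARS if name not in assigned]
```

The check parses the dataloader source with `ast` instead of importing it, so it runs without the `datasets` library or the data files being available. It only confirms that each name is assigned, not that the value is meaningful.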
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
+#!/bin/bash | ||
+ | ||
+# This script creates the dataset folder with an empty __init__.py and copies the dataloader template script into it. | ||
+ | ||
+if [[ "$1" == "" ]]; then | ||
+  echo "Error: Missing the dataset name to be created" | ||
+  echo "bash \${YOUR_SEACROWD_ROOT_PATH}/templates/initiate_seacrowd_dataloader.sh <dataset name> [root path]" | ||
+  exit 1 | ||
+fi | ||
+ | ||
+if [[ "$2" == "" ]]; then | ||
+  root_path=./ | ||
+else | ||
+  root_path=$2 | ||
+fi | ||
+ | ||
+(cd "$root_path/seacrowd/sea_datasets" && mkdir "$1" && cd "$1" && touch __init__.py) || exit 1 | ||
+cp "$root_path/templates/template.py" "$root_path/seacrowd/sea_datasets/$1/$1.py" || exit 1 | ||
+ | ||
+echo "Initialization is done. Exiting..." |
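For contributors who cannot run bash, the same initialization steps can be sketched in Python. This is an illustrative equivalent under stated assumptions, not something the commit provides: the function name `init_dataloader` is hypothetical, and the only paths used are the ones the shell script reads and writes (`templates/template.py` and `seacrowd/sea_datasets/<name>/`).

```python
import shutil
from pathlib import Path


def init_dataloader(dataset_name: str, root_path: str = ".") -> Path:
    """Create the dataset folder, an empty __init__.py, and a copy of the template.

    Hypothetical helper mirroring the shell script; returns the new script path.
    """
    if not dataset_name:
        raise ValueError("Missing the dataset name to be created")

    root = Path(root_path)
    template = root / "templates" / "template.py"
    target_dir = root / "seacrowd" / "sea_datasets" / dataset_name

    if not template.is_file():
        raise FileNotFoundError(f"Template not found: {template}")

    # Like a plain `mkdir`, this fails if the folder already exists
    # (FileExistsError) or if the parent folder is missing (FileNotFoundError).
    target_dir.mkdir(parents=False, exist_ok=False)
    (target_dir / "__init__.py").touch()

    script = target_dir / f"{dataset_name}.py"
    shutil.copyfile(template, script)
    return script
```

Calling `init_dataloader("my_dataset")` from the repository root would correspond to running the script with a single argument. Failing when the folder already exists is a deliberate choice here, so that an existing dataloader is never overwritten by the template.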