diff --git a/REVIEWING.md b/REVIEWING.md
index 2e89b786a..a06b3a396 100644
--- a/REVIEWING.md
+++ b/REVIEWING.md
@@ -66,8 +66,8 @@ The objective of datasheet review is to ensure that all dataloaders in SEACrowd
     b. Execute `datasets.load_dataset` check based on config list (a)
     c. Check on the dataset schema & few first examples for plausibility.
 5. Follows some general rules/conventions:
-    1. `PascalCase` for dataloader class name (and “Dataset” is contained in the suffix of the class name).
-    2. Lowercase word characters (regex identifier: `\w`) for schema column names, including the `source` schema if the original dataset doesn’t follow it.
+    1. Use `PascalCase` for the dataloader class name (optional: “Dataset” can be appended to the Dataloader class name, see `templates/template.py` for example).
+    2. Use lowercase word characters (regex identifier: `\w`) for schema column names, including the `source` schema if the original dataset doesn’t follow it.
 6. The code aligns with the `black` formatter: use this `make check_file=seacrowd/sea_datasets/{dataloader}/{dataloader}.py`
 7. Follows Dataloader Config Rule
 
diff --git a/templates/template.py b/templates/template.py
index 59bafe0aa..c6ba4ad99 100644
--- a/templates/template.py
+++ b/templates/template.py
@@ -101,7 +101,8 @@ _SEACROWD_VERSION = "1.0.0"
 
 
 
-# TODO: Name the dataset class to match the script name using CamelCase instead of snake_case
+# TODO: Name the dataset class to match the script name using PascalCase instead of snake_case.
+# optional: class name can append "Dataset" as suffix to provide better clarity (e.g. OSCAR 2201 --> Oscar2201Dataset/Oscar2201)
 class NewDataset(datasets.GeneratorBasedBuilder):
     """TODO: Short description of my dataset."""
 