Update template.py instruction for dataloader class name (SEACrowd#334)
* Add documentation for dataloader class name

* Update template.py

* Update REVIEWING.md

This makes the "Dataset" suffix optional and points to templates/template.py for an example.

* Update REVIEWING.md

fix file reference name

---------

Co-authored-by: Salsabil Maulana Akbar
2 people authored and Railey Montalan committed Feb 28, 2024
1 parent 911a582 commit db035c6
Showing 2 changed files with 4 additions and 3 deletions.
4 changes: 2 additions & 2 deletions REVIEWING.md
@@ -66,8 +66,8 @@ The objective of datasheet review is to ensure that all dataloaders in SEACrowd
 b. Execute `datasets.load_dataset` check based on config list (a)
 c. Check on the dataset schema & few first examples for plausibility.
 5. Follows some general rules/conventions:
-1. `PascalCase` for dataloader class name (and “Dataset” is contained in the suffix of the class name).
-2. Lowercase word characters (regex identifier: `\w`) for schema column names, including the `source` schema if the original dataset doesn’t follow it.
+1. Use `PascalCase` for the dataloader class name (optional: “Dataset” can be appended to the Dataloader class name, see `templates/template.py` for example).
+2. Use lowercase word characters (regex identifier: `\w`) for schema column names, including the `source` schema if the original dataset doesn’t follow it.
 6. The code aligns with the `black` formatter:
 use this `make check_file=seacrowd/sea_datasets/{dataloader}/{dataloader}.py`
 7. Follows Dataloader Config Rule
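
A reviewer could check the two naming conventions in rule 5 mechanically. The sketch below is only an illustration: the helper names `is_pascal_case` and `is_lowercase_word` are hypothetical and not part of the SEACrowd codebase, and the PascalCase pattern is a loose approximation (leading uppercase letter, then letters and digits only).

```python
import re


def is_pascal_case(name: str) -> bool:
    """Approximate PascalCase check (hypothetical helper, not SEACrowd code).

    Accepts names that start with an uppercase ASCII letter and contain
    only ASCII letters and digits, so underscores are rejected.
    """
    return re.fullmatch(r"[A-Z][A-Za-z0-9]*", name) is not None


def is_lowercase_word(name: str) -> bool:
    """Check that a schema column name uses only lowercase `\\w` characters."""
    return re.fullmatch(r"\w+", name) is not None and name == name.lower()


# Both forms of the class name are acceptable under the updated rule.
print(is_pascal_case("Oscar2201Dataset"))  # True
print(is_pascal_case("Oscar2201"))         # True
print(is_pascal_case("oscar_2201"))        # False

print(is_lowercase_word("text_1"))  # True
print(is_lowercase_word("Text"))    # False
```

Because the "Dataset" suffix is now optional, the class-name check deliberately does not test for it.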
3 changes: 2 additions & 1 deletion templates/template.py
@@ -101,7 +101,8 @@
 _SEACROWD_VERSION = "1.0.0"


-# TODO: Name the dataset class to match the script name using CamelCase instead of snake_case
+# TODO: Name the dataset class to match the script name using PascalCase instead of snake_case.
+# optional: class name can append "Dataset" as suffix to provide better clarity (e.g. OSCAR 2201 --> Oscar2201Dataset/Oscar2201)
 class NewDataset(datasets.GeneratorBasedBuilder):
     """TODO: Short description of my dataset."""

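
The template comment asks for a PascalCase class name that matches the snake_case script name. As a rough sketch, the conversion could look like the following. The function `class_name_from_script` and the script name `oscar_2201.py` are assumptions made for illustration; only the resulting names `Oscar2201Dataset` and `Oscar2201` come from the template comment.

```python
def class_name_from_script(script_name: str, with_suffix: bool = True) -> str:
    """Derive a PascalCase class name from a snake_case script name.

    Hypothetical helper for illustration; not part of SEACrowd.
    """
    # Drop a trailing ".py" if present.
    stem = script_name[:-3] if script_name.endswith(".py") else script_name
    # Capitalize each underscore-separated part and join them.
    pascal = "".join(part.capitalize() for part in stem.split("_") if part)
    # The "Dataset" suffix is optional under the updated convention.
    return pascal + "Dataset" if with_suffix else pascal


print(class_name_from_script("oscar_2201.py"))                     # Oscar2201Dataset
print(class_name_from_script("oscar_2201.py", with_suffix=False))  # Oscar2201
```

Note that `str.capitalize()` lowercases the remaining letters of each part, so names containing acronyms may need manual adjustment.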
