-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closes #571 | Add/Update Dataloader UP2.0 #660
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fhudi : LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran the data loader and worked! I guess this one doesn't have a SEACrowd schema, no? Some commented blocks that can be deleted (optionally). But good to merge as is 👍
seacrowd/sea_datasets/up2/up2.py
Outdated
# *[SEACrowdConfig( | ||
# name=f"{_DATASETNAME}{'_' if _LANG else ''}{_LANG}_seacrowd_[seacrowd_schema_name]", | ||
# version=datasets.Version(_SEACROWD_VERSION), | ||
# description=f"{_DATASETNAME} SEACrowd schema", | ||
# schema="seacrowd_[seacrowd_schema_name]", | ||
# subset_id=f"{_DATASETNAME}{'_' if _LANG else ''}{_LANG}", | ||
# ) for _LANG in ['', *_LANGUAGES]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this one doesn't have a SEACrowd schema. I guess we can delete this commented block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. 👍
Changed
to
But please confirm if these result are correct or not |
Hi @SamuelCahyawijaya @ljvmiranda921 @khelli07 and @fhudi, I won't merge this dataloader as-is due to the error reported by @khelli07. Let's wait until @fhudi addresses @ljvmiranda921 and @khelli07's feedback. PS: Sorry I messed up by assigning this dataloader to 3 people. 🫠 I will adjust the reviewing scores later. |
@khelli07 As mentioned in the reviewing checklist point 3, you need to execute test script using the one that is appropriate under the test folder. You can rever to the example of executable test script on how to use it. P.S.: |
Co-authored-by: Lj Miranda <[email protected]>
Co-authored-by: Lj Miranda <[email protected]>
Co-authored-by: Lj Miranda <[email protected]>
@khelli07 |
Hi, the change is to avoid this error: It appears when I ran After I include that change, I successfully ran I'm not sure if this is particular to Windows only or not. After the changes, the code might need to be tested on other OS too. |
Could you try with the following combination and see whether it is solved with any of the followings:
Are you using WSL? |
Nope, I use Windows without WSL. Let me try it. |
Ok just finished testing, both of them works. So you might be able to choose either one. |
Hi @fhudi @khelli07, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) in 31 hours, so it'd be great if we could wrap up the reviewing and merge this PR before then. Has everything been addressed now? Can we merge this PR? |
I think this is just waiting for @khelli07 's review. But from the last comment it seems that it has been resolved 🤔 . I was able to run this so I think it's mostly a platform-specific issue. Dataloader works though so I think it's safe to merge! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The load_ud_data
in the common_parser.py
hasn't been changed yet. Without that change, I don't think I can run successfully in my Windows.
And yes, as @ljvmiranda921 said, I agree that this is a platform-specific issue. If that can be tolerated, then can merge.
@khelli07 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@khelli07 @ljvmiranda921 |
Sorry for jumping it late. I think this is a platform-based issue where, for non-Linux/MacOS, the encoding doesn't default to |
Since both reviewers have given the approver (and the dataloader assignee best not to merge it themselves even though it has been approved), I'll proceed with the merging. |
Closes #571
Checkbox
seacrowd/sea_datasets/{my_dataset}/{my_dataset}.py
(please use only lowercase and underscore for dataset folder naming, as mentioned in dataset issue) and its__init__.py
within{my_dataset}
folder._CITATION
,_DATASETNAME
,_DESCRIPTION
,_HOMEPAGE
,_LICENSE
,_LOCAL
,_URLs
,_SUPPORTED_TASKS
,_SOURCE_VERSION
, and_SEACROWD_VERSION
variables._info()
,_split_generators()
and_generate_examples()
in dataloader script.BUILDER_CONFIGS
class attribute is a list with at least oneSEACrowdConfig
for the source schema and one for a seacrowd schema.datasets.load_dataset
function.python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py
orpython -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py --subset_id {subset_name_without_source_or_seacrowd_suffix}
.Currently implemented as source-only, as there seems to be no matching task.
(Based on the information provided in the card)
Issue was initially raised in the issue thread #571.