Add config helper (WIP) and missed constants in existing dataloaders #605
LGTM!
Initial review. As for the missed constants, I'll cross-check them with my checker and fix them (I'll probably add the check script here too if needed).
from .utils.constants import Tasks, SCHEMA_TO_TASKS
import pandas as pd

_LARGE_CONFIG_NAMES = [
Are we going to update it later, or do we just keep it as-is from NusaCrowd?
Will update later! I might also deprecate this function depending on its use case. Do you have any suggestions regarding this?
We can probably change the values according to the SEACrowd Dataset Size Info. But since NusaCrowd v1.0 is a subset of SEACrowd, we can keep it as-is for now and update it later.
]
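One way a config helper could use _LARGE_CONFIG_NAMES is to flag configs by name so callers can skip large datasets. This is a hypothetical illustration only: ConfigInfo, is_large, and filter_small are invented names, the config names are placeholders, and the actual helper in this PR may work differently.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical placeholder names; the real list in this PR is copied from NusaCrowd.
_LARGE_CONFIG_NAMES = [
    "example_large_source",
    "example_large_seacrowd_sptext",
]


@dataclass(frozen=True)
class ConfigInfo:
    """Hypothetical minimal view of a dataloader config (only its name)."""

    name: str

    @property
    def is_large(self) -> bool:
        # A config counts as "large" only if its name is listed explicitly.
        return self.name in _LARGE_CONFIG_NAMES


def filter_small(configs: Iterable[ConfigInfo]) -> List[ConfigInfo]:
    """Keep only configs not flagged as large (e.g., for quick smoke tests)."""
    return [c for c in configs if not c.is_large]


configs = [ConfigInfo("example_large_source"), ConfigInfo("example_small_source")]
print([c.name for c in filter_small(configs)])  # ['example_small_source']
```

Keeping the size check as a plain name lookup means the list can later be updated (e.g., from the SEACrowd Dataset Size Info) without touching the code that consumes it.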

BENCHMARK_DICT = {
Will we add the SEACrowd-based benchmark to this later, or could it be done in this review since the dataset selection is already done?
@sabilmakbar Yes, yes. I'll add it in the next PR since we're still reviewing some of the speech dataloaders.
And for the benchmark, we can use the datasets from the SEACrowd experiments, so the values from NC should be overridden.
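Assuming BENCHMARK_DICT maps each task to a list of dataset names (its actual shape isn't visible in this thread), overriding the NusaCrowd entries with the SEACrowd experiment datasets could look like the hypothetical sketch below. NC_BENCHMARK_DICT, SEACROWD_EXPERIMENT_DATASETS, build_benchmark_dict, and every key and dataset name are placeholders.

```python
from typing import Dict, List

# Hypothetical placeholder entries; the real task keys and datasets are not listed in this thread.
NC_BENCHMARK_DICT: Dict[str, List[str]] = {
    "task_a": ["nc_dataset_1", "nc_dataset_2"],
    "task_b": ["nc_dataset_3"],
}

# Hypothetical datasets used in the SEACrowd experiments, keyed by the same tasks.
SEACROWD_EXPERIMENT_DATASETS: Dict[str, List[str]] = {
    "task_a": ["sea_dataset_1"],
}


def build_benchmark_dict(
    base: Dict[str, List[str]], overrides: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Copy `base`, then replace (not extend) every task that also appears in `overrides`."""
    merged = {task: list(names) for task, names in base.items()}
    for task, names in overrides.items():
        merged[task] = list(names)
    return merged


BENCHMARK_DICT = build_benchmark_dict(NC_BENCHMARK_DICT, SEACROWD_EXPERIMENT_DATASETS)
print(BENCHMARK_DICT)
# {'task_a': ['sea_dataset_1'], 'task_b': ['nc_dataset_3']}
```

Copying the lists leaves the NusaCrowd defaults untouched, so tasks without a SEACrowd counterpart simply fall back to the NC values.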
…into holy/pypi
btw @holylovenia, is the config still WIP?
(_LANGUAGES and _LOCAL) found during the config helper creation.
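Missing constants like these can be caught mechanically by parsing each dataloader and looking for module-level assignments. This is a hypothetical sketch, not the reviewer's checker mentioned earlier: REQUIRED_CONSTANTS, missing_constants, and check_dataloaders are illustrative names, and treating _LANGUAGES and _LOCAL as the full required set is an assumption.

```python
import ast
from pathlib import Path
from typing import Dict, List, Set

# Assumption: only the two constants found missing in this PR are required here.
REQUIRED_CONSTANTS = ("_LANGUAGES", "_LOCAL")


def defined_module_constants(source: str) -> Set[str]:
    """Collect names assigned at module level (plain or annotated; simple names only)."""
    names: Set[str] = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            names.add(node.target.id)
    return names


def missing_constants(source: str) -> List[str]:
    """Return the required constants not assigned at module level, in declared order."""
    found = defined_module_constants(source)
    return [name for name in REQUIRED_CONSTANTS if name not in found]


def check_dataloaders(root: str) -> Dict[str, List[str]]:
    """Map each .py file under `root` (skipping __init__.py) to the constants it lacks."""
    report: Dict[str, List[str]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        if path.name == "__init__.py":
            continue
        missing = missing_constants(path.read_text(encoding="utf-8"))
        if missing:
            report[str(path)] = missing
    return report
```

For example, check_dataloaders("path/to/dataloaders") (a placeholder path) returns only the files with gaps, so an empty dict means every dataloader defines both constants. Assignments nested inside functions or classes are deliberately ignored, since the constants are expected at module level.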