Add config helper (WIP) and missed constants in existing dataloaders #605

holylovenia · 2024-04-04T14:00:40Z

Derive SEACrowdConfigHelper from NusantaraConfigHelper. The basic functionality works, but SEACrowdMetadataHelper, the constants, etc. have not been modified yet.
Add some missed constants (e.g., _LANGUAGES and _LOCAL) found while creating the config helper.

SamuelCahyawijaya

LGTM!

sabilmakbar

Initial review. For the missed constants, I'll cross-check them with my checker and fix them (I'll probably add the check script here too if needed).
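For illustration, a check of the kind mentioned above could look like the sketch below. It is not the reviewer's actual checker: the function name `find_missing_constants` is hypothetical, and the required list is an assumption that contains only the two constants named in this PR.

```python
import ast
from typing import Iterable, List

# Assumed list of required constants. Only _LANGUAGES and _LOCAL are named
# in this PR; the real dataloader template may require more.
REQUIRED_CONSTANTS = ("_LANGUAGES", "_LOCAL")


def find_missing_constants(source: str, required: Iterable[str] = REQUIRED_CONSTANTS) -> List[str]:
    """Return the required names that `source` never assigns at module level."""
    tree = ast.parse(source)
    defined = set()
    for node in tree.body:  # top-level statements only
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name):
                defined.add(target.id)
    return [name for name in required if name not in defined]


if __name__ == "__main__":
    # Hypothetical dataloader source that forgot to declare _LOCAL.
    sample = '_LANGUAGES = ["ind"]\n_DATASETNAME = "demo"\n'
    print(find_missing_constants(sample))  # ['_LOCAL']
```

Parsing with `ast` instead of importing each dataloader keeps the check free of side effects, since no dataloader code is executed.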

sabilmakbar · 2024-04-08T10:27:07Z

seacrowd/config_helper.py

+from .utils.constants import Tasks, SCHEMA_TO_TASKS
+import pandas as pd
+
+_LARGE_CONFIG_NAMES = [


Are we going to update this later, or do we just keep it as-is from NusaCrowd?

holylovenia · 2024-04-22

Will update later! I might also deprecate this function, depending on its use case. Do you have any suggestions regarding this?

sabilmakbar · 2024-05-01

We can probably change the values according to the SEACrowd dataset size info. But since NusaCrowd v1.0 is a subset of SEACrowd, we can keep it as-is for now and update it later.

sabilmakbar · 2024-04-08T10:27:47Z

seacrowd/config_helper.py

+
+]
+
+BENCHMARK_DICT = {


Will we add a SEACrowd-based benchmark to this later? Or could it be done in this review, since the dataset selection has already been done?

holylovenia · 2024-04-22

@sabilmakbar Yes, I'll add it in the next PR since we're still reviewing some of the speech dataloaders.

sabilmakbar · 2024-05-01

And for the benchmark, we can use the datasets used for the SEACrowd experiment, so the values carried over from NusaCrowd (NC) should be overridden.
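One possible reading of "overridden" is sketched below: SEACrowd entries replace NusaCrowd entries that share a key, and other entries are kept. The benchmark names, config names, and the helper `override_benchmarks` are placeholders, since the real contents of `BENCHMARK_DICT` are not shown in this PR.

```python
# Placeholder entries; the real benchmark names and config lists are not
# shown in this PR.
NUSACROWD_BENCHMARKS = {
    "NLU": ["nc_config_a", "nc_config_b"],
    "NLG": ["nc_config_c"],
}
SEACROWD_BENCHMARKS = {
    "NLU": ["sea_config_x"],
    "SP": ["sea_config_y"],
}


def override_benchmarks(base, overrides):
    """Return a new dict in which `overrides` wins on key collisions."""
    # Copy the lists so that neither input dict is mutated later on.
    merged = {name: list(configs) for name, configs in base.items()}
    merged.update({name: list(configs) for name, configs in overrides.items()})
    return merged


BENCHMARK_DICT = override_benchmarks(NUSACROWD_BENCHMARKS, SEACROWD_BENCHMARKS)
```

If the intent is instead to drop the NusaCrowd entries entirely, assigning the SEACrowd dict directly would be the simpler choice.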

sabilmakbar · 2024-05-01T17:13:13Z

btw @holylovenia, is the config still WIP?

Add config helper (WIP) and missed constants in existing dataloaders

b23bec1

holylovenia requested review from SamuelCahyawijaya, sabilmakbar, jamesjaya, yongzx, gentaiscool, ljvmiranda921, jensan-1, danjohnvelasco, MJonibek and tellarin as code owners April 4, 2024 14:00

holylovenia added the bug and enhancement labels Apr 4, 2024

holylovenia assigned SamuelCahyawijaya and sabilmakbar Apr 4, 2024

holylovenia removed request for tellarin, gentaiscool, jamesjaya, ljvmiranda921, yongzx, MJonibek, danjohnvelasco and jensan-1 April 4, 2024 14:01

Clarify method names

a3911c2

SamuelCahyawijaya approved these changes Apr 8, 2024

sabilmakbar reviewed Apr 8, 2024

holylovenia added 4 commits April 12, 2024 09:43

Fix some error-triggering parts

5858e56

Modify package setup

ad8bb9b

Fix bug

0ecc98d

Add the latest version

3f8c571

holylovenia added 8 commits April 30, 2024 03:47

Merge branch 'master' of https://github.com/SEACrowd/seacrowd-datahub …

648bfcc

…into holy/pypi

Remove task for source-only dataset

a5541fa

Change train to test split

6468361

Include all languages in the data

a667cc3

Change train to test

1e7026d

Change train to test

1467309

Remove numbering from options and answer

700c177

Add the newest version

7d0ab71

sabilmakbar and others added 10 commits May 2, 2024 12:38

remove main entry of module for dataloaders (#662)

f8e1212

Fix tgl --> fil for AYA dataset

9bd7bf6

Change train -> test and add eng as the MT lang pair

c1ec65d

Change incorrect name

ef89194

Change incorrect name

970afbc

Change subset id to '*_{lang}_eng_*' or '*_eng_{lang}_*'

636ebfa

Merge branch 'master' of https://github.com/SEACrowd/seacrowd-datahub …

4902542

…into holy/pypi

Fix paracotta_id's download issue

b3ad89c

Merge branch 'master' of https://github.com/SEACrowd/seacrowd-datahub …

83901a6

…into holy/pypi

Normalize subset names and enable eng_{lang} pairings

09051b2

github-actions bot added the need-fu-pr label May 28, 2024

holylovenia added 6 commits June 19, 2024 05:49

Merge branch 'master' of https://github.com/SEACrowd/seacrowd-datahub …

dba6e62

…into holy/pypi

Fix load_* methods

0be9162

Fix available_* methods

0e590b6

Change _SEACROWD_VERSION to reflect the date of last update

2e7509b

Add SEACrowd benchmark config list

4ac5ffa

Update seacrowd to 0.1.0

9db6d22

holylovenia merged commit 9db6d22 into master Jun 19, 2024
1 check passed
