Create dataset loader for MyWSL2023 #278

Closed
SamuelCahyawijaya opened this issue Jan 1, 2024 · 8 comments · Fixed by #472
Labels
pr-ready (A PR that closes this issue is ready to be reviewed)

Comments

@SamuelCahyawijaya
Collaborator

Dataloader name: mywsl2023/mywsl2023.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?mywsl2023

Dataset: mywsl2023
Description: This dataset contains pictures of hand gestures corresponding to ten commonly used Malaysian Sign Language (XML) words. The gestures are performed by five university students who belong to different ethnic groups and are proficient in XML. Each gesture class contains 350 instances.
Subsets: -
Languages: xml
Tasks: Language Modeling
License: Creative Commons Attribution 4.0 (cc-by-4.0)
Homepage: https://data.mendeley.com/datasets/zvk55p7ktd/1
HF URL: -
Paper URL: https://www.sciencedirect.com/science/article/pii/S2352340923004560
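The dataloader code itself is not shown in this thread. As a rough illustration of what the example generator has to do for an image dataset like this one (ten gesture classes, 350 images each), here is a minimal stdlib-only sketch. It assumes, hypothetically, that the extracted archive contains one sub-folder per gesture word holding .jpg/.jpeg/.png files. The function name `generate_examples` and the `id`/`image_path`/`label` fields are illustrative only; they are not the SEACrowd schema or the implementation merged in #472.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple

# Hypothetical: image extensions assumed to appear in the extracted archive.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def generate_examples(data_dir: str) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (key, example) pairs from an assumed one-folder-per-class layout.

    The folder layout and the field names are assumptions for illustration,
    not the real MyWSL2023 archive structure or SEACrowd schema.
    """
    root = Path(data_dir)
    # Sort class folders and files so that example keys are deterministic.
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    image_paths = [
        (img, class_dir.name)  # the folder name serves as the gesture label
        for class_dir in class_dirs
        for img in sorted(class_dir.iterdir())
        if img.is_file() and img.suffix.lower() in IMAGE_SUFFIXES
    ]
    # enumerate() gives each example a unique, sequential integer key.
    for idx, (img, label) in enumerate(image_paths):
        yield idx, {"id": str(idx), "image_path": str(img), "label": label}
```

Under the assumed layout, the full dataset would yield 10 × 350 = 3,500 examples. Using the folder name as the label keeps the sketch independent of any particular label list.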
SamuelCahyawijaya converted this from a draft issue on Jan 1, 2024
@Alex-HaochenLi
Contributor

#self-assign

@Alex-HaochenLi
Contributor

Hello @holylovenia @SamuelCahyawijaya @sabilmakbar,

After checking the data, I found that this dataset is for sign language, which means the data are all pictures.
I don't think it can support the language modeling task. Please take a look.

sabilmakbar added the pr-ready and in-progress labels and removed the pr-ready label on Jan 7, 2024
@sabilmakbar
Collaborator

Apologies for the late reply. We're on it right now, as we found some other datasets with similar issues.

@holylovenia
Contributor

@Alex-HaochenLi Sorry for the mistake. I've fixed the datasheet to list Sign Language Recognition instead of Language Modeling. We will probably have to add a new task to constants.py to cater to this dataloader, though.

What do you think?

cc: @sabilmakbar @SamuelCahyawijaya

@holylovenia
Contributor

Hi @Alex-HaochenLi, @sabilmakbar has kindly added the SIGN_LANGUAGE_RECOGNITION task so you can proceed with the dataloader implementation.

Alex-HaochenLi removed their assignment on Jan 20, 2024
holylovenia added the help wanted label and removed the in-progress and help wanted labels on Jan 25, 2024
@Enliven26
Contributor

#self-assign


Hi @, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

@Enliven26
Contributor

yes

holylovenia added the pr-ready label and removed the staled-issue label on Mar 18, 2024
sabilmakbar pushed a commit that referenced this issue May 31, 2024
* feat: mywsl2023 dataloader

* fix: set label names to empty

* refactor: remove unused print

* fix: citation

* fix: use datasets.split

* fix: use enumerate for idx

* fix: remove abstract in citation

* fix: add label to seacrowd schema

* refactor: unused index and split string
Projects
Status: Done
5 participants