Create dataset loader for IndoCamRest #53

Closed
SamuelCahyawijaya opened this issue Nov 14, 2023 · 6 comments · Fixed by #257
Labels
pr-ready: A PR that closes this issue is ready to be reviewed

Comments

@SamuelCahyawijaya
Collaborator

SamuelCahyawijaya commented Nov 14, 2023

Dataloader name: indocamrest/indocamrest.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?indocamrest

Dataset: indocamrest
Description: IndoCamRest is a synthetic task-oriented dialogue system dataset translated from the Cambridge Restaurant 676 (CamRest) dataset (Wen et al., 2016) into a new parallel Indonesian dataset using a translation pipeline that includes delexicalization, translation, and delexicalization. The dataset consists of 676 dialogues in the restaurant reservation domain, in which a user and an agent talk to each other to find a restaurant near the user. It also includes slots and dialogue acts from both the user and the agent.
Subsets: -
Languages: ind
Tasks: Dialogue System
License: Creative Commons Attribution Share Alike 4.0 (cc-by-sa-4.0)
Homepage: https://github.com/dehanalkautsar/IndoToD/tree/main/IndoCamRest
HF URL: -
Paper URL: https://arxiv.org/pdf/2311.00958.pdf
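
For readers unfamiliar with the term, here is a minimal sketch of what a delexicalize, translate, and refill pipeline can look like. It is not the IndoToD authors' code: the function names, slot names, and example sentences are hypothetical, the translation step is stood in for by a hand-written template, and it assumes the final step puts target-language slot values back into the placeholders.

```python
import re


def delexicalize(utterance, slot_values):
    """Replace each slot value in the utterance with a [slot] placeholder."""
    out = utterance
    # Replace longer values first so a short value cannot split a longer one.
    for slot, value in sorted(slot_values.items(), key=lambda kv: -len(kv[1])):
        out = re.sub(re.escape(value), f"[{slot}]", out, flags=re.IGNORECASE)
    return out


def relexicalize(template, slot_values):
    """Fill [slot] placeholders with the given slot values."""
    out = template
    for slot, value in slot_values.items():
        out = out.replace(f"[{slot}]", value)
    return out


# Hypothetical English utterance and slot annotation.
english = "I want a cheap restaurant in the north part of town."
english_slots = {"pricerange": "cheap", "area": "north"}
english_template = delexicalize(english, english_slots)

# Stand-in for the translation step: a hand-written Indonesian template.
indonesian_template = "Saya ingin restoran [pricerange] di bagian [area] kota."
indonesian_slots = {"pricerange": "murah", "area": "utara"}
indonesian = relexicalize(indonesian_template, indonesian_slots)

print(english_template)
print(indonesian)
```

Translating the template rather than the full sentence means the slot placeholders survive translation unchanged, so the slot annotations stay aligned between the two languages.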
SamuelCahyawijaya converted this from a draft issue Nov 14, 2023
@dehanalkautsar
Collaborator

#self-assign

@sabilmakbar
Collaborator

Hi @dehanalkautsar, may I know the current status of this dataloader creation? Feel free to discuss here if you have any difficulties. Thanks!

@dehanalkautsar
Collaborator

Hi! Sorry, I still have other things to finish first, so these dataloaders are still in progress. I will finish them once I have more free time (I expect them to be done in approximately 2-3 weeks).

@dehanalkautsar
Collaborator

Hi @sabilmakbar, I have a question. As you are aware, this dataset is designed for a dialogue system task. There is a constant defined for the dialogue system task at https://github.com/SEACrowd/seacrowd-datahub/blob/master/seacrowd/utils/constants.py#L74. However, when checking https://github.com/SEACrowd/seacrowd-datahub/tree/master/seacrowd/utils/schemas, I couldn't find any schema for dialogue systems. I also looked at another dialogue system dataloader, but it does not implement the SEACrowd schema either (e.g. https://github.com/SEACrowd/seacrowd-datahub/blob/master/seacrowd/sea_datasets/cod/cod.py). What do you think I should do about this?

@holylovenia
Contributor

@dehanalkautsar Thanks for the question. Yes, so far we haven't had a specific schema and task to cater to dialogue systems. I opened an issue and discussion regarding this: #172.

@dehanalkautsar
Collaborator

Okay @holylovenia, I'll wait for the update.

sabilmakbar added the in-progress label (Assignee has given confirmation on progress and ETA) Dec 17, 2023
sabilmakbar added the pr-ready label (A PR that closes this issue is ready to be reviewed) and removed the in-progress label (Assignee has given confirmation on progress and ETA) Dec 31, 2023
jamesjaya pushed a commit that referenced this issue Jan 18, 2024
* feat: add dataloader indocamrest for source

* refactor: indocamrest by pre-commit

* remove __name__:__main__ on indocamrest

* fix the license constant in IndoCamrest
raileymontalan pushed a commit to raileymontalan/seacrowd-datahub that referenced this issue Feb 28, 2024
* feat: add dataloader indocamrest for source

* refactor: indocamrest by pre-commit

* remove __name__:__main__ on indocamrest

* fix the license constant in IndoCamrest
Projects
Status: Done
4 participants