Implementing Schema: End-to-end Task-oriented Dialogue System (WoZ) #237

Merged: 14 commits into SEACrowd:master on Jan 3, 2024

Conversation

dehanalkautsar
Collaborator

Closes #172.

Note

  • The implemented schema is based on https://huggingface.co/datasets/woz_dialogue
  • I apologize for inadvertently including commits from my other branch (PhoATIS) in this pull request. I believe the branch itself is free of errors and passes the tests.

Collaborator

@SamuelCahyawijaya left a comment

Hi @dehanalkautsar, thanks for the schema!! It's really helpful since we didn't have any dialogue schema previously. @holylovenia and I have some suggestions regarding this schema:

  1. Rename the task to E2E_TASK_ORIENTED_DIALOGUE and the schema to TOD (tod.py).
  2. Remove the unrelated changes, e.g., PhoATIS, from the PR.
  3. Remove the asr field from the schema.
  4. Change the system_transcript field to system_utterance.
  5. Change the transcript field to user_utterance.

Please let us know if you have any questions.
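
As an illustration only, the field changes requested above could be sketched as a small conversion over a single WoZ-style turn. The helper name `to_tod_turn`, the two module-level constants, and the sample turn are hypothetical and are not taken from tod.py or from this PR; only the field names `asr`, `system_transcript`, `transcript`, `system_utterance`, and `user_utterance` come from the review.

```python
from typing import Any, Dict

# Hypothetical mapping: old WoZ-style keys -> names requested in the review.
RENAMED_FIELDS = {
    "system_transcript": "system_utterance",
    "transcript": "user_utterance",
}
# Fields the review asks to remove from the schema.
DROPPED_FIELDS = {"asr"}


def to_tod_turn(woz_turn: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a WoZ-style turn with the reviewed field names applied."""
    tod_turn: Dict[str, Any] = {}
    for key, value in woz_turn.items():
        if key in DROPPED_FIELDS:
            continue  # drop fields such as `asr`
        # Rename when a new name is defined, otherwise keep the key unchanged.
        tod_turn[RENAMED_FIELDS.get(key, key)] = value
    return tod_turn


# Made-up sample turn, used only to exercise the helper.
example_turn = {
    "turn_idx": 0,
    "system_transcript": "",
    "transcript": "I want a cheap restaurant in the north.",
    "asr": [["I want a cheap restaurant in the north."]],
}
print(to_tod_turn(example_turn))
```

Any key that is neither renamed nor dropped passes through unchanged, so the sketch makes no assumption about the remaining fields of the schema.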

@dehanalkautsar
Collaborator Author

Hi @SamuelCahyawijaya, I agree with your suggestions and have updated the schema accordingly. Let me know if anything else needs to be discussed.

@sabilmakbar
Collaborator

Hi @dehanalkautsar, thanks for implementing this; really helpful for SEACrowd ToD Datasets going forward.

A couple of comments on docstring consistency:

  1. Mind updating the example in the tod.py docstring so it reflects the newest schema, which you updated based on @SamuelCahyawijaya's suggestion?
  2. I'm thinking of changing the wording from imitating to adapting in the tod.py docstring, to signal that we're adapting the WoZ schema rather than copying it as is. wdyt, @dehanalkautsar?

@dehanalkautsar
Collaborator Author

Hi @sabilmakbar, oh yeah, sorry, I forgot about that. I have fixed the docstring and it should be correct now. Thanks for your input!

@sabilmakbar
Collaborator

Hi @holylovenia @SamuelCahyawijaya, do you have any additional comments on this? So far, it looks good to me.

btw @dehanalkautsar, have you tried temporarily adding this schema to the codebase and using it in one of your dataloaders?

@dehanalkautsar
Collaborator Author

Yes, I have, @sabilmakbar. I've actually created PRs for those dataloaders using this schema (#257, #258, #261).

Contributor

@holylovenia left a comment

Thanks for the changes, @dehanalkautsar. Looks good to me!! 🍾 I'll ping @SamuelCahyawijaya to get his opinion ASAP. If he gives the okay, then we can merge it.

Collaborator

@SamuelCahyawijaya left a comment

Thanks for the update, @dehanalkautsar. LGTM!

@sabilmakbar
Collaborator

Let's merge this, then!

@SamuelCahyawijaya merged commit b7d72a5 into SEACrowd:master on Jan 3, 2024
1 check passed
Development

Successfully merging this pull request may close these issues.

Create new task and schema for dialogue system datasets
4 participants