Create dataset loader for Tatoeba #5

Closed
SamuelCahyawijaya opened this issue Oct 29, 2023 · 1 comment · Fixed by #22
Labels: good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

Collaborator

SamuelCahyawijaya commented Oct 29, 2023

Dataloader name: tatoeba/tatoeba.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?tatoeba

Dataset: tatoeba
Description: Tatoeba is a collection of sentences and translations collected by Tatoeba.org.
Subsets: tatoeba.ind, tatoeba.jav, tatoeba.tgl, tatoeba.tha, tatoeba.vie
Languages: ind, vie, tgl, jav, tha
Tasks: Machine Translation
License: Other (other)
Homepage: https://tatoeba.org/
HF URL: https://huggingface.co/datasets/xtreme/
Paper URL: https://tatoeba.org/en/
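
The subset names listed above follow a `<dataset>.<language code>` pattern. As a minimal sketch of that naming only (the constant and function names here are hypothetical and are not taken from the SEACrowd codebase or any library API), the subsets could be derived from the language codes like this:

```python
# Hypothetical helper: names are illustrative, not part of any existing loader.
DATASET_NAME = "tatoeba"

# Language codes as listed in the issue.
LANGUAGES = ["ind", "vie", "tgl", "jav", "tha"]


def subset_names(dataset_name, languages):
    """Build one subset name per language, in alphabetical order of the codes."""
    return [f"{dataset_name}.{lang}" for lang in sorted(languages)]


SUBSETS = subset_names(DATASET_NAME, LANGUAGES)
print(SUBSETS)
```

Sorting the codes reproduces the order used in the Subsets field; an actual loader may define its configurations differently.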
SamuelCahyawijaya converted this from a draft issue Oct 29, 2023
SamuelCahyawijaya added the good first issue and help wanted labels Oct 29, 2023
@ljvmiranda921
Collaborator

#self-assign

ljvmiranda921 added a commit to ljvmiranda921/seacrowd-datahub that referenced this issue Nov 5, 2023
holylovenia added a commit that referenced this issue Nov 19, 2023
Closes #5 | Add tatoeba dataset loader
Projects
Status: Done
2 participants