Create dataset loader for Cross-Lingual Data Augmentation For Thai QA #704

Open
SamuelCahyawijaya opened this issue Jul 30, 2024 · 0 comments

Comments

@SamuelCahyawijaya
Collaborator

Dataloader name: cross_lingual_augmented_thai_qa/cross_lingual_augmented_thai_qa.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?cross_lingual_augmented_thai_qa

Dataset: cross_lingual_augmented_thai_qa
Description: This paper presents an innovative data augmentation framework with data quality control designed to enhance the robustness of Question Answering (QA) models in low-resource languages, particularly Thai. Recognizing the challenges posed by the scarcity and quality of training data, we leverage data augmentation techniques in both monolingual and cross-lingual settings. Our approach augments and enriches the original dataset, thereby increasing its linguistic diversity and robustness. We evaluate the robustness of our framework on Machine Reading Comprehension, and the experimental results illustrate the potential of data augmentation to effectively increase training data and improve model generalization in low-resource language settings, offering a promising direction for the data augmentation manner.
Subsets: th_fasttext_aug, th_llm_gec_aug, th_llm_paraphrase_aug, th_ltw2v_aug, th_qcpg_0.2_aug, th_qcpg_0.2_llm_gec_aug, th_qcpg_0.5_aug, th_qcpg_0.5_llm_gec_aug, th_qcpg_0.8_aug, th_qcpg_0.8_llm_gec_aug, th_thai2fit_aug, th_thai2trans_aug, th_wordnet_aug, en_aug, en_llm_gec_aug, en_llm_paraphrase_aug, en_qcpg_0.2_aug, en_qcpg_0.2_llm_gec_aug, en_qcpg_0.5_aug, en_qcpg_0.5_llm_gec_aug, en_qcpg_0.8_aug, en_qcpg_0.8_llm_gec_aug
Languages: tha, eng
Tasks: Question Answering
License: MIT (mit)
Homepage: https://github.com/parinzee/cross-lingual-data-augmentation-for-thai-qa
HF URL: https://huggingface.co/datasets/parinzee/claq-qa-thai-dataset
Paper URL: https://aclanthology.org/2023.genbench-1.16/
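
As a possible starting point for the dataloader, here is a minimal sketch of how the subsets listed above could be enumerated and mapped to their languages. The subset names and language codes come from the datasheet. The helper names (`subset_language`, `config_names`) are hypothetical, and the `_source` / `_seacrowd_qa` config suffixes are an assumption about the naming pattern the loader would use, so check them against the contribution guide before relying on them.

```python
# Sketch only: enumerates subsets from the datasheet; nothing is downloaded here.
DATASET_NAME = "cross_lingual_augmented_thai_qa"

# Subset names copied from the "Subsets" field of the datasheet.
SUBSETS = [
    "th_fasttext_aug", "th_llm_gec_aug", "th_llm_paraphrase_aug",
    "th_ltw2v_aug", "th_qcpg_0.2_aug", "th_qcpg_0.2_llm_gec_aug",
    "th_qcpg_0.5_aug", "th_qcpg_0.5_llm_gec_aug", "th_qcpg_0.8_aug",
    "th_qcpg_0.8_llm_gec_aug", "th_thai2fit_aug", "th_thai2trans_aug",
    "th_wordnet_aug", "en_aug", "en_llm_gec_aug", "en_llm_paraphrase_aug",
    "en_qcpg_0.2_aug", "en_qcpg_0.2_llm_gec_aug", "en_qcpg_0.5_aug",
    "en_qcpg_0.5_llm_gec_aug", "en_qcpg_0.8_aug", "en_qcpg_0.8_llm_gec_aug",
]

# Language codes from the "Languages" field, keyed by subset prefix.
LANG_BY_PREFIX = {"th": "tha", "en": "eng"}


def subset_language(subset: str) -> str:
    """Return the language code implied by a subset's prefix (hypothetical helper)."""
    prefix = subset.split("_", 1)[0]
    return LANG_BY_PREFIX[prefix]


def config_names(subset: str) -> list[str]:
    """Build candidate config names for one subset.

    Assumption: one source-schema config and one QA-schema config per subset.
    """
    return [
        f"{DATASET_NAME}_{subset}_source",
        f"{DATASET_NAME}_{subset}_seacrowd_qa",
    ]


if __name__ == "__main__":
    for name in SUBSETS:
        print(subset_language(name), config_names(name))
```

Keeping the subset list in one place would let the loader derive both the language metadata and the config list from it, so a new augmentation variant only needs one extra entry.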
@SamuelCahyawijaya SamuelCahyawijaya converted this from a draft issue Jul 30, 2024