Create dataset loader for Cross-Lingual Data Augmentation For Thai QA #704

SamuelCahyawijaya · 2024-07-30T15:39:40Z

Dataloader name: cross_lingual_augmented_thai_qa/cross_lingual_augmented_thai_qa.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?cross_lingual_augmented_thai_qa

Dataset	cross_lingual_augmented_thai_qa
Description	This paper presents an innovative data augmentation framework with data quality control designed to enhance the robustness of Question Answering (QA) models in low-resource languages, particularly Thai. Recognizing the challenges posed by the scarcity and quality of training data, we leverage data augmentation techniques in both monolingual and cross-lingual settings. Our approach augments and enriches the original dataset, thereby increasing its linguistic diversity and robustness. We evaluate the robustness of our framework on Machine Reading Comprehension, and the experimental results illustrate the potential of data augmentation to effectively increase training data and improve model generalization in low-resource language settings, offering a promising direction for the data augmentation manner.
Subsets	th_fasttext_aug, th_llm_gec_aug, th_llm_paraphrase_aug, th_ltw2v_aug, th_qcpg_0.2_aug, th_qcpg_0.2_llm_gec_aug, th_qcpg_0.5_aug, th_qcpg_0.5_llm_gec_aug, th_qcpg_0.8_aug, th_qcpg_0.8_llm_gec_aug, th_thai2fit_aug, th_thai2trans_aug, th_wordnet_aug, en_aug, en_llm_gec_aug, en_llm_paraphrase_aug, en_qcpg_0.2_aug, en_qcpg_0.2_llm_gec_aug, en_qcpg_0.5_aug, en_qcpg_0.5_llm_gec_aug, en_qcpg_0.8_aug, en_qcpg_0.8_llm_gec_aug
Languages	tha, eng
Tasks	Question Answering
License	MIT (mit)
Homepage	https://github.com/parinzee/cross-lingual-data-augmentation-for-thai-qa
HF URL	https://huggingface.co/datasets/parinzee/claq-qa-thai-dataset
Paper URL	https://aclanthology.org/2023.genbench-1.16/
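
As a starting point for the loader, here is a minimal sketch that enumerates the subsets listed above and groups them by language prefix. The helper names `group_by_language` and `load_subset` are hypothetical. It is also an assumption that each subset name maps directly to a configuration name on the Hugging Face Hub and that a `train` split exists, so check both against the HF URL before relying on them.

```python
from collections import defaultdict

# Subset names copied from the "Subsets" row of this issue.
SUBSETS = [
    "th_fasttext_aug", "th_llm_gec_aug", "th_llm_paraphrase_aug",
    "th_ltw2v_aug", "th_qcpg_0.2_aug", "th_qcpg_0.2_llm_gec_aug",
    "th_qcpg_0.5_aug", "th_qcpg_0.5_llm_gec_aug", "th_qcpg_0.8_aug",
    "th_qcpg_0.8_llm_gec_aug", "th_thai2fit_aug", "th_thai2trans_aug",
    "th_wordnet_aug", "en_aug", "en_llm_gec_aug", "en_llm_paraphrase_aug",
    "en_qcpg_0.2_aug", "en_qcpg_0.2_llm_gec_aug", "en_qcpg_0.5_aug",
    "en_qcpg_0.5_llm_gec_aug", "en_qcpg_0.8_aug", "en_qcpg_0.8_llm_gec_aug",
]

# Two-letter subset prefix -> ISO 639-3 code from the "Languages" row.
PREFIX_TO_LANG = {"th": "tha", "en": "eng"}


def group_by_language(subsets):
    """Group subset names by the language implied by their prefix."""
    groups = defaultdict(list)
    for name in subsets:
        prefix = name.split("_", 1)[0]
        groups[PREFIX_TO_LANG[prefix]].append(name)
    return dict(groups)


def load_subset(subset, split="train"):
    """Load one subset from the Hub (hypothetical helper).

    Assumption: `subset` is a valid configuration name for the
    repository and `split` exists; neither is confirmed in this issue.
    """
    # Imported lazily so the grouping helper works without `datasets`.
    from datasets import load_dataset

    return load_dataset("parinzee/claq-qa-thai-dataset", name=subset, split=split)


if __name__ == "__main__":
    for lang, names in group_by_language(SUBSETS).items():
        print(lang, len(names))
```

Grouping by prefix keeps the language metadata for each subset derivable from its name, which may simplify building the per-subset configs in the dataloader.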


SamuelCahyawijaya added this to SEACrowd Data Hub Jul 30, 2024

SamuelCahyawijaya converted this from a draft issue Jul 30, 2024
