A tool that locates, downloads, and extracts machine translation corpora
Updated May 25, 2024 - Python
Large-scale, distributed, sparse linear algebra in Julia.
PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation (EMNLP 2021)
Code and data for the EMNLP 2020 paper: "Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank"
A Telegram Bot for Amharic Speech Data Collection
Wikipedia-Vikidia Corpus (WiViCo) - A general-purpose parallel sentence simplification dataset for French