- dataset: opus
- model: transformer
- source language(s): kan mal tam tel
- target language(s): eng
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- download: opus-2020-06-28.zip
- test set translations: opus-2020-06-28.test.txt
- test set scores: opus-2020-06-28.eval.txt

| testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.kan-eng.kan.eng | 13.5 | 0.361 |
| Tatoeba-test.mal-eng.mal.eng | 42.5 | 0.599 |
| Tatoeba-test.multi.eng | 31.5 | 0.510 |
| Tatoeba-test.tam-eng.tam.eng | 30.8 | 0.478 |
| Tatoeba-test.tel-eng.tel.eng | 17.1 | 0.378 |

- dataset: opus
- model: transformer
- source language(s): kan mal tam tel
- target language(s): eng
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- download: opus-2020-07-26.zip
- test set translations: opus-2020-07-26.test.txt
- test set scores: opus-2020-07-26.eval.txt

| testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.kan-eng.kan.eng | 8.6 | 0.288 |
| Tatoeba-test.mal-eng.mal.eng | 36.8 | 0.533 |
| Tatoeba-test.multi.eng | 25.5 | 0.443 |
| Tatoeba-test.tam-eng.tam.eng | 22.9 | 0.405 |
| Tatoeba-test.tel-eng.tel.eng | 12.2 | 0.322 |

- dataset: opus2m
- model: transformer
- source language(s): kan mal tam tel
- target language(s): eng
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- download: opus2m-2020-08-12.zip
- test set translations: opus2m-2020-08-12.test.txt
- test set scores: opus2m-2020-08-12.eval.txt

| testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.kan-eng.kan.eng | 9.1 | 0.312 |
| Tatoeba-test.mal-eng.mal.eng | 42.0 | 0.584 |
| Tatoeba-test.multi.eng | 30.0 | 0.493 |
| Tatoeba-test.tam-eng.tam.eng | 30.2 | 0.467 |
| Tatoeba-test.tel-eng.tel.eng | 15.9 | 0.378 |

- dataset: opus1m+bt
- model: transformer-align
- source language(s): kan mal tam tel
- target language(s): eng
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- download: opus1m+bt-2021-05-01.zip
- test set translations: opus1m+bt-2021-05-01.test.txt
- test set scores: opus1m+bt-2021-05-01.eval.txt

| testset | BLEU | chr-F | #sent | #words | BP |
|---|---|---|---|---|---|
| Tatoeba-test.kan-eng | 17.8 | 0.404 | 167 | 1252 | 1.000 |
| Tatoeba-test.mal-eng | 42.8 | 0.602 | 802 | 5558 | 0.985 |
| Tatoeba-test.multi-eng | 32.2 | 0.519 | 1541 | 10641 | 1.000 |
| Tatoeba-test.tam-eng | 27.4 | 0.460 | 311 | 2125 | 1.000 |
| Tatoeba-test.tel-eng | 19.5 | 0.405 | 261 | 1706 | 1.000 |
| tico19-test.tam-eng | 13.6 | 0.380 | 2100 | 56848 | 1.000 |

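
The BP column above is presumably BLEU's brevity penalty, which lowers the score when the system output is shorter than the reference. As a rough illustration, the standard definition can be sketched as follows. The function name `brevity_penalty` and its arguments are hypothetical, and this is not the evaluation script used to produce the scores.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty (illustrative helper, not the eval script).

    hyp_len: total number of tokens in the system output
    ref_len: total number of tokens in the reference
    """
    if hyp_len <= 0:
        # An empty hypothesis receives the maximum penalty.
        return 0.0
    if hyp_len >= ref_len:
        # Output at least as long as the reference: no penalty.
        return 1.0
    # Shorter output is penalized exponentially in the length ratio.
    return math.exp(1.0 - ref_len / hyp_len)
```

Under this definition, a value of 1.000 means no length penalty was applied, while a value just below 1 (such as 0.985 for mal-eng) indicates output slightly shorter than the reference.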
- dataset: opus4m+btTCv20210807
- model: transformer
- source language(s): kan mal tam tcy tel
- target language(s): eng
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of `>>id<<` (id = valid target language ID)
- valid language labels:
- download: opus4m+btTCv20210807-2021-09-30.zip
- test set translations: opus4m+btTCv20210807-2021-09-30.test.txt
- test set scores: opus4m+btTCv20210807-2021-09-30.eval.txt

| testset | BLEU | chr-F | #sent | #words | BP |
|---|---|---|---|---|---|
| Tatoeba-test-v2021-08-07.multi-eng | 34.8 | 0.529 | 1541 | 10641 | 1.000 |
| Tatoeba-test-v2021-08-07.multi-multi | 34.8 | 0.529 | 1541 | 10641 | 1.000 |
| tico19-test.tam-eng | 22.9 | 0.505 | 2100 | 56848 | 1.000 |
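
The 2021-09-30 release notes that each input sentence must start with a `>>id<<` target-language token. A minimal sketch of that step is shown below. The helper name `add_target_token` is hypothetical, and the default `eng` is assumed only because it is the single target language listed for this model.

```python
def add_target_token(sentence: str, target_lang: str = "eng") -> str:
    """Prefix a source sentence with the >>id<< target-language token.

    Illustrative helper only; `target_lang` must be a language ID that
    the model accepts (assumed here to be "eng").
    """
    token = f">>{target_lang}<<"
    stripped = sentence.strip()
    # Leave the sentence unchanged if the token is already present.
    if stripped.startswith(token):
        return stripped
    return f"{token} {stripped}"


# Prepare a batch of raw source sentences before tokenization.
batch = [add_target_token(s) for s in ["first source sentence", "second source sentence"]]
```

The token is added to the raw text, so it would need to be applied before the normalization and SentencePiece steps listed above.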