- dataset: opus
- model: transformer
- source language(s): eng
- target language(s): acm afb amh apc apc_Latn ara ara_Latn arq arq_Latn ary arz heb mlt phn_Phnx tir tmr_Hebr
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of `>>id<<` (id = valid target language ID)
- download: opus-2020-06-28.zip
- test set translations: opus-2020-06-28.test.txt
- test set scores: opus-2020-06-28.eval.txt

testset | BLEU | chr-F |
---|---|---|
Tatoeba-test.eng-amh.eng.amh | 10.8 | 0.499 |
Tatoeba-test.eng-ara.eng.ara | 12.6 | 0.421 |
Tatoeba-test.eng-heb.eng.heb | 32.6 | 0.557 |
Tatoeba-test.eng-mlt.eng.mlt | 17.9 | 0.552 |
Tatoeba-test.eng.multi | 22.8 | 0.487 |
Tatoeba-test.eng-phn.eng.phn | 0.5 | 0.003 |
Tatoeba-test.eng-tir.eng.tir | 2.5 | 0.239 |
Tatoeba-test.eng-tmr.eng.tmr | 0.8 | 0.003 |
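
The language-token requirement above can be sketched as a small helper. This is only an illustration, not part of the released tooling: the function name `add_target_token` and the example sentence are hypothetical, and it assumes the token is separated from the sentence by a single space.

```python
def add_target_token(sentence: str, target_id: str) -> str:
    """Prefix a source sentence with the >>id<< target-language token.

    Hypothetical helper: assumes one space between token and sentence.
    """
    if not target_id or any(ch.isspace() for ch in target_id):
        raise ValueError(f"invalid target language ID: {target_id!r}")
    return f">>{target_id}<< {sentence.strip()}"


# Request Hebrew output for an English source sentence.
print(add_target_token("How are you?", "heb"))  # prints ">>heb<< How are you?"
```

The token has to be added before SentencePiece segmentation is applied, since it is part of the source sentence the model sees.
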
- dataset: opus
- model: transformer
- source language(s): eng
- target language(s): acm afb amh apc ara arq ary arz heb mlt tir
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of `>>id<<` (id = valid target language ID)
- download: opus-2020-07-06.zip
- test set translations: opus-2020-07-06.test.txt
- test set scores: opus-2020-07-06.eval.txt

testset | BLEU | chr-F |
---|---|---|
Tatoeba-test.eng-amh.eng.amh | 10.7 | 0.465 |
Tatoeba-test.eng-ara.eng.ara | 11.7 | 0.412 |
Tatoeba-test.eng-heb.eng.heb | 32.3 | 0.552 |
Tatoeba-test.eng-mlt.eng.mlt | 17.7 | 0.544 |
Tatoeba-test.eng.multi | 22.2 | 0.481 |
Tatoeba-test.eng-tir.eng.tir | 2.6 | 0.236 |
- dataset: opus
- model: transformer
- source language(s): eng
- target language(s): acm afb amh apc ara arq ary arz heb mlt tir
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of `>>id<<` (id = valid target language ID)
- download: opus-2020-07-27.zip
- test set translations: opus-2020-07-27.test.txt
- test set scores: opus-2020-07-27.eval.txt

testset | BLEU | chr-F |
---|---|---|
Tatoeba-test.eng-amh.eng.amh | 11.0 | 0.504 |
Tatoeba-test.eng-ara.eng.ara | 12.2 | 0.412 |
Tatoeba-test.eng-heb.eng.heb | 32.7 | 0.556 |
Tatoeba-test.eng-mlt.eng.mlt | 17.5 | 0.548 |
Tatoeba-test.eng.multi | 22.7 | 0.480 |
Tatoeba-test.eng-tir.eng.tir | 2.4 | 0.240 |
- dataset: opus2m
- model: transformer
- source language(s): eng
- target language(s): acm afb amh apc ara arq ary arz heb mlt tir
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of `>>id<<` (id = valid target language ID)
- download: opus2m-2020-08-01.zip
- test set translations: opus2m-2020-08-01.test.txt
- test set scores: opus2m-2020-08-01.eval.txt

testset | BLEU | chr-F |
---|---|---|
Tatoeba-test.eng-amh.eng.amh | 11.2 | 0.480 |
Tatoeba-test.eng-ara.eng.ara | 12.7 | 0.417 |
Tatoeba-test.eng-heb.eng.heb | 33.8 | 0.564 |
Tatoeba-test.eng-mlt.eng.mlt | 18.7 | 0.554 |
Tatoeba-test.eng.multi | 23.5 | 0.486 |
Tatoeba-test.eng-tir.eng.tir | 2.7 | 0.248 |
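
The score tables in this file share one pipe-delimited layout, so they can be read programmatically when comparing releases. The following is a minimal sketch; `parse_scores` is a hypothetical helper, and the sample rows are copied from the opus2m-2020-08-01 table.

```python
def parse_scores(table: str) -> dict[str, dict[str, float]]:
    """Parse a pipe-delimited score table into {testset: {column: value}}."""
    lines = [ln.strip() for ln in table.strip().splitlines() if ln.strip()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    scores: dict[str, dict[str, float]] = {}
    for line in lines[1:]:
        cells = [c.strip() for c in line.strip("|").split("|")]
        if all(set(c) <= set("-: ") for c in cells):
            continue  # skip the ---|---|--- separator row
        scores[cells[0]] = {
            name: float(value) for name, value in zip(header[1:], cells[1:])
        }
    return scores


TABLE = """
testset | BLEU | chr-F |
---|---|---|
Tatoeba-test.eng-heb.eng.heb | 33.8 | 0.564 |
Tatoeba-test.eng-tir.eng.tir | 2.7 | 0.248 |
"""

scores = parse_scores(TABLE)
best = max(scores, key=lambda name: scores[name]["BLEU"])
print(best, scores[best]["BLEU"])  # prints "Tatoeba-test.eng-heb.eng.heb 33.8"
```
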
- dataset: opus1m+bt
- model: transformer-align
- source language(s): eng
- target language(s): acm afb amh apc ara arq ary arz heb jpa mlt oar phn tir tmr
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of `>>id<<` (id = valid target language ID)
- valid language labels: >>aao<< >>abh<< >>abv<< >>acm<< >>acq<< >>acw<< >>acx<< >>acy<< >>adf<< >>aeb<< >>aec<< >>afb<< >>agj<< >>aii<< >>aij<< >>ajp<< >>ajt<< >>aju<< >>akk<< >>amh<< >>amw<< >>apc<< >>apd<< >>ara<< >>arb<< >>arc<< >>arq<< >>ars<< >>ary<< >>arz<< >>auz<< >>avl<< >>ayh<< >>ayl<< >>ayn<< >>ayp<< >>bbz<< >>bhm<< >>bhn<< >>bjf<< >>cld<< >>dlk<< >>gdq<< >>gez<< >>gft<< >>gru<< >>har<< >>hbo<< >>heb<< >>hoh<< >>hrt<< >>hss<< >>huy<< >>inm<< >>ior<< >>jpa<< >>jpa_Hebr<< >>jrb<< >>jye<< >>kcn<< >>kqd<< >>lhs<< >>lsd<< >>mey<< >>mid<< >>mlt<< >>mvz<< >>mys<< >>myz<< >>oar<< >>oar_Hebr<< >>oar_Syrc<< >>pga<< >>phn<< >>phn_Phnx<< >>rzh<< >>sam<< >>sgw<< >>shu<< >>shv<< >>smp<< >>sqr<< >>sqt<< >>ssh<< >>stv<< >>syc<< >>syn<< >>tig<< >>tir<< >>tmr<< >>tmr_Hebr<< >>trg<< >>tru<< >>uga<< >>wle<< >>xaa<< >>xeb<< >>xhd<< >>xna<< >>xpu<< >>xqt<< >>xsa<< >>yhd<< >>yud<< >>zwa<<
- download: opus1m+bt-2021-04-10.zip
- test set translations: opus1m+bt-2021-04-10.test.txt
- test set scores: opus1m+bt-2021-04-10.eval.txt

testset | BLEU | chr-F | #sent | #words | BP |
---|---|---|---|---|---|
Tatoeba-test.eng-acm | 0.8 | 0.005 | 3 | 17 | 1.000 |
Tatoeba-test.eng-afb | 0.2 | 0.005 | 36 | 145 | 1.000 |
Tatoeba-test.eng-amh | 0.0 | 0.000 | 190 | 585 | 1.000 |
Tatoeba-test.eng-apc | 3.5 | 0.007 | 5 | 18 | 1.000 |
Tatoeba-test.eng-ara | 12.6 | 0.419 | 10000 | 58929 | 1.000 |
Tatoeba-test.eng-arq | 0.6 | 0.158 | 403 | 2271 | 1.000 |
Tatoeba-test.eng-ary | 0.5 | 0.006 | 18 | 53 | 1.000 |
Tatoeba-test.eng-arz | 0.9 | 0.139 | 181 | 856 | 1.000 |
Tatoeba-test.eng-heb | 32.6 | 0.556 | 10000 | 60344 | 1.000 |
Tatoeba-test.eng-jpa | 2.3 | 0.014 | 4 | 22 | 1.000 |
Tatoeba-test.eng-mlt | 13.4 | 0.476 | 203 | 899 | 1.000 |
Tatoeba-test.eng-multi | 22.6 | 0.484 | 10000 | 59379 | 1.000 |
Tatoeba-test.eng-oar | 1.3 | 0.014 | 6 | 59 | 1.000 |
Tatoeba-test.eng-oar_Hebr | 1.8 | 0.011 | 3 | 33 | 1.000 |
Tatoeba-test.eng-oar_Syrc | 2.8 | 0.019 | 3 | 26 | 1.000 |
Tatoeba-test.eng-phn | 1.2 | 0.007 | 5 | 33 | 1.000 |
Tatoeba-test.eng-tir | 0.1 | 0.010 | 69 | 318 | 1.000 |
Tatoeba-test.eng-tmr | 0.3 | 0.007 | 19 | 95 | 1.000 |
tico19-test.eng-amh | 0.6 | 0.029 | 2100 | 44943 | 1.000 |
tico19-test.eng-ara | 16.9 | 0.479 | 2100 | 51336 | 0.989 |
tico19-test.eng-tir | 0.6 | 0.035 | 2100 | 46792 | 1.000 |
tico19-test.en-ti_ER.eng-tir | 0.5 | 0.032 | 2100 | 49816 | 1.000 |
tico19-test.en-ti_ET.eng-tir | 0.6 | 0.035 | 2100 | 49071 | 1.000 |
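
Because this release lists its valid language labels explicitly, a requested target ID can be checked against that list before a token is built. The sketch below is illustrative only: `parse_labels` and `target_token` are hypothetical names, and `LABELS_LINE` is a short excerpt of the full label list, not the whole set.

```python
import re

# Excerpt of the "valid language labels" line; the full list is much longer.
LABELS_LINE = ">>amh<< >>ara<< >>heb<< >>jpa_Hebr<< >>mlt<< >>tir<<"


def parse_labels(line: str) -> set[str]:
    """Extract the bare IDs from a run of >>id<< labels."""
    return set(re.findall(r">>([A-Za-z_]+)<<", line))


def target_token(target_id: str, valid: set[str]) -> str:
    """Return the >>id<< token, rejecting IDs that are not in the label set."""
    if target_id not in valid:
        raise ValueError(f"unsupported target language ID: {target_id!r}")
    return f">>{target_id}<<"


valid = parse_labels(LABELS_LINE)
print(target_token("jpa_Hebr", valid))  # prints ">>jpa_Hebr<<"
```

A label being valid only means the model accepts the token; the scores in the table show that output quality varies widely between target languages.
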
- dataset: opus4m+btTCv20210807
- model: transformer
- source language(s): eng
- target language(s): acm afb amh apc ara arc arq ary arz hbo heb jpa mlt oar phn syr tig tir tmr
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of `>>id<<` (id = valid target language ID)
- valid language labels: >>aao<< >>abh<< >>abv<< >>acm<< >>acq<< >>acw<< >>acx<< >>acy<< >>adf<< >>aeb<< >>aec<< >>afb<< >>agj<< >>aii<< >>aij<< >>ajp<< >>ajt<< >>aju<< >>akk<< >>amh<< >>amw<< >>apc<< >>apd<< >>ara<< >>arb<< >>arc<< >>arq<< >>ars<< >>ary<< >>arz<< >>auz<< >>avl<< >>ayh<< >>ayl<< >>ayn<< >>ayp<< >>bbz<< >>bhm<< >>bhn<< >>bjf<< >>cld<< >>dlk<< >>gdq<< >>gez<< >>gft<< >>gru<< >>har<< >>hbo<< >>heb<< >>hoh<< >>hrt<< >>hss<< >>huy<< >>inm<< >>ior<< >>jpa<< >>jpa_Hebr<< >>jrb<< >>jye<< >>kcn<< >>kqd<< >>lhs<< >>lsd<< >>mey<< >>mid<< >>mlt<< >>mvz<< >>mys<< >>myz<< >>oar<< >>oar_Hebr<< >>oar_Syrc<< >>pga<< >>phn<< >>phn_Phnx<< >>rzh<< >>sam<< >>sgw<< >>shu<< >>shv<< >>smp<< >>sqr<< >>sqt<< >>ssh<< >>stv<< >>syc<< >>syn<< >>tig<< >>tir<< >>tmr<< >>tmr_Hebr<< >>trg<< >>tru<< >>uga<< >>wle<< >>xaa<< >>xeb<< >>xhd<< >>xna<< >>xpu<< >>xqt<< >>xsa<< >>yhd<< >>yud<< >>zwa<<
- download: opus4m+btTCv20210807-2021-09-30.zip
- test set translations: opus4m+btTCv20210807-2021-09-30.test.txt
- test set scores: opus4m+btTCv20210807-2021-09-30.eval.txt

testset | BLEU | chr-F | #sent | #words | BP |
---|---|---|---|---|---|
Tatoeba-test-v2021-08-07.eng-multi | 23.1 | 0.502 | 10000 | 59933 | 1.000 |
Tatoeba-test-v2021-08-07.multi-multi | 23.1 | 0.502 | 10000 | 59933 | 1.000 |
tico19-test.eng-amh | 1.2 | 0.041 | 2100 | 44943 | 1.000 |
tico19-test.eng-ara | 23.5 | 0.538 | 2100 | 51336 | 0.994 |
tico19-test.eng-tir | 1.6 | 0.062 | 2100 | 46792 | 1.000 |
tico19-test.en-ti_ER.eng-tir | 1.6 | 0.062 | 2100 | 49816 | 1.000 |
tico19-test.en-ti_ET.eng-tir | 1.7 | 0.066 | 2100 | 49071 | 1.000 |
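
The BP column is not defined in this file; it is presumably BLEU's brevity penalty, which is 1.0 when the system output is at least as long as the reference and drops below 1.0 otherwise. Under that assumption, the standard definition can be sketched as follows. The function name and the example lengths are hypothetical and are not taken from the evaluation files.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty: 1 if c >= r, else exp(1 - r/c).

    hyp_len (c) is the total system output length and ref_len (r) the
    total reference length, both counted in tokens.
    """
    if hyp_len <= 0:
        return 0.0
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


# Hypothetical lengths: output about 5% shorter than the reference.
print(round(brevity_penalty(95, 100), 3))
print(brevity_penalty(120, 100))  # longer output is not penalized: 1.0
```

Read this way, a value such as 0.994 indicates output that is only slightly shorter than the reference overall.
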