eng-sem

opus-2020-06-28.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): acm afb amh apc apc_Latn ara ara_Latn arq arq_Latn ary arz heb mlt phn_Phnx tir tmr_Hebr
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-06-28.zip
  • test set translations: opus-2020-06-28.test.txt
  • test set scores: opus-2020-06-28.eval.txt
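Because these checkpoints are multilingual on the target side, every source sentence must begin with the target-language token described above. A minimal sketch of that preprocessing step (the helper name is illustrative and the language set is abbreviated; it is not part of the released model):

```python
# Prepend the required >>id<< target-language token before translation.
# VALID_TARGETS is an abbreviated, illustrative subset of this model's targets.
VALID_TARGETS = {"acm", "afb", "amh", "apc", "ara", "arq", "ary", "arz",
                 "heb", "mlt", "tir"}

def add_language_token(sentence: str, target: str) -> str:
    """Prefix an English source sentence with the token the model expects."""
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown target language ID: {target}")
    return f">>{target}<< {sentence}"

print(add_language_token("How are you?", "heb"))  # prints: >>heb<< How are you?
```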

Benchmarks

| testset | BLEU | chr-F |
|---------|-----:|------:|
| Tatoeba-test.eng-amh.eng.amh | 10.8 | 0.499 |
| Tatoeba-test.eng-ara.eng.ara | 12.6 | 0.421 |
| Tatoeba-test.eng-heb.eng.heb | 32.6 | 0.557 |
| Tatoeba-test.eng-mlt.eng.mlt | 17.9 | 0.552 |
| Tatoeba-test.eng.multi | 22.8 | 0.487 |
| Tatoeba-test.eng-phn.eng.phn | 0.5 | 0.003 |
| Tatoeba-test.eng-tir.eng.tir | 2.5 | 0.239 |
| Tatoeba-test.eng-tmr.eng.tmr | 0.8 | 0.003 |
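The chr-F column reports the character n-gram F-score on a 0–1 scale. A simplified sketch of the metric (β = 2, orders 1…6; real evaluations use sacrebleu's chrF implementation, which handles whitespace and empty n-gram orders slightly differently):

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Character n-grams with whitespace removed, as chrF does."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average char n-gram precision/recall over orders
    1..max_n, combined into an F-score weighting recall beta times as much."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())          # clipped matches
        precisions.append(overlap / max(sum(hyp.values()), 1))
        recalls.append(overlap / max(sum(ref.values()), 1))
    p, r = sum(precisions) / max_n, sum(recalls) / max_n
    return 0.0 if p + r == 0 else (1 + beta**2) * p * r / (beta**2 * p + r)
```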

opus-2020-07-06.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): acm afb amh apc ara arq ary arz heb mlt tir
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-06.zip
  • test set translations: opus-2020-07-06.test.txt
  • test set scores: opus-2020-07-06.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---------|-----:|------:|
| Tatoeba-test.eng-amh.eng.amh | 10.7 | 0.465 |
| Tatoeba-test.eng-ara.eng.ara | 11.7 | 0.412 |
| Tatoeba-test.eng-heb.eng.heb | 32.3 | 0.552 |
| Tatoeba-test.eng-mlt.eng.mlt | 17.7 | 0.544 |
| Tatoeba-test.eng.multi | 22.2 | 0.481 |
| Tatoeba-test.eng-tir.eng.tir | 2.6 | 0.236 |

opus-2020-07-27.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): acm afb amh apc ara arq ary arz heb mlt tir
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-27.zip
  • test set translations: opus-2020-07-27.test.txt
  • test set scores: opus-2020-07-27.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---------|-----:|------:|
| Tatoeba-test.eng-amh.eng.amh | 11.0 | 0.504 |
| Tatoeba-test.eng-ara.eng.ara | 12.2 | 0.412 |
| Tatoeba-test.eng-heb.eng.heb | 32.7 | 0.556 |
| Tatoeba-test.eng-mlt.eng.mlt | 17.5 | 0.548 |
| Tatoeba-test.eng.multi | 22.7 | 0.480 |
| Tatoeba-test.eng-tir.eng.tir | 2.4 | 0.240 |

opus2m-2020-08-01.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): acm afb amh apc ara arq ary arz heb mlt tir
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus2m-2020-08-01.zip
  • test set translations: opus2m-2020-08-01.test.txt
  • test set scores: opus2m-2020-08-01.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---------|-----:|------:|
| Tatoeba-test.eng-amh.eng.amh | 11.2 | 0.480 |
| Tatoeba-test.eng-ara.eng.ara | 12.7 | 0.417 |
| Tatoeba-test.eng-heb.eng.heb | 33.8 | 0.564 |
| Tatoeba-test.eng-mlt.eng.mlt | 18.7 | 0.554 |
| Tatoeba-test.eng.multi | 23.5 | 0.486 |
| Tatoeba-test.eng-tir.eng.tir | 2.7 | 0.248 |
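The BLEU scores above are corpus-level, computed by sacrebleu on the test set translations linked in each section. A stripped-down sentence-level variant shows the moving parts (clipped modified n-gram precision up to 4-grams, geometric mean, brevity penalty; this sketch omits the smoothing used in practice):

```python
import math
from collections import Counter

def word_ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference sentence BLEU on a 0-100 scale."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = 0.0
    for n in range(1, max_n + 1):
        h, r = word_ngrams(hyp, n), word_ngrams(ref, n)
        overlap = sum((h & r).values())   # clipped n-gram matches
        if overlap == 0:
            return 0.0                    # unsmoothed: a zero order zeroes the score
        log_precisions += math.log(overlap / sum(h.values()))
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100.0 * bp * math.exp(log_precisions / max_n)
```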

opus1m+bt-2021-04-10.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): eng
  • target language(s): acm afb amh apc ara arq ary arz heb jpa mlt oar phn tir tmr
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • valid language labels: >>aao<< >>abh<< >>abv<< >>acm<< >>acq<< >>acw<< >>acx<< >>acy<< >>adf<< >>aeb<< >>aec<< >>afb<< >>agj<< >>aii<< >>aij<< >>ajp<< >>ajt<< >>aju<< >>akk<< >>amh<< >>amw<< >>apc<< >>apd<< >>ara<< >>arb<< >>arc<< >>arq<< >>ars<< >>ary<< >>arz<< >>auz<< >>avl<< >>ayh<< >>ayl<< >>ayn<< >>ayp<< >>bbz<< >>bhm<< >>bhn<< >>bjf<< >>cld<< >>dlk<< >>gdq<< >>gez<< >>gft<< >>gru<< >>har<< >>hbo<< >>heb<< >>hoh<< >>hrt<< >>hss<< >>huy<< >>inm<< >>ior<< >>jpa<< >>jpa_Hebr<< >>jrb<< >>jye<< >>kcn<< >>kqd<< >>lhs<< >>lsd<< >>mey<< >>mid<< >>mlt<< >>mvz<< >>mys<< >>myz<< >>oar<< >>oar_Hebr<< >>oar_Syrc<< >>pga<< >>phn<< >>phn_Phnx<< >>rzh<< >>sam<< >>sgw<< >>shu<< >>shv<< >>smp<< >>sqr<< >>sqt<< >>ssh<< >>stv<< >>syc<< >>syn<< >>tig<< >>tir<< >>tmr<< >>tmr_Hebr<< >>trg<< >>tru<< >>uga<< >>wle<< >>xaa<< >>xeb<< >>xhd<< >>xna<< >>xpu<< >>xqt<< >>xsa<< >>yhd<< >>yud<< >>zwa<<
  • download: opus1m+bt-2021-04-10.zip
  • test set translations: opus1m+bt-2021-04-10.test.txt
  • test set scores: opus1m+bt-2021-04-10.eval.txt

Benchmarks

| testset | BLEU | chr-F | #sent | #words | BP |
|---------|-----:|------:|------:|-------:|---:|
| Tatoeba-test.eng-acm | 0.8 | 0.005 | 3 | 17 | 1.000 |
| Tatoeba-test.eng-afb | 0.2 | 0.005 | 36 | 145 | 1.000 |
| Tatoeba-test.eng-amh | 0.0 | 0.000 | 190 | 585 | 1.000 |
| Tatoeba-test.eng-apc | 3.5 | 0.007 | 5 | 18 | 1.000 |
| Tatoeba-test.eng-ara | 12.6 | 0.419 | 10000 | 58929 | 1.000 |
| Tatoeba-test.eng-arq | 0.6 | 0.158 | 403 | 2271 | 1.000 |
| Tatoeba-test.eng-ary | 0.5 | 0.006 | 18 | 53 | 1.000 |
| Tatoeba-test.eng-arz | 0.9 | 0.139 | 181 | 856 | 1.000 |
| Tatoeba-test.eng-heb | 32.6 | 0.556 | 10000 | 60344 | 1.000 |
| Tatoeba-test.eng-jpa | 2.3 | 0.014 | 4 | 22 | 1.000 |
| Tatoeba-test.eng-mlt | 13.4 | 0.476 | 203 | 899 | 1.000 |
| Tatoeba-test.eng-multi | 22.6 | 0.484 | 10000 | 59379 | 1.000 |
| Tatoeba-test.eng-oar | 1.3 | 0.014 | 6 | 59 | 1.000 |
| Tatoeba-test.eng-oar_Hebr | 1.8 | 0.011 | 3 | 33 | 1.000 |
| Tatoeba-test.eng-oar_Syrc | 2.8 | 0.019 | 3 | 26 | 1.000 |
| Tatoeba-test.eng-phn | 1.2 | 0.007 | 5 | 33 | 1.000 |
| Tatoeba-test.eng-tir | 0.1 | 0.010 | 69 | 318 | 1.000 |
| Tatoeba-test.eng-tmr | 0.3 | 0.007 | 19 | 95 | 1.000 |
| tico19-test.eng-amh | 0.6 | 0.029 | 2100 | 44943 | 1.000 |
| tico19-test.eng-ara | 16.9 | 0.479 | 2100 | 51336 | 0.989 |
| tico19-test.eng-tir | 0.6 | 0.035 | 2100 | 46792 | 1.000 |
| tico19-test.en-ti_ER.eng-tir | 0.5 | 0.032 | 2100 | 49816 | 1.000 |
| tico19-test.en-ti_ET.eng-tir | 0.6 | 0.035 | 2100 | 49071 | 1.000 |
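The BP column is BLEU's brevity penalty, which penalizes system output shorter than the reference (for example, the 0.989 on tico19-test.eng-ara means the hypotheses were slightly shorter overall). A minimal sketch:

```python
import math

def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """BLEU brevity penalty: 1.0 when the hypothesis is at least as long as
    the reference, exp(1 - ref_len/hyp_len) when it is shorter."""
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)
```

A BP of 1.000 therefore only says the output was not shorter than the reference; it carries no information about translation quality.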

opus4m+btTCv20210807-2021-09-30.zip

  • dataset: opus4m+btTCv20210807
  • model: transformer
  • source language(s): eng
  • target language(s): acm afb amh apc ara arc arq ary arz hbo heb jpa mlt oar phn syr tig tir tmr
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • valid language labels: >>aao<< >>abh<< >>abv<< >>acm<< >>acq<< >>acw<< >>acx<< >>acy<< >>adf<< >>aeb<< >>aec<< >>afb<< >>agj<< >>aii<< >>aij<< >>ajp<< >>ajt<< >>aju<< >>akk<< >>amh<< >>amw<< >>apc<< >>apd<< >>ara<< >>arb<< >>arc<< >>arq<< >>ars<< >>ary<< >>arz<< >>auz<< >>avl<< >>ayh<< >>ayl<< >>ayn<< >>ayp<< >>bbz<< >>bhm<< >>bhn<< >>bjf<< >>cld<< >>dlk<< >>gdq<< >>gez<< >>gft<< >>gru<< >>har<< >>hbo<< >>heb<< >>hoh<< >>hrt<< >>hss<< >>huy<< >>inm<< >>ior<< >>jpa<< >>jpa_Hebr<< >>jrb<< >>jye<< >>kcn<< >>kqd<< >>lhs<< >>lsd<< >>mey<< >>mid<< >>mlt<< >>mvz<< >>mys<< >>myz<< >>oar<< >>oar_Hebr<< >>oar_Syrc<< >>pga<< >>phn<< >>phn_Phnx<< >>rzh<< >>sam<< >>sgw<< >>shu<< >>shv<< >>smp<< >>sqr<< >>sqt<< >>ssh<< >>stv<< >>syc<< >>syn<< >>tig<< >>tir<< >>tmr<< >>tmr_Hebr<< >>trg<< >>tru<< >>uga<< >>wle<< >>xaa<< >>xeb<< >>xhd<< >>xna<< >>xpu<< >>xqt<< >>xsa<< >>yhd<< >>yud<< >>zwa<<
  • download: opus4m+btTCv20210807-2021-09-30.zip
  • test set translations: opus4m+btTCv20210807-2021-09-30.test.txt
  • test set scores: opus4m+btTCv20210807-2021-09-30.eval.txt

Benchmarks

| testset | BLEU | chr-F | #sent | #words | BP |
|---------|-----:|------:|------:|-------:|---:|
| Tatoeba-test-v2021-08-07.eng-multi | 23.1 | 0.502 | 10000 | 59933 | 1.000 |
| Tatoeba-test-v2021-08-07.multi-multi | 23.1 | 0.502 | 10000 | 59933 | 1.000 |
| tico19-test.eng-amh | 1.2 | 0.041 | 2100 | 44943 | 1.000 |
| tico19-test.eng-ara | 23.5 | 0.538 | 2100 | 51336 | 0.994 |
| tico19-test.eng-tir | 1.6 | 0.062 | 2100 | 46792 | 1.000 |
| tico19-test.en-ti_ER.eng-tir | 1.6 | 0.062 | 2100 | 49816 | 1.000 |
| tico19-test.en-ti_ET.eng-tir | 1.7 | 0.066 | 2100 | 49071 | 1.000 |