
eng-bat (English to Baltic languages)

opus-2020-06-28.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): lav lit ltg prg_Latn sgs
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-06-28.zip
  • test set translations: opus-2020-06-28.test.txt
  • test set scores: opus-2020-06-28.eval.txt
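Because one model serves several target languages, the target is chosen per input sentence through the >>id<< token. A minimal Python sketch of preparing input, assuming the token is separated from the text by a single space; `add_target_token` and `TARGETS_2020` are hypothetical names, and the SentencePiece segmentation and translation steps are not shown.

```python
# Hypothetical helper: prefix a source sentence with the target-language token.
# The allowed IDs are the target languages listed for the 2020 releases above.
TARGETS_2020 = {"lav", "lit", "ltg", "prg_Latn", "sgs"}


def add_target_token(sentence: str, target: str, allowed=TARGETS_2020) -> str:
    """Return '>>target<< sentence', rejecting IDs the model card does not list."""
    if target not in allowed:
        raise ValueError(f"unsupported target language ID: {target!r}")
    # Assumption: a single space separates the token from the sentence text.
    return f">>{target}<< {sentence.strip()}"
```

For example, `add_target_token("Where is the railway station?", "lit")` returns `">>lit<< Where is the railway station?"`.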

Benchmarks

testset                           BLEU  chr-F
newsdev2017-enlv-englav.eng.lav   22.4  0.533
newsdev2019-enlt-englit.eng.lit   19.5  0.520
newstest2017-enlv-englav.eng.lav  17.3  0.493
newstest2019-enlt-englit.eng.lit  12.7  0.453
Tatoeba-test.eng-lav.eng.lav      40.4  0.637
Tatoeba-test.eng-lit.eng.lit      35.1  0.634
Tatoeba-test.eng.multi            33.9  0.596
Tatoeba-test.eng-prg.eng.prg       0.2  0.110
Tatoeba-test.eng-sgs.eng.sgs       1.5  0.136

opus-2020-07-26.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): lav lit ltg prg_Latn sgs
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-26.zip
  • test set translations: opus-2020-07-26.test.txt
  • test set scores: opus-2020-07-26.eval.txt

Benchmarks

testset                           BLEU  chr-F
newsdev2017-enlv-englav.eng.lav   22.8  0.533
newsdev2019-enlt-englit.eng.lit   19.4  0.518
newstest2017-enlv-englav.eng.lav  17.2  0.493
newstest2019-enlt-englit.eng.lit  13.1  0.456
Tatoeba-test.eng-lav.eng.lav      41.2  0.636
Tatoeba-test.eng-lit.eng.lit      34.6  0.631
Tatoeba-test.eng.multi            35.1  0.599
Tatoeba-test.eng-prg.eng.prg       0.5  0.130
Tatoeba-test.eng-sgs.eng.sgs       3.8  0.192

opus2m-2020-08-01.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): lav lit ltg prg_Latn sgs
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus2m-2020-08-01.zip
  • test set translations: opus2m-2020-08-01.test.txt
  • test set scores: opus2m-2020-08-01.eval.txt

Benchmarks

testset                           BLEU  chr-F
newsdev2017-enlv-englav.eng.lav   24.0  0.546
newsdev2019-enlt-englit.eng.lit   20.9  0.533
newstest2017-enlv-englav.eng.lav  18.3  0.506
newstest2019-enlt-englit.eng.lit  13.6  0.466
Tatoeba-test.eng-lav.eng.lav      42.8  0.652
Tatoeba-test.eng-lit.eng.lit      37.1  0.650
Tatoeba-test.eng.multi            37.0  0.616
Tatoeba-test.eng-prg.eng.prg       0.5  0.130
Tatoeba-test.eng-sgs.eng.sgs       4.1  0.178
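The three 2020 releases share the same test sets, so their BLEU scores can be compared row by row, assuming they were computed the same way. A small Python sketch with a few rows copied from the tables above; `BLEU` and `best_release` are hypothetical names, and ties go to the earlier release in the dict.

```python
# BLEU scores copied from the three 2020 benchmark tables above (subset of rows).
BLEU = {
    "opus-2020-06-28": {
        "newstest2017-enlv-englav.eng.lav": 17.3,
        "newstest2019-enlt-englit.eng.lit": 12.7,
        "Tatoeba-test.eng.multi": 33.9,
    },
    "opus-2020-07-26": {
        "newstest2017-enlv-englav.eng.lav": 17.2,
        "newstest2019-enlt-englit.eng.lit": 13.1,
        "Tatoeba-test.eng.multi": 35.1,
    },
    "opus2m-2020-08-01": {
        "newstest2017-enlv-englav.eng.lav": 18.3,
        "newstest2019-enlt-englit.eng.lit": 13.6,
        "Tatoeba-test.eng.multi": 37.0,
    },
}


def best_release(scores: dict) -> dict:
    """Map each test set to the release with the highest BLEU on it."""
    testsets = next(iter(scores.values())).keys()
    # max() returns the first maximal key, so ties favour the earlier release.
    return {t: max(scores, key=lambda r: scores[r][t]) for t in testsets}
```

On these rows, `best_release(BLEU)` picks opus2m-2020-08-01 for all three test sets, matching the tables.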

opus1m+bt-2021-04-10.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): eng
  • target language(s): lav lit ltg prg sgs
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • valid language labels: >>lav<< >>lit<< >>ltg<< >>ndf<< >>olt<< >>prg<< >>prg_Latn<< >>sgs<< >>svx<< >>sxl<< >>xcu<< >>xgl<< >>xsv<< >>xzm<<
  • download: opus1m+bt-2021-04-10.zip
  • test set translations: opus1m+bt-2021-04-10.test.txt
  • test set scores: opus1m+bt-2021-04-10.eval.txt
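For this release the card lists 14 valid labels, more than the five target languages, so inputs can be checked against that list before translation. A Python sketch under that reading; `LABEL_LINE`, `VALID_LABELS` and `check_input` are hypothetical names. A label being accepted does not mean the model was benchmarked for that language; only the test sets in the table below were.

```python
import re

# The "valid language labels" line of the opus1m+bt-2021-04-10 card, verbatim.
LABEL_LINE = (">>lav<< >>lit<< >>ltg<< >>ndf<< >>olt<< >>prg<< >>prg_Latn<< "
              ">>sgs<< >>svx<< >>sxl<< >>xcu<< >>xgl<< >>xsv<< >>xzm<<")

VALID_LABELS = set(re.findall(r">>([A-Za-z_]+)<<", LABEL_LINE))


def check_input(line: str) -> str:
    """Return the target ID of a prepared input line, or raise ValueError."""
    # The token must be sentence-initial and followed by whitespace.
    m = re.match(r">>([A-Za-z_]+)<<\s", line)
    if m is None:
        raise ValueError("missing sentence-initial >>id<< token")
    if m.group(1) not in VALID_LABELS:
        raise ValueError(f"label not accepted by this model: {m.group(1)}")
    return m.group(1)
```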

Benchmarks

testset                    BLEU  chr-F  #sent  #words     BP
newsdev2017-enlv.eng-lav   22.7  0.534   2003   41503  0.985
newsdev2019-enlt.eng-lit   19.4  0.518   2000   40116  1.000
newstest2017-enlv.eng-lav  17.1  0.493   2001   39434  1.000
newstest2019-enlt.eng-lit  12.9  0.454    998   20091  1.000
Tatoeba-test.eng-lav       40.1  0.636   1631    9927  0.980
Tatoeba-test.eng-lit       33.8  0.625   2500   14791  0.931
Tatoeba-test.eng-ltg       10.7  0.392      1       4  1.000
Tatoeba-test.eng-multi     34.8  0.600   4396   26417  0.951
Tatoeba-test.eng-prg        0.4  0.128    213    1527  0.928
Tatoeba-test.eng-sgs        4.2  0.207     52     160  1.000

#sent and #words give the size of each test set; BP is the BLEU brevity penalty.
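The BP column is the standard BLEU brevity penalty, computed at corpus level: it is 1 when the system output is longer than the reference and exp(1 - r/c) otherwise, where c is the output length and r the reference length. A Python sketch of that formula plus its inverse; the function names are hypothetical, and the token counts used by the actual scorer are not given here.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """BLEU brevity penalty: 1 if hyp_len > ref_len, else exp(1 - ref_len / hyp_len)."""
    if hyp_len == 0:
        return 0.0  # empty output: no credit
    if hyp_len > ref_len:
        return 1.0
    return math.exp(1 - ref_len / hyp_len)


def length_ratio_from_bp(bp: float) -> float:
    """Invert a BP below 1 to recover hyp_len / ref_len.

    BP == 1.0 only says the output was at least as long as the reference.
    """
    if not 0 < bp < 1:
        raise ValueError("BP must be in (0, 1) to be invertible")
    # bp = exp(1 - r/c)  =>  r/c = 1 - ln(bp)  =>  c/r = 1 / (1 - ln(bp))
    return 1 / (1 - math.log(bp))
```

Applied to the Tatoeba-test.eng-lit row, `length_ratio_from_bp(0.931)` is roughly 0.93, meaning the output was about 7% shorter than the references.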