dataset: opus
model: transformer
source language(s): eng
target language(s): acm afb amh apc apc_Latn ara ara_Latn arq arq_Latn ary arz heb mlt phn_Phnx tir tmr_Hebr
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = a valid target language ID)
download: opus-2020-06-28.zip
test set translations: opus-2020-06-28.test.txt
test set scores: opus-2020-06-28.eval.txt
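Because one model serves several target languages, every input sentence must start with the token naming the desired output language. The following is a minimal sketch of that preprocessing step. It assumes the token is separated from the sentence by a single space and that the allowed IDs are the target languages listed above for opus-2020-06-28. `add_target_token` is a hypothetical helper, not part of the OPUS-MT release.

```python
# Target-language IDs listed for the opus-2020-06-28 model.
TARGETS = {
    "acm", "afb", "amh", "apc", "apc_Latn", "ara", "ara_Latn", "arq",
    "arq_Latn", "ary", "arz", "heb", "mlt", "phn_Phnx", "tir", "tmr_Hebr",
}


def add_target_token(sentence: str, lang_id: str, valid=TARGETS) -> str:
    """Prepend the sentence-initial >>id<< token (hypothetical helper).

    Assumes the token is followed by a single space. Raises ValueError for
    IDs outside the model's listed target languages.
    """
    if lang_id not in valid:
        raise ValueError(f"unsupported target language ID: {lang_id!r}")
    return f">>{lang_id}<< {sentence.strip()}"


print(add_target_token("How are you?", "heb"))  # >>heb<< How are you?
```

Validating the ID before translation catches typos early. An unknown token would otherwise be segmented and translated as ordinary input.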
| Test set | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-amh.eng.amh | 10.8 | 0.499 |
| Tatoeba-test.eng-ara.eng.ara | 12.6 | 0.421 |
| Tatoeba-test.eng-heb.eng.heb | 32.6 | 0.557 |
| Tatoeba-test.eng-mlt.eng.mlt | 17.9 | 0.552 |
| Tatoeba-test.eng.multi | 22.8 | 0.487 |
| Tatoeba-test.eng-phn.eng.phn | 0.5 | 0.003 |
| Tatoeba-test.eng-tir.eng.tir | 2.5 | 0.239 |
| Tatoeba-test.eng-tmr.eng.tmr | 0.8 | 0.003 |
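The chr-F column reports a character n-gram F-score on a 0-1 scale. The sketch below illustrates what the metric measures. It is simplified and sentence-level, in the spirit of chrF: it uses character n-grams of order 1 to 6, ignores whitespace, and weights recall with beta = 2. It is not a reference implementation. Toolkits such as sacrebleu aggregate statistics over the whole test set and handle edge cases differently, so its numbers will not match the table exactly.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    # chrF ignores whitespace by default, so drop it before counting n-grams.
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-1 scale (illustrative sketch)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    # Average precision and recall over n-gram orders, then combine as F-beta.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


print(chrf("the cat sat on the mat", "the cat sat on a mat"))
```

With beta = 2, recall counts more than precision, so output that omits reference material is penalized more heavily than output that adds extra characters.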
dataset: opus
model: transformer
source language(s): eng
target language(s): acm afb amh apc ara arq ary arz heb mlt tir
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = a valid target language ID)
download: opus-2020-07-06.zip
test set translations: opus-2020-07-06.test.txt
test set scores: opus-2020-07-06.eval.txt
| Test set | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-amh.eng.amh | 10.7 | 0.465 |
| Tatoeba-test.eng-ara.eng.ara | 11.7 | 0.412 |
| Tatoeba-test.eng-heb.eng.heb | 32.3 | 0.552 |
| Tatoeba-test.eng-mlt.eng.mlt | 17.7 | 0.544 |
| Tatoeba-test.eng.multi | 22.2 | 0.481 |
| Tatoeba-test.eng-tir.eng.tir | 2.6 | 0.236 |
dataset: opus
model: transformer
source language(s): eng
target language(s): acm afb amh apc ara arq ary arz heb mlt tir
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = a valid target language ID)
download: opus-2020-07-27.zip
test set translations: opus-2020-07-27.test.txt
test set scores: opus-2020-07-27.eval.txt
| Test set | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-amh.eng.amh | 11.0 | 0.504 |
| Tatoeba-test.eng-ara.eng.ara | 12.2 | 0.412 |
| Tatoeba-test.eng-heb.eng.heb | 32.7 | 0.556 |
| Tatoeba-test.eng-mlt.eng.mlt | 17.5 | 0.548 |
| Tatoeba-test.eng.multi | 22.7 | 0.480 |
| Tatoeba-test.eng-tir.eng.tir | 2.4 | 0.240 |
dataset: opus2m
model: transformer
source language(s): eng
target language(s): acm afb amh apc ara arq ary arz heb mlt tir
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = a valid target language ID)
download: opus2m-2020-08-01.zip
test set translations: opus2m-2020-08-01.test.txt
test set scores: opus2m-2020-08-01.eval.txt
| Test set | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-amh.eng.amh | 11.2 | 0.480 |
| Tatoeba-test.eng-ara.eng.ara | 12.7 | 0.417 |
| Tatoeba-test.eng-heb.eng.heb | 33.8 | 0.564 |
| Tatoeba-test.eng-mlt.eng.mlt | 18.7 | 0.554 |
| Tatoeba-test.eng.multi | 23.5 | 0.486 |
| Tatoeba-test.eng-tir.eng.tir | 2.7 | 0.248 |
dataset: opus1m+bt
model: transformer-align
source language(s): eng
target language(s): acm afb amh apc ara arq ary arz heb jpa mlt oar phn tir tmr
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = a valid target language ID)
valid language labels: >>aao<< >>abh<< >>abv<< >>acm<< >>acq<< >>acw<< >>acx<< >>acy<< >>adf<< >>aeb<< >>aec<< >>afb<< >>agj<< >>aii<< >>aij<< >>ajp<< >>ajt<< >>aju<< >>akk<< >>amh<< >>amw<< >>apc<< >>apd<< >>ara<< >>arb<< >>arc<< >>arq<< >>ars<< >>ary<< >>arz<< >>auz<< >>avl<< >>ayh<< >>ayl<< >>ayn<< >>ayp<< >>bbz<< >>bhm<< >>bhn<< >>bjf<< >>cld<< >>dlk<< >>gdq<< >>gez<< >>gft<< >>gru<< >>har<< >>hbo<< >>heb<< >>hoh<< >>hrt<< >>hss<< >>huy<< >>inm<< >>ior<< >>jpa<< >>jpa_Hebr<< >>jrb<< >>jye<< >>kcn<< >>kqd<< >>lhs<< >>lsd<< >>mey<< >>mid<< >>mlt<< >>mvz<< >>mys<< >>myz<< >>oar<< >>oar_Hebr<< >>oar_Syrc<< >>pga<< >>phn<< >>phn_Phnx<< >>rzh<< >>sam<< >>sgw<< >>shu<< >>shv<< >>smp<< >>sqr<< >>sqt<< >>ssh<< >>stv<< >>syc<< >>syn<< >>tig<< >>tir<< >>tmr<< >>tmr_Hebr<< >>trg<< >>tru<< >>uga<< >>wle<< >>xaa<< >>xeb<< >>xhd<< >>xna<< >>xpu<< >>xqt<< >>xsa<< >>yhd<< >>yud<< >>zwa<<
download: opus1m+bt-2021-04-10.zip
test set translations: opus1m+bt-2021-04-10.test.txt
test set scores: opus1m+bt-2021-04-10.eval.txt
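The list of valid language labels is much longer than the list of target languages above. For example, it includes >>akk<< and >>uga<<, which are not among the listed targets. A label being accepted therefore does not by itself show that the model saw that language in training. The sketch below extracts the IDs from such a label line so they can be compared with the training targets. The regular expression is an assumption based on the labels shown here: a three-letter lowercase code, optionally followed by an underscore and a four-letter script code such as `Hebr` or `Phnx`.

```python
import re

# Matches >>id<< tokens such as >>heb<<, >>jpa_Hebr<< or >>phn_Phnx<<.
LABEL_RE = re.compile(r">>([a-z]{3}(?:_[A-Z][a-z]{3})?)<<")


def parse_labels(line: str) -> list[str]:
    """Return the language IDs found in a 'valid language labels' line."""
    return LABEL_RE.findall(line)


# Target languages listed for opus1m+bt-2021-04-10.
TARGETS = set("acm afb amh apc ara arq ary arz heb jpa mlt oar phn tir tmr".split())

labels = parse_labels("valid language labels: >>aao<< >>abh<< >>jpa_Hebr<< >>oar_Syrc<<")
print(labels)  # ['aao', 'abh', 'jpa_Hebr', 'oar_Syrc']
print(sorted(set(labels) - TARGETS))  # labels outside the listed targets
```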
| Test set | BLEU | chr-F | #sent | #words | BP |
|---|---|---|---|---|---|
| Tatoeba-test.eng-acm | 0.8 | 0.005 | 3 | 17 | 1.000 |
| Tatoeba-test.eng-afb | 0.2 | 0.005 | 36 | 145 | 1.000 |
| Tatoeba-test.eng-amh | 0.0 | 0.000 | 190 | 585 | 1.000 |
| Tatoeba-test.eng-apc | 3.5 | 0.007 | 5 | 18 | 1.000 |
| Tatoeba-test.eng-ara | 12.6 | 0.419 | 10000 | 58929 | 1.000 |
| Tatoeba-test.eng-arq | 0.6 | 0.158 | 403 | 2271 | 1.000 |
| Tatoeba-test.eng-ary | 0.5 | 0.006 | 18 | 53 | 1.000 |
| Tatoeba-test.eng-arz | 0.9 | 0.139 | 181 | 856 | 1.000 |
| Tatoeba-test.eng-heb | 32.6 | 0.556 | 10000 | 60344 | 1.000 |
| Tatoeba-test.eng-jpa | 2.3 | 0.014 | 4 | 22 | 1.000 |
| Tatoeba-test.eng-mlt | 13.4 | 0.476 | 203 | 899 | 1.000 |
| Tatoeba-test.eng-multi | 22.6 | 0.484 | 10000 | 59379 | 1.000 |
| Tatoeba-test.eng-oar | 1.3 | 0.014 | 6 | 59 | 1.000 |
| Tatoeba-test.eng-oar_Hebr | 1.8 | 0.011 | 3 | 33 | 1.000 |
| Tatoeba-test.eng-oar_Syrc | 2.8 | 0.019 | 3 | 26 | 1.000 |
| Tatoeba-test.eng-phn | 1.2 | 0.007 | 5 | 33 | 1.000 |
| Tatoeba-test.eng-tir | 0.1 | 0.010 | 69 | 318 | 1.000 |
| Tatoeba-test.eng-tmr | 0.3 | 0.007 | 19 | 95 | 1.000 |
| tico19-test.eng-amh | 0.6 | 0.029 | 2100 | 44943 | 1.000 |
| tico19-test.eng-ara | 16.9 | 0.479 | 2100 | 51336 | 0.989 |
| tico19-test.eng-tir | 0.6 | 0.035 | 2100 | 46792 | 1.000 |
| tico19-test.en-ti_ER.eng-tir | 0.5 | 0.032 | 2100 | 49816 | 1.000 |
| tico19-test.en-ti_ET.eng-tir | 0.6 | 0.035 | 2100 | 49071 | 1.000 |
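The BP column is BLEU's brevity penalty. It scales the score down when the system output is shorter overall than the reference: BP = 1 if c >= r, otherwise exp(1 - r/c), where c is the total hypothesis length and r the total reference length. A value below 1.000, such as 0.989 for tico19-test.eng-ara, therefore means the output was slightly shorter than the reference. A minimal sketch of the formula:

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """BLEU brevity penalty from corpus-level hypothesis/reference lengths."""
    if hyp_len >= ref_len:
        return 1.0  # no penalty when the output is at least as long as the reference
    if hyp_len == 0:
        return 0.0  # empty output: avoid division by zero
    return math.exp(1 - ref_len / hyp_len)


print(round(brevity_penalty(90, 100), 3))  # 0.895
```

The penalty only ever lowers BLEU. Overly long output is instead penalized through lower n-gram precision.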
opus4m+btTCv20210807-2021-09-30.zip
dataset: opus4m+btTCv20210807
model: transformer
source language(s): eng
target language(s): acm afb amh apc ara arc arq ary arz hbo heb jpa mlt oar phn syr tig tir tmr
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = a valid target language ID)
valid language labels: >>aao<< >>abh<< >>abv<< >>acm<< >>acq<< >>acw<< >>acx<< >>acy<< >>adf<< >>aeb<< >>aec<< >>afb<< >>agj<< >>aii<< >>aij<< >>ajp<< >>ajt<< >>aju<< >>akk<< >>amh<< >>amw<< >>apc<< >>apd<< >>ara<< >>arb<< >>arc<< >>arq<< >>ars<< >>ary<< >>arz<< >>auz<< >>avl<< >>ayh<< >>ayl<< >>ayn<< >>ayp<< >>bbz<< >>bhm<< >>bhn<< >>bjf<< >>cld<< >>dlk<< >>gdq<< >>gez<< >>gft<< >>gru<< >>har<< >>hbo<< >>heb<< >>hoh<< >>hrt<< >>hss<< >>huy<< >>inm<< >>ior<< >>jpa<< >>jpa_Hebr<< >>jrb<< >>jye<< >>kcn<< >>kqd<< >>lhs<< >>lsd<< >>mey<< >>mid<< >>mlt<< >>mvz<< >>mys<< >>myz<< >>oar<< >>oar_Hebr<< >>oar_Syrc<< >>pga<< >>phn<< >>phn_Phnx<< >>rzh<< >>sam<< >>sgw<< >>shu<< >>shv<< >>smp<< >>sqr<< >>sqt<< >>ssh<< >>stv<< >>syc<< >>syn<< >>tig<< >>tir<< >>tmr<< >>tmr_Hebr<< >>trg<< >>tru<< >>uga<< >>wle<< >>xaa<< >>xeb<< >>xhd<< >>xna<< >>xpu<< >>xqt<< >>xsa<< >>yhd<< >>yud<< >>zwa<<
download: opus4m+btTCv20210807-2021-09-30.zip
test set translations: opus4m+btTCv20210807-2021-09-30.test.txt
test set scores: opus4m+btTCv20210807-2021-09-30.eval.txt
| Test set | BLEU | chr-F | #sent | #words | BP |
|---|---|---|---|---|---|
| Tatoeba-test-v2021-08-07.eng-multi | 23.1 | 0.502 | 10000 | 59933 | 1.000 |
| Tatoeba-test-v2021-08-07.multi-multi | 23.1 | 0.502 | 10000 | 59933 | 1.000 |
| tico19-test.eng-amh | 1.2 | 0.041 | 2100 | 44943 | 1.000 |
| tico19-test.eng-ara | 23.5 | 0.538 | 2100 | 51336 | 0.994 |
| tico19-test.eng-tir | 1.6 | 0.062 | 2100 | 46792 | 1.000 |
| tico19-test.en-ti_ER.eng-tir | 1.6 | 0.062 | 2100 | 49816 | 1.000 |
| tico19-test.en-ti_ET.eng-tir | 1.7 | 0.066 | 2100 | 49071 | 1.000 |