opus-2020-06-28.zip

dataset: opus
model: transformer
source language(s): eng
target language(s): acm afb amh apc apc_Latn ara ara_Latn arq arq_Latn ary arz heb mlt phn_Phnx tir tmr_Hebr
pre-processing: normalization + SentencePiece (spm32k,spm32k)
A sentence-initial language token is required in the form >>id<< (id = valid target language ID)
download: opus-2020-06-28.zip
test set translations: opus-2020-06-28.test.txt
test set scores: opus-2020-06-28.eval.txt
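
The language-token requirement listed above can be sketched in a few lines of Python. This is a minimal illustration, not part of the released model: the helper name `add_language_token` is hypothetical, and the single space between the token and the sentence is an assumption that this README does not state.

```python
def add_language_token(sentence: str, target_lang: str) -> str:
    """Prefix a source sentence with the >>id<< target-language token.

    Assumption: token and sentence are separated by one space.
    """
    lang = target_lang.strip()
    if not lang:
        raise ValueError("target language ID must not be empty")
    return f">>{lang}<< {sentence.strip()}"


# Prepare a small batch of English sentences for translation into Hebrew.
batch = [add_language_token(s, "heb") for s in ["How are you?", "Thank you."]]
print(batch)
```

Each string in `batch` would then be passed through the same normalization and SentencePiece steps as any other input before translation.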

Benchmarks

testset	BLEU	chr-F
Tatoeba-test.eng-amh.eng.amh	10.8	0.499
Tatoeba-test.eng-ara.eng.ara	12.6	0.421
Tatoeba-test.eng-heb.eng.heb	32.6	0.557
Tatoeba-test.eng-mlt.eng.mlt	17.9	0.552
Tatoeba-test.eng.multi	22.8	0.487
Tatoeba-test.eng-phn.eng.phn	0.5	0.003
Tatoeba-test.eng-tir.eng.tir	2.5	0.239
Tatoeba-test.eng-tmr.eng.tmr	0.8	0.003

opus-2020-07-06.zip

dataset: opus
model: transformer
source language(s): eng
target language(s): acm afb amh apc ara arq ary arz heb mlt tir
pre-processing: normalization + SentencePiece (spm32k,spm32k)
A sentence-initial language token is required in the form >>id<< (id = valid target language ID)
download: opus-2020-07-06.zip
test set translations: opus-2020-07-06.test.txt
test set scores: opus-2020-07-06.eval.txt

Benchmarks

testset	BLEU	chr-F
Tatoeba-test.eng-amh.eng.amh	10.7	0.465
Tatoeba-test.eng-ara.eng.ara	11.7	0.412
Tatoeba-test.eng-heb.eng.heb	32.3	0.552
Tatoeba-test.eng-mlt.eng.mlt	17.7	0.544
Tatoeba-test.eng.multi	22.2	0.481
Tatoeba-test.eng-tir.eng.tir	2.6	0.236

opus-2020-07-27.zip

dataset: opus
model: transformer
source language(s): eng
target language(s): acm afb amh apc ara arq ary arz heb mlt tir
pre-processing: normalization + SentencePiece (spm32k,spm32k)
A sentence-initial language token is required in the form >>id<< (id = valid target language ID)
download: opus-2020-07-27.zip
test set translations: opus-2020-07-27.test.txt
test set scores: opus-2020-07-27.eval.txt

Benchmarks

testset	BLEU	chr-F
Tatoeba-test.eng-amh.eng.amh	11.0	0.504
Tatoeba-test.eng-ara.eng.ara	12.2	0.412
Tatoeba-test.eng-heb.eng.heb	32.7	0.556
Tatoeba-test.eng-mlt.eng.mlt	17.5	0.548
Tatoeba-test.eng.multi	22.7	0.480
Tatoeba-test.eng-tir.eng.tir	2.4	0.240

opus2m-2020-08-01.zip

dataset: opus2m
model: transformer
source language(s): eng
target language(s): acm afb amh apc ara arq ary arz heb mlt tir
pre-processing: normalization + SentencePiece (spm32k,spm32k)
A sentence-initial language token is required in the form >>id<< (id = valid target language ID)
download: opus2m-2020-08-01.zip
test set translations: opus2m-2020-08-01.test.txt
test set scores: opus2m-2020-08-01.eval.txt

Benchmarks

testset	BLEU	chr-F
Tatoeba-test.eng-amh.eng.amh	11.2	0.480
Tatoeba-test.eng-ara.eng.ara	12.7	0.417
Tatoeba-test.eng-heb.eng.heb	33.8	0.564
Tatoeba-test.eng-mlt.eng.mlt	18.7	0.554
Tatoeba-test.eng.multi	23.5	0.486
Tatoeba-test.eng-tir.eng.tir	2.7	0.248

opus1m+bt-2021-04-10.zip

dataset: opus1m+bt
model: transformer-align
source language(s): eng
target language(s): acm afb amh apc ara arq ary arz heb jpa mlt oar phn tir tmr
pre-processing: normalization + SentencePiece (spm32k,spm32k)
A sentence-initial language token is required in the form >>id<< (id = valid target language ID)
valid language labels: >>aao<< >>abh<< >>abv<< >>acm<< >>acq<< >>acw<< >>acx<< >>acy<< >>adf<< >>aeb<< >>aec<< >>afb<< >>agj<< >>aii<< >>aij<< >>ajp<< >>ajt<< >>aju<< >>akk<< >>amh<< >>amw<< >>apc<< >>apd<< >>ara<< >>arb<< >>arc<< >>arq<< >>ars<< >>ary<< >>arz<< >>auz<< >>avl<< >>ayh<< >>ayl<< >>ayn<< >>ayp<< >>bbz<< >>bhm<< >>bhn<< >>bjf<< >>cld<< >>dlk<< >>gdq<< >>gez<< >>gft<< >>gru<< >>har<< >>hbo<< >>heb<< >>hoh<< >>hrt<< >>hss<< >>huy<< >>inm<< >>ior<< >>jpa<< >>jpa_Hebr<< >>jrb<< >>jye<< >>kcn<< >>kqd<< >>lhs<< >>lsd<< >>mey<< >>mid<< >>mlt<< >>mvz<< >>mys<< >>myz<< >>oar<< >>oar_Hebr<< >>oar_Syrc<< >>pga<< >>phn<< >>phn_Phnx<< >>rzh<< >>sam<< >>sgw<< >>shu<< >>shv<< >>smp<< >>sqr<< >>sqt<< >>ssh<< >>stv<< >>syc<< >>syn<< >>tig<< >>tir<< >>tmr<< >>tmr_Hebr<< >>trg<< >>tru<< >>uga<< >>wle<< >>xaa<< >>xeb<< >>xhd<< >>xna<< >>xpu<< >>xqt<< >>xsa<< >>yhd<< >>yud<< >>zwa<<
download: opus1m+bt-2021-04-10.zip
test set translations: opus1m+bt-2021-04-10.test.txt
test set scores: opus1m+bt-2021-04-10.eval.txt
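
Since this release lists its valid language labels explicitly, a target ID can be checked against that list before a sentence is tagged. The following is a rough sketch under stated assumptions: the function names `parse_labels` and `checked_token` are hypothetical, and `excerpt` is a shortened copy of the "valid language labels" line used only to keep the example small.

```python
import re

# Matches labels such as >>heb<< or >>jpa_Hebr<<.
LABEL_RE = re.compile(r">>([A-Za-z_]+)<<")


def parse_labels(line):
    """Return the set of language IDs found in a 'valid language labels' line."""
    return set(LABEL_RE.findall(line))


def checked_token(lang, labels):
    """Return the >>id<< token for lang, or raise if lang is not a valid label."""
    if lang not in labels:
        raise ValueError(f"unsupported target language ID: {lang!r}")
    return f">>{lang}<<"


# Shortened excerpt of the label line (assumption: the full line is parsed the same way).
excerpt = "valid language labels: >>amh<< >>ara<< >>heb<< >>jpa_Hebr<< >>mlt<< >>tir<<"
labels = parse_labels(excerpt)
print(checked_token("heb", labels))
```

A label appearing in the list only means the token is in the model's vocabulary; the benchmark scores below indicate how well each target language is actually covered.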

Benchmarks

testset	BLEU	chr-F	#sent	#words	BP
Tatoeba-test.eng-acm	0.8	0.005	3	17	1.000
Tatoeba-test.eng-afb	0.2	0.005	36	145	1.000
Tatoeba-test.eng-amh	0.0	0.000	190	585	1.000
Tatoeba-test.eng-apc	3.5	0.007	5	18	1.000
Tatoeba-test.eng-ara	12.6	0.419	10000	58929	1.000
Tatoeba-test.eng-arq	0.6	0.158	403	2271	1.000
Tatoeba-test.eng-ary	0.5	0.006	18	53	1.000
Tatoeba-test.eng-arz	0.9	0.139	181	856	1.000
Tatoeba-test.eng-heb	32.6	0.556	10000	60344	1.000
Tatoeba-test.eng-jpa	2.3	0.014	4	22	1.000
Tatoeba-test.eng-mlt	13.4	0.476	203	899	1.000
Tatoeba-test.eng-multi	22.6	0.484	10000	59379	1.000
Tatoeba-test.eng-oar	1.3	0.014	6	59	1.000
Tatoeba-test.eng-oar_Hebr	1.8	0.011	3	33	1.000
Tatoeba-test.eng-oar_Syrc	2.8	0.019	3	26	1.000
Tatoeba-test.eng-phn	1.2	0.007	5	33	1.000
Tatoeba-test.eng-tir	0.1	0.010	69	318	1.000
Tatoeba-test.eng-tmr	0.3	0.007	19	95	1.000
tico19-test.eng-amh	0.6	0.029	2100	44943	1.000
tico19-test.eng-ara	16.9	0.479	2100	51336	0.989
tico19-test.eng-tir	0.6	0.035	2100	46792	1.000
tico19-test.en-ti_ER.eng-tir	0.5	0.032	2100	49816	1.000
tico19-test.en-ti_ET.eng-tir	0.6	0.035	2100	49071	1.000
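
The BP column in the table above is the BLEU brevity penalty, which drops below 1.000 only when the system output is shorter than the reference. A minimal sketch of the standard definition follows; the function name and the lengths in the example are hypothetical and are not taken from the evaluation files.

```python
import math


def brevity_penalty(hyp_len, ref_len):
    """Standard BLEU brevity penalty: 1 if the hypothesis is at least as long
    as the reference, otherwise exp(1 - ref_len / hyp_len)."""
    if hyp_len <= 0:
        return 0.0
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


# Hypothetical corpus-level token counts, for illustration only.
print(brevity_penalty(1000, 1000))  # output as long as the reference
print(brevity_penalty(990, 1000))   # output slightly shorter than the reference
```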

opus4m+btTCv20210807-2021-09-30.zip

dataset: opus4m+btTCv20210807
model: transformer
source language(s): eng
target language(s): acm afb amh apc ara arc arq ary arz hbo heb jpa mlt oar phn syr tig tir tmr
pre-processing: normalization + SentencePiece (spm32k,spm32k)
A sentence-initial language token is required in the form >>id<< (id = valid target language ID)
valid language labels: >>aao<< >>abh<< >>abv<< >>acm<< >>acq<< >>acw<< >>acx<< >>acy<< >>adf<< >>aeb<< >>aec<< >>afb<< >>agj<< >>aii<< >>aij<< >>ajp<< >>ajt<< >>aju<< >>akk<< >>amh<< >>amw<< >>apc<< >>apd<< >>ara<< >>arb<< >>arc<< >>arq<< >>ars<< >>ary<< >>arz<< >>auz<< >>avl<< >>ayh<< >>ayl<< >>ayn<< >>ayp<< >>bbz<< >>bhm<< >>bhn<< >>bjf<< >>cld<< >>dlk<< >>gdq<< >>gez<< >>gft<< >>gru<< >>har<< >>hbo<< >>heb<< >>hoh<< >>hrt<< >>hss<< >>huy<< >>inm<< >>ior<< >>jpa<< >>jpa_Hebr<< >>jrb<< >>jye<< >>kcn<< >>kqd<< >>lhs<< >>lsd<< >>mey<< >>mid<< >>mlt<< >>mvz<< >>mys<< >>myz<< >>oar<< >>oar_Hebr<< >>oar_Syrc<< >>pga<< >>phn<< >>phn_Phnx<< >>rzh<< >>sam<< >>sgw<< >>shu<< >>shv<< >>smp<< >>sqr<< >>sqt<< >>ssh<< >>stv<< >>syc<< >>syn<< >>tig<< >>tir<< >>tmr<< >>tmr_Hebr<< >>trg<< >>tru<< >>uga<< >>wle<< >>xaa<< >>xeb<< >>xhd<< >>xna<< >>xpu<< >>xqt<< >>xsa<< >>yhd<< >>yud<< >>zwa<<
download: opus4m+btTCv20210807-2021-09-30.zip
test set translations: opus4m+btTCv20210807-2021-09-30.test.txt
test set scores: opus4m+btTCv20210807-2021-09-30.eval.txt

Benchmarks

testset	BLEU	chr-F	#sent	#words	BP
Tatoeba-test-v2021-08-07.eng-multi	23.1	0.502	10000	59933	1.000
Tatoeba-test-v2021-08-07.multi-multi	23.1	0.502	10000	59933	1.000
tico19-test.eng-amh	1.2	0.041	2100	44943	1.000
tico19-test.eng-ara	23.5	0.538	2100	51336	0.994
tico19-test.eng-tir	1.6	0.062	2100	46792	1.000
tico19-test.en-ti_ER.eng-tir	1.6	0.062	2100	49816	1.000
tico19-test.en-ti_ET.eng-tir	1.7	0.066	2100	49071	1.000
