This repository has been archived by the owner on Sep 4, 2023. It is now read-only.

Dump of Ukrainian-centric models #166

Closed
kpu opened this issue Mar 17, 2022 · 13 comments · Fixed by #409
Labels: enhancement (New feature or request), language request

Comments

@kpu
Contributor

kpu commented Mar 17, 2022

A dump of models from Helsinki: https://github.com/Helsinki-NLP/UkrainianLT/blob/main/translateLocally-models.json

"Additional ones are on the way. The quality might be a bit questionable but it is hard for me to judge."

These include some languages for which we don't have en support yet, so we need to revise the pivoting assumptions.

@andrenatal andrenatal added this to the W7 milestone Mar 22, 2022
@eu9ene
Collaborator

eu9ene commented Mar 22, 2022

We should add en <-> uk, but I'm not sure about the others. So far our strategy has been to have only models to and from en, and to support other language pairs through pivoting. It's also easier to assess quality this way.
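The pivoting strategy described above can be sketched roughly as follows. This is only an illustration, not the extension's actual routing code: the function name `plan_route`, the `PIVOT` constant, and the list of available pairs are all assumptions made up for the example.

```python
from typing import List, Set, Tuple

# Assumption: English is the single pivot language, as described in the comment.
PIVOT = "en"


def plan_route(src: str, tgt: str, available: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the model hops needed to translate src -> tgt.

    A direct model is preferred when one exists; otherwise the route
    goes src -> en -> tgt, provided both halves are available.
    """
    if src == tgt:
        return []
    if (src, tgt) in available:
        return [(src, tgt)]
    hops = [(src, PIVOT), (PIVOT, tgt)]
    if src != PIVOT and tgt != PIVOT and all(hop in available for hop in hops):
        return hops
    raise LookupError(f"no model route for {src} -> {tgt}")


# Hypothetical set of installed models, invented for this example.
available = {("uk", "en"), ("en", "uk"), ("en", "de"), ("de", "en"), ("uk", "pl")}

print(plan_route("uk", "de", available))  # pivots through en
print(plan_route("uk", "pl", available))  # uses the direct model
```

A model such as the hypothetical uk -> pl entry has no English side, which is the case where an en-only pivoting assumption stops being enough.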

@andrenatal
Contributor

OK, let's do it this way.

@eu9ene
Collaborator

eu9ene commented Mar 24, 2022

I checked the quality of uk -> en in opus-mt-app and it looks decent. Those models use separate vocabularies for the source and target languages, whereas the extension supports only a single shared vocabulary for both. So we have to change the structure of our model registry to be able to use separate vocabularies.
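One way such a registry change could look is sketched below, assuming each registry entry is a plain dictionary. The key names `vocab`, `srcvocab`, and `trgvocab`, the helper `resolve_vocabs`, and the file names are hypothetical and chosen only for illustration; the real registry format may differ.

```python
from typing import Dict, Tuple


def resolve_vocabs(entry: Dict[str, str]) -> Tuple[str, str]:
    """Return (source_vocab, target_vocab) file names for one registry entry.

    Entries with a single shared vocabulary keep working unchanged;
    entries with separate vocabularies list both files explicitly.
    """
    if "vocab" in entry:
        return entry["vocab"], entry["vocab"]
    if "srcvocab" in entry and "trgvocab" in entry:
        return entry["srcvocab"], entry["trgvocab"]
    raise KeyError("entry needs either 'vocab' or both 'srcvocab' and 'trgvocab'")


# Hypothetical entries: one shared-vocabulary model, one split-vocabulary model.
shared = {"model": "model.esen.bin", "vocab": "vocab.esen.spm"}
split = {
    "model": "model.uken.bin",
    "srcvocab": "srcvocab.uken.spm",
    "trgvocab": "trgvocab.uken.spm",
}

print(resolve_vocabs(shared))
print(resolve_vocabs(split))
```

Normalising both shapes to a pair of file names means the loading code only has to handle one case.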

@kpu
Contributor Author

kpu commented Mar 24, 2022

> I checked the quality of uk -> en in opus-mt-app and it looks decent. Those models produce separate vocabularies for source and target languages, so the issue is that the extension supports only one mixed vocabulary for both languages. So, we have to change the structure of our model registry to be able to use separate vocabularies.

Indeed https://translatelocally.com/web/

@eu9ene
Collaborator

eu9ene commented Mar 24, 2022

Oh, nice, we have a similar page https://mozilla.github.io/translate/, more demos to the world! :) So it feels like it's worth the effort to add this support so that we can integrate these and future models, especially if the OPUS folks decide to run their massive automatic training.

@eu9ene
Collaborator

eu9ene commented Mar 25, 2022

Also, those models use gemm-precision: int8. I guess they were created using marian master, based on https://github.com/browsermt/students/tree/master/train-student#marian-master-slower. They are supposedly slower. Should we add support for such models to the extension?

@abhi-agg
Collaborator

> Those models produce separate vocabularies for source and target languages, so the issue is that the extension supports only one mixed vocabulary for both languages. So, we have to change the structure of our model registry to be able to use separate vocabularies.

marian already supports this use case, so I assume the engine (aka bergamot-translator) will work as well, since it is built on top of marian. We still need to test the engine with this use case to be sure, and then make the changes in the extension (which I believe would be easy).

> Also, those models are gemm-precision: int8, I guess they were created using marian master based on https://github.com/browsermt/students/tree/master/train-student#marian-master-slower. They are supposedly slower. Should we add support of such models to the extension?

It is technically possible by providing a model config for each language pair. Right now, we are using a global model config for all language pairs.
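A per-language-pair config could be layered on top of the global one roughly like this. It is a minimal sketch: `GLOBAL_CONFIG`, `PAIR_OVERRIDES`, `config_for`, and the default values are assumptions for illustration, and only the `gemm-precision: int8` setting comes from this thread.

```python
from typing import Dict

# Assumption: placeholder global defaults, not the extension's real config.
GLOBAL_CONFIG: Dict[str, object] = {
    "beam-size": 1,
    "gemm-precision": "int8shiftAlphaAll",
}

# Hypothetical per-pair overrides, keyed by language pair code.
PAIR_OVERRIDES: Dict[str, Dict[str, object]] = {
    "uken": {"gemm-precision": "int8"},
}


def config_for(pair: str) -> Dict[str, object]:
    """Return the global config with any overrides for this pair applied."""
    merged = dict(GLOBAL_CONFIG)  # copy so the global defaults stay untouched
    merged.update(PAIR_OVERRIDES.get(pair, {}))
    return merged


print(config_for("uken"))  # gemm-precision overridden for this pair
print(config_for("ende"))  # falls back to the global defaults
```

Pairs without an override keep the current behaviour, so existing models would not need any registry changes.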

@jerinphilip

@abhi-agg Possibly related: browsermt/marian-dev#81

@jelmervdl
Contributor

The demo page Kenneth posted is just the bergamot-translator wasm test page with some tweaks.

@abhi-agg
Collaborator

Just a heads up to everyone: if this requires landing stuff in gecko (being discussed in browsermt/marian-dev#81 (comment)), then we can't achieve it by the end of next week.

@andrenatal
Contributor

If this depends on touching gecko, forget about it then, it's just too late and I don't want to even think of risking opening another can of worms.

@eu9ene
Collaborator

eu9ene commented Mar 25, 2022

I agree we can think about it later. Adding those models requires updating too many parts: the model registry and repo format, the evaluation scripts, the loading scripts in the extension and in other places (translate website, HTTP service), and now maybe some work in gecko. We should do it at some point, especially if more models in the same format become available.

@andrenatal
Contributor

Forget it then.

@andrenatal andrenatal removed this from the W7 milestone Mar 25, 2022
@andrenatal andrenatal added the enhancement New feature or request label Mar 25, 2022
@eu9ene eu9ene self-assigned this Jun 16, 2022

6 participants