Dump of Ukrainian-centric models #166
Comments
We should add en <-> ukr, but I'm not sure about the others. So far our strategy has been to have en <-> X models and to support other language pairs through pivoting. It's also easier to assess quality this way. |
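For reference, the pivoting strategy can be sketched roughly as below. This is only an illustration: the function name `translation_route` and the set of available pairs are made up for the example and are not taken from the extension.

```python
def translation_route(src, tgt, available, pivot="en"):
    """Return the list of (source, target) models to chain, or None if no route exists."""
    if src == tgt:
        return []
    if (src, tgt) in available:
        # A direct model exists, no pivoting needed.
        return [(src, tgt)]
    if (src, pivot) in available and (pivot, tgt) in available:
        # Translate into the pivot language first, then out of it.
        return [(src, pivot), (pivot, tgt)]
    return None


# Hypothetical set of en-centric models.
available = {("uk", "en"), ("en", "uk"), ("en", "de"), ("de", "en")}
route = translation_route("uk", "de", available)
```

With only en-centric models, any pair whose source or target has no en model simply has no route, which is why quality only needs to be assessed against English.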
ok, let's do this way |
I checked the quality of uk -> en in opus-mt-app and it looks decent. The issue is that those models use separate vocabularies for the source and target languages, while the extension supports only one shared vocabulary for both languages. So we have to change the structure of our model registry to be able to use separate vocabularies. |
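A minimal sketch of what a registry lookup could look like if entries were allowed to carry either one shared vocabulary or a separate one per side. The keys `vocab`, `srcvocab` and `trgvocab` and the file names are hypothetical, chosen only for this example; the actual registry format may differ.

```python
def vocab_pair(entry):
    """Return (source_vocab, target_vocab) for a registry entry.

    The keys "vocab", "srcvocab" and "trgvocab" are hypothetical names.
    """
    if "vocab" in entry:
        # Shared vocabulary: the same file serves both sides.
        return entry["vocab"], entry["vocab"]
    # Separate vocabularies: one file per side.
    return entry["srcvocab"], entry["trgvocab"]


# Illustrative entries, not real registry content.
shared = {"model": "model.enes.bin", "vocab": "vocab.enes.spm"}
split = {"model": "model.uken.bin", "srcvocab": "source.spm", "trgvocab": "target.spm"}
```

Treating the shared case as "the same file twice" would let the loading code downstream handle both kinds of model the same way.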
Oh, nice, we have a similar page https://mozilla.github.io/translate/, more demos to the world! :) So it feels like it's worth the effort to add this support, so that we can integrate these and future models. Especially if the OPUS folks decide to run their massive automatic training. |
Also, those models are |
marian supports this use case already, so I assume the engine (aka bergamot-translator) will work as well, since it is built on top of marian. We still need to test the engine for this use case to be sure, and then make changes in the extension (which I believe would be easy).
It is technically possible by providing a model config for each language pair. Right now, we are using a global model config for all language pairs. |
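To illustrate the idea of a per-pair model config layered on top of the global one, here is a rough sketch. The option names mirror common marian settings, but the values, file names and the `PAIR_OVERRIDES` structure are assumptions made for the example, not the extension's actual config.

```python
# Hypothetical defaults shared by every language pair (values are illustrative).
GLOBAL_CONFIG = {
    "beam-size": 1,
    "normalize": 1.0,
    "vocabs": ["vocab.spm", "vocab.spm"],
}

# Hypothetical per-pair overrides; only the keys that differ are listed.
PAIR_OVERRIDES = {
    ("uk", "en"): {"vocabs": ["source.spm", "target.spm"]},
}


def config_for(src, tgt):
    """Merge the global defaults with any overrides for this pair."""
    return {**GLOBAL_CONFIG, **PAIR_OVERRIDES.get((src, tgt), {})}
```

Pairs without an override keep using the global config unchanged, so existing models would not be affected.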
@abhi-agg Possibly related: browsermt/marian-dev#81 |
The demo page Kenneth posted is just the bergamot-translator wasm test page with some tweaks. |
Just a heads up to everyone. If this requires landing stuff in gecko (being discussed in browsermt/marian-dev#81 (comment)) then we can't achieve it by end of next week. |
If this depends on touching gecko, forget about it then, it's just too late and I don't want to even think of risking opening another can of worms. |
I agree we can think about it later. Adding those models requires updating too many parts: the model registry and repo format, evaluation scripts, loading scripts in the extension and other places (translate website, HTTP service), plus now maybe some work in gecko. We should do it at some point, especially if more models in the same format become available. |
Forget it then. |
A dump of models from Helsinki: https://github.com/Helsinki-NLP/UkrainianLT/blob/main/translateLocally-models.json
"Additional ones are on the way. The quality might be a bit questionable but it is hard for me to judge."
These include some languages for which we don't have en support yet, so we need to revise the pivoting assumptions.