-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wasm test page UI for translating b/w non-English language pairs #231
Wasm test page UI for translating b/w non-English language pairs #231
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is https://github.com/mozilla/translate being duplicated here? What purpose is this serving anyone? Some tiny snippets on how to use the WebAssembly bindings similar to what previously existed I understand to be useful for developers (non-mozilla perhaps).
I'm failing to comprehend why the entire model delivery mechanism and what looks like extension level complexity js (not some reference example) is being brought in.
A package_lock.json
? CSS? Why?
wasm/test_page/js/worker.js
Outdated
} | ||
} | ||
|
||
const translateInvolvingEnglish = (from, to, paragraphs) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're parameterizing this by (from, to). This has nothing to do with English.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you are trying to say here. One of the from
or to
parameter is English for this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing here, but there seems to be undocumented assumptions that any non-English language has to go through English pivoting to obtain translations. Your code should generalize to any pivot (or any sequence of models applied in series to get to the target language although it won't be more than 1 hop). You encode pivoting through English as a specialization of the above general case.
One of the from or to parameter is English for this function
Where is this documented? Or better, where is such a constraint imposed by design? I can throw a hi -> ta
model at this function, and it'll work provided the model is in the inventory. It is general, which is not a bad thing.
Naming is how you communicate to devs if this is a "demonstration" example. While naming is subjective, translateInvolvingEnglish
is definitely wrong as I can throw non-English directions at this. You're adding a confusing abstraction, if the intended target crowd is supposed to be reading. You can indicate this is a private method by means of hiding it from outside within a closure (translate
), which is probably not the best thing to do.
Another possibility to swap the names of translateInvolvingEnglish
(maybe call this translatePivotingThroughEnglish
) and translate
to get the general purpose case, where not through English direct models will also happen to be supported if available in the inventory. There is absolutely no reason to go through English if there's a faster model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your code should generalize to any pivot (or any sequence of models applied in series to get to the target language although it won't be more than 1 hop).
It doesn't make sense to me at this point to change the wasm test page to accommodate a general case which currently doesn't exist. I can generalize it once we have trained models that pivot without involving English.
wasm/test_page/js/worker.js
Outdated
if (from !== 'en' && to !== 'en') { | ||
log(`Translating '${from}${to}' via pivoting: '${from}en' -> 'en${to}'`); | ||
let translatedParagraphsInEnglish = translateInvolvingEnglish(from, 'en', paragraphs); | ||
return translateInvolvingEnglish('en', to, translatedParagraphsInEnglish); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works only for plaintext, is not the recommended or intended way to do pivot translation. I'm unsure who communicated this is the solution for mozilla#62 (comment). Going forward with this for HTML will incur serious debt ahead - it won't work with links/tags. But a targeted by the library pivot translation will be compatible with other algorithms like those which require TagTree.
#212 is unsolved and will take more development time after which the proper solution will come about, if at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know that for the links/tags, the current setup wouldn't work. Once the API supports pivoting for html text, this page will be updated accordingly. This is quite clear from the mozilla#62 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR modifies the UI of the wasm test page to demonstrate translation b/w non-English language pairs.
I'm telling you this is not the intended way to do pivot translation, demonstrating this as a reference has more downsides than upsides. I'm questioning what value these 1500 lines of code through multiple hula-hoops adds to this repository? Who's the consumer for the plaintext pivoting feature in JavaScript?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm telling you this is not the intended way to do pivot translation
Once you have the pivot translation ready the intended way, I will change the wasm test page accordingly.
On a side note, are you also saying that #210 doesn't handle the outbound translation use case the "intended way"? cc @andrenatal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you also saying that #210 doesn't handle the outbound translation use case the "intended way"?
No. There's an inefficient implementation which requires an invasive marian-dev fix, being investigated/tracked in browsermt/marian-dev#57. I'm kind of confused at how this query came in this context, and would be happy to clear if you're mixing up anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intended way is you to call a new function mentioned in #212 (comment), where the issue is being tracked. You tell the C++ API, "Hey, I have these models which you can apply in series to translate.". This will give you a Response
that is consistent, usable the same way as you would have in an English model, hence working with tags and what not, when the feature gets to you.
This will shift the responsibility of resolving inconsistencies at alignment matrices, vocabulary mismatches, sentence misalignments to the C++ library. You will run into trouble doing this at JS at some point, and by then if this example code is used the efforts built on top of this example would go down the drain.
The feature is not implemented, hence still open. If I may ask, why are you under the impression that this has been implemented? I didn't communicate pivot translation is solved, but I believe it's my responsibility to point out pitfalls of the current approach when it's brought to my attention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jerinphilip , and that's the behavior we expect (applying the model in series)! But, since it's not implemented, what's the way we should proceed in order to make progress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intended way is you to call a new function mentioned in #212 (comment), where the issue is being tracked.
The feature is not implemented, hence still open.
@kpu Could you please confirm that until #212 resolves, it is okay to perform translation b/w language pairs not involving English by using English as a pivot? I changed the wasm test page just to demonstrate that multiple models work in JS and used pivoting as a use case to demonstrate this. We are aware that the existing solution for pivoting works only for the plain text and not for the html text. We would be happy to change things on our side once #212 resolves. Please let me know if I missed something here. Thanks. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have systems lying around for Kazakh-English with Russian as a pivot (which have not been optimized for Bergamot).
There will be API changes in the future so we can get word alignment working in pivot translation. It will probably be done on C++ side. As long as you accept that your code will have to change in the future to accommodate that (and Edinburgh is unlikely to be helpful with this since we lack JavaScript capacity), I have no problem with you putting out something that works for the time being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Thanks for the explanation.
We need to show to possible future maintainers how to consume the wasm module given that developer documentation coverage is almost non-existent. This is a problem we identified late in the extension and although in the new version we are going to have full doc coverage, we don't want this to happen to the rest of the repositories. And github.com/mozilla/translate shouldn't be used as a reference for that purpose. The intent of that repo is just to work as proof of concept of the technology on scenarios beyond the extension ones (in-page and outbound translation) |
Be nice. |
Merging this one as @andrenatal already approved it. |
This PR modifies the UI of the wasm test page to demonstrate translation b/w non-English language pairs. This is an improvement on the #224.