Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wasm test page UI for translating b/w non-English language pairs #231

Merged
merged 2 commits into from
Oct 19, 2021
Merged

Wasm test page UI for translating b/w non-English language pairs #231

merged 2 commits into from
Oct 19, 2021

Conversation

abhi-agg
Copy link
Contributor

This PR modifies the UI of the wasm test page to demonstrate translation b/w non-English language pairs. This is an improvement on the #224.

Copy link
Contributor

@jerinphilip jerinphilip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is https://github.com/mozilla/translate being duplicated here? What purpose is this serving anyone? Some tiny snippets on how to use the WebAssembly bindings similar to what previously existed I understand to be useful for developers (non-mozilla perhaps).

I'm failing to comprehend why the entire model delivery mechanism and what looks like extension level complexity js (not some reference example) is being brought in.

A package_lock.json? CSS? Why?

}
}

const translateInvolvingEnglish = (from, to, paragraphs) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're parameterizing this by (from, to). This has nothing to do with English.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what you are trying to say here. One of the from or to parameter is English for this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing here, but there seems to be undocumented assumptions that any non-English language has to go through English pivoting to obtain translations. Your code should generalize to any pivot (or any sequence of models applied in series to get to the target language although it won't be more than 1 hop). You encode pivoting through English as a specialization of the above general case.

One of the from or to parameter is English for this function

Where is this documented? Or better, where is such a constraint imposed by design? I can throw a hi -> ta model at this function, and it'll work provided the model is in the inventory. It is general, which is not a bad thing.

Naming is how you communicate to devs if this is a "demonstration" example. While naming is subjective, translateInvolvingEnglish is definitely wrong as I can throw non-English directions at this. You're adding a confusing abstraction, if the intended target crowd is supposed to be reading. You can indicate this is a private method by means of hiding it from outside within a closure (translate), which is probably not the best thing to do.

Another possibility to swap the names of translateInvolvingEnglish (maybe call this translatePivotingThroughEnglish) and translate to get the general purpose case, where not through English direct models will also happen to be supported if available in the inventory. There is absolutely no reason to go through English if there's a faster model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code should generalize to any pivot (or any sequence of models applied in series to get to the target language although it won't be more than 1 hop).

It doesn't make sense to me at this point to change the wasm test page to accommodate a general case which currently doesn't exist. I can generalize it once we have trained models that pivot without involving English.

if (from !== 'en' && to !== 'en') {
log(`Translating '${from}${to}' via pivoting: '${from}en' -> 'en${to}'`);
let translatedParagraphsInEnglish = translateInvolvingEnglish(from, 'en', paragraphs);
return translateInvolvingEnglish('en', to, translatedParagraphsInEnglish);
Copy link
Contributor

@jerinphilip jerinphilip Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works only for plaintext, is not the recommended or intended way to do pivot translation. I'm unsure who communicated this is the solution for mozilla#62 (comment). Going forward with this for HTML will incur serious debt ahead - it won't work with links/tags. But a targeted by the library pivot translation will be compatible with other algorithms like those which require TagTree.

#212 is unsolved and will take more development time after which the proper solution will come about, if at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know that for the links/tags, the current setup wouldn't work. Once the API supports pivoting for html text, this page will be updated accordingly. This is quite clear from the mozilla#62 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR modifies the UI of the wasm test page to demonstrate translation b/w non-English language pairs.

I'm telling you this is not the intended way to do pivot translation, demonstrating this as a reference has more downsides than upsides. I'm questioning what value these 1500 lines of code through multiple hula-hoops adds to this repository? Who's the consumer for the plaintext pivoting feature in JavaScript?

Copy link
Contributor Author

@abhi-agg abhi-agg Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm telling you this is not the intended way to do pivot translation

Once you have the pivot translation ready the intended way, I will change the wasm test page accordingly.

On a side note, are you also saying that #210 doesn't handle the outbound translation use case the "intended way"? cc @andrenatal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you also saying that #210 doesn't handle the outbound translation use case the "intended way"?

No. There's an inefficient implementation which requires an invasive marian-dev fix, being investigated/tracked in browsermt/marian-dev#57. I'm kind of confused at how this query came in this context, and would be happy to clear if you're mixing up anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intended way is you to call a new function mentioned in #212 (comment), where the issue is being tracked. You tell the C++ API, "Hey, I have these models which you can apply in series to translate.". This will give you a Response that is consistent, usable the same way as you would have in an English model, hence working with tags and what not, when the feature gets to you.

This will shift the responsibility of resolving inconsistencies at alignment matrices, vocabulary mismatches, sentence misalignments to the C++ library. You will run into trouble doing this at JS at some point, and by then if this example code is used the efforts built on top of this example would go down the drain.

The feature is not implemented, hence still open. If I may ask, why are you under the impression that this has been implemented? I didn't communicate pivot translation is solved, but I believe it's my responsibility to point out pitfalls of the current approach when it's brought to my attention.

Copy link
Contributor

@andrenatal andrenatal Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jerinphilip , and that's the behavior we expect (applying the model in series)! But, since it's not implemented, what's the way we should proceed in order to make progress?

Copy link
Contributor Author

@abhi-agg abhi-agg Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intended way is you to call a new function mentioned in #212 (comment), where the issue is being tracked.
The feature is not implemented, hence still open.

@kpu Could you please confirm that until #212 resolves, it is okay to perform translation b/w language pairs not involving English by using English as a pivot? I changed the wasm test page just to demonstrate that multiple models work in JS and used pivoting as a use case to demonstrate this. We are aware that the existing solution for pivoting works only for the plain text and not for the html text. We would be happy to change things on our side once #212 resolves. Please let me know if I missed something here. Thanks. 👍

Copy link
Member

@kpu kpu Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have systems lying around for Kazakh-English with Russian as a pivot (which have not been optimized for Bergamot).

There will be API changes in the future so we can get word alignment working in pivot translation. It will probably be done on C++ side. As long as you accept that your code will have to change in the future to accommodate that (and Edinburgh is unlikely to be helpful with this since we lack JavaScript capacity), I have no problem with you putting out something that works for the time being.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Thanks for the explanation.

andrenatal
andrenatal previously approved these changes Oct 18, 2021
@andrenatal
Copy link
Contributor

We need to show to possible future maintainers how to consume the wasm module given that developer documentation coverage is almost non-existent. This is a problem we identified late in the extension and although in the new version we are going to have full doc coverage, we don't want this to happen to the rest of the repositories.

And github.com/mozilla/translate shouldn't be used as a reference for that purpose. The intent of that repo is just to work as proof of concept of the technology on scenarios beyond the extension ones (in-page and outbound translation)

@kpu
Copy link
Member

kpu commented Oct 18, 2021

Be nice.

@browsermt browsermt deleted a comment from jerinphilip Oct 19, 2021
@abhi-agg
Copy link
Contributor Author

Merging this one as @andrenatal already approved it.

@abhi-agg abhi-agg merged commit a0cb1e4 into browsermt:main Oct 19, 2021
@abhi-agg abhi-agg deleted the wasm-test-page-multiple-models branch October 20, 2021 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants