Hi @lena-voita ,

I've been reading your paper "Context-Aware Monolingual Repair for Neural Machine Translation" and checking some of your code for clarification, but I still have some doubts regarding the generation of the round-trip translations.
As stated in the paper,
Russian monolingual data is first translated into English, using the Russian→English model and beam search with beam size of 4.
Comparing this with your code, it is not clear to me whether you mean beam size or n-best, since there is an iteration over all the beam hypotheses. Do you keep the best 4 hypotheses?

good-translation-wrong-in-context/lib/task/seq2seq/models/DocRepair.py
Line 451 in bb59382
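To make the distinction concrete, this is roughly what I mean by the two readings (a minimal sketch with made-up names, not your actual API; `hypotheses` stands for the beam-search output sorted best-first):

```python
def pick_hypotheses(hypotheses, n_best=4):
    """What would be kept from a beam of size 4 under each reading."""
    # Reading 1: beam size 4 is only a search setting -> keep the single best.
    best_only = hypotheses[:1]
    # Reading 2: a 4-best list -> keep all four hypotheses for the next step.
    four_best = hypotheses[:n_best]
    return best_only, four_best

# Toy usage with dummy strings standing in for English translations.
best, four = pick_hypotheses(["en hyp 1", "en hyp 2", "en hyp 3", "en hyp 4"])
```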
Then, the paper states,
we use the English→Russian model to sample translations with temperature of 0.5. For each sentence, we precompute 20 sampled translations and randomly choose one of them when forming a training minibatch for DocRepair.
If I'm not mistaken, my reading of your code is that for each of the 4 hypotheses from the previous step you precompute 20 sampled translations, resulting in 80 possible translations for each original Russian sentence. Is this correct? During training, I guess that in each data iteration you select 4 random translations (one for each n-best hypothesis). Is this right? In addition, you mention random sampling, which I guess is over the whole vocabulary, isn't it?
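Concretely, this is how I picture the pipeline (a rough sketch of my understanding only; the sampler below is a dummy stand-in for the English→Russian model, and none of the names are from your repo):

```python
import random

def fake_sample_en_to_ru(en_hypothesis, temperature=0.5):
    """Placeholder for drawing one sampled RU translation (temperature 0.5)."""
    return f"ru_sample(T={temperature}, src={en_hypothesis!r}, id={random.random():.3f})"

# Offline: 20 samples for each of the 4 beam hypotheses
# -> 4 * 20 = 80 round-trip candidates per original Russian sentence?
en_hypotheses = ["en hyp 1", "en hyp 2", "en hyp 3", "en hyp 4"]
candidates = {hyp: [fake_sample_en_to_ru(hyp) for _ in range(20)]
              for hyp in en_hypotheses}

# At training time: when forming a DocRepair minibatch, pick one sampled
# translation per hypothesis at random (so 4 training inputs per sentence?).
minibatch_inputs = [random.choice(samples) for samples in candidates.values()]
```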
Finally,
Also, in training, we replace each token in the input with a random one with the probability of 10%
In this case, replacement candidates are chosen from the whole vocabulary set, aren't they?
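For reference, this is the kind of noise I have in mind (a toy illustration only; the vocabulary here is a placeholder, not the real BPE vocabulary):

```python
import random

# Toy vocabulary standing in for the full target-side vocabulary.
VOCAB = ["он", "она", "дом", "книга", "читает", "видит", "<unk>"]

def corrupt(tokens, vocab=VOCAB, replace_prob=0.10, rng=random):
    """With probability 10%, replace each input token with a random vocabulary item."""
    return [rng.choice(vocab) if rng.random() < replace_prob else tok
            for tok in tokens]

noisy = corrupt("она читает книга".split())
```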
Thanks