model miniaturization #10
Comments
Thanks for your interest. Unfortunately, we did not try architectures other than transformer-base.
Thanks for your reply. The main problems with the model I trained are reduced translation fluency, multimodality issues, and under- and over-translation. Do you have any experience with this?
@JunchengYao The problems you mentioned are very common in NAT models. They are caused by the parallel prediction scheme and the conditional independence assumption. Many recent studies (including our DAT) are working hard to alleviate these problems; however, no mature solution exists, especially if you use a small model.
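For context (not from the original exchange): the conditional independence assumption means a NAT model factorizes the target distribution per position given only the source, unlike the autoregressive factorization:

$$p_{\text{AR}}(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, x), \qquad p_{\text{NAT}}(y \mid x) = \prod_{t=1}^{T} p(y_t \mid x).$$

Because each target token is predicted independently given the source, the model cannot coordinate adjacent predictions, which tends to surface as repeated or missing words and disfluent output, and the effect is typically harder to compensate for with a lower-capacity model.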
Hi, I tried to train a miniaturized model with a 6-layer encoder, a 3-layer decoder, and 256 hidden dimensions, but found that the accuracy of the model declines rapidly. Is there any suggestion for model miniaturization? Thanks. A rough comparison of the two configurations is sketched below.
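The sketch below only restates the dimensions mentioned above in plain PyTorch to compare parameter counts; it is not the repository's config format, and the `nhead` and `dim_feedforward` values for the miniaturized variant are assumptions, since the issue only specifies the hidden size and layer counts.

```python
import torch.nn as nn

# transformer-base dimensions vs. the miniaturized setup from the issue.
# nhead=4 and dim_feedforward=1024 for the small model are assumed values.
transformer_base = dict(d_model=512, nhead=8, num_encoder_layers=6,
                        num_decoder_layers=6, dim_feedforward=2048)
miniaturized = dict(d_model=256, nhead=4, num_encoder_layers=6,
                    num_decoder_layers=3, dim_feedforward=1024)

def build_model(cfg):
    # nn.Transformer is a generic encoder-decoder stack; a real NAT/DAT model
    # replaces autoregressive decoding with parallel prediction.
    return nn.Transformer(batch_first=True, **cfg)

for name, cfg in [("transformer-base", transformer_base),
                  ("miniaturized", miniaturized)]:
    model = build_model(cfg)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```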