model miniaturization #10

Open
JunchengYao opened this issue Jan 3, 2023 · 3 comments

Comments

@JunchengYao

Hi, I tried to train a miniaturized model with a 6-layer encoder, a 3-layer decoder, and a hidden dimension of 256, but found that the model's accuracy drops sharply. Do you have any suggestions for model miniaturization? Thanks.

@hzhwcmhf
Member

hzhwcmhf commented Jan 3, 2023

Thanks for your interest. Unfortunately, we did not try architectures other than transformer-base.
My intuition is that both the encoder and the decoder are important for capturing the information in the data. In particular, a large decoder helps glancing training, which is critical for the final performance. I think knowledge distillation may help reduce the model size.
Please feel free to discuss here; we would be very grateful if you could share your findings.
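
The comment above does not say which form of knowledge distillation is meant. As one illustration only, here is a minimal sketch of a generic token-level distillation loss (temperature-softened KL divergence between a teacher's and a student's output distributions). The function names, the temperature value, and the use of plain Python lists are assumptions made for this sketch; it is not the DAT authors' method or part of their codebase.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stabilized)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) for one token position, scaled by temperature squared.

    Hypothetical helper: both arguments are logits over the same vocabulary.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )
    return kl * temperature ** 2


# The loss is zero when the student matches the teacher and positive otherwise.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In a real training loop this term would be averaged over all target positions and typically combined with the ordinary training loss. For translation models, distillation is also often done at the sequence level, by training the small model on the teacher's translations instead of the original references.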

@JunchengYao
Author

Thanks for your reply. The main problems with the model I trained are reduced translation fluency, multi-modality problems, and under- and over-translation. Do you have any experience with these?

@hzhwcmhf
Member

hzhwcmhf commented Jan 8, 2023

@JunchengYao The problems you mentioned are very common in NAT models. They are caused by the parallel nature of prediction and the conditional independence assumption. Many recent studies (including our DAT) are working hard to alleviate these problems; however, no mature solution exists, especially if you use a small model.
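
For reference, the conditional independence assumption mentioned above can be written out as follows. This is the standard factorization for a vanilla NAT model compared with an autoregressive one, not DAT's own formulation; the symbols X (source sentence), Y = (y_1, ..., y_T) (target sentence), and T (target length) are notation assumed here.

```latex
% Autoregressive (AT): each token conditions on all previous target tokens
P_{\mathrm{AT}}(Y \mid X) = \prod_{t=1}^{T} P(y_t \mid y_{<t}, X)

% Vanilla non-autoregressive (NAT): tokens are predicted in parallel,
% conditionally independent of each other given the source
P_{\mathrm{NAT}}(Y \mid X) = \prod_{t=1}^{T} P(y_t \mid X)
```

Because each y_t is predicted without seeing the other target tokens, the model can mix words from several different valid translations in one output, which is the multi-modality problem raised earlier in the thread.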
