Recommended values for modifiers #48

Open
eu9ene opened this issue Feb 6, 2024 · 1 comment

Comments

@eu9ene
Contributor

eu9ene commented Feb 6, 2024

It's not clear from the examples in the README or from the paper what a good first choice of modifier probabilities would be. I understand that this likely depends heavily on the language and the data. However, developing an intuition for those probabilities and the other settings will take a lot of experimentation. It would help if the paper disclosed the full OpusTrainer config for the French-English case study, both as a starting point and to improve reproducibility (the paper lists a config, but it's not clear whether it's the real training config or just an example).

For context, we're trying to reproduce the results from the paper by adding the same methods to our training pipeline to make our models more robust. So far we've successfully integrated UpperCase, TitleCase, and SentencePiece sampling.
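
To make the question concrete, here is a minimal sketch of what a per-line modifier probability means as I understand it. This is not OpusTrainer's code: `apply_modifiers`, `modified_fraction`, and the `MODIFIERS` list are hypothetical names, the sketch works on a single string rather than a source/target pair, and the 0.05 values are arbitrary placeholders, not recommendations.

```python
import random

def apply_modifiers(line, modifiers, rng):
    """Apply each (function, probability) pair to the line independently."""
    for modify, probability in modifiers:
        # rng.random() is in [0.0, 1.0), so probability 0 never fires
        # and probability 1 always fires.
        if rng.random() < probability:
            line = modify(line)
    return line

# Placeholder probabilities for illustration only (assumed, not recommended).
MODIFIERS = [
    (str.upper, 0.05),  # stand-in for an UpperCase-style modifier
    (str.title, 0.05),  # stand-in for a TitleCase-style modifier
]

def modified_fraction(lines, modifiers, seed=1111):
    """Return the fraction of lines changed by at least one modifier."""
    rng = random.Random(seed)
    changed = sum(apply_modifiers(line, modifiers, rng) != line for line in lines)
    return changed / len(lines)
```

Running `modified_fraction` over a sample of the training data with different probabilities gives a rough idea of how much of the corpus each setting would alter, but it says nothing about which setting gives the best robustness. That is the part I'm hoping the real config can answer.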

@jelmervdl
Contributor

CC @XapaJIaMnu
