Training tokens of Stage 1 and Stage 2 #16
-
Could you let me know the number of training tokens for Stage 1 and Stage 2, respectively?
Answered by mayank31398 on Oct 31, 2024
-
Phase 1 is 10T tokens and phase 2 is 2T for the dense models.
Phase 1 is 8T tokens and phase 2 is 2T for the MoE models.
You can find the details in our paper: https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf