Training tokens of Stage 1 and Stage 2 #16

kjhoon7686 · 2024-10-31T04:56:47Z

Could you let me know the number of training tokens for Stage 1 and Stage 2 respectively?

mayank31398 · 2024-10-31T05:41:51Z

Phase 1 is 10T tokens and phase 2 is 2T tokens for the dense models.
Phase 1 is 8T tokens and phase 2 is 2T tokens for the MoE models.
You can find the details in our paper: https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf
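
As a quick sanity check on the totals, the per-phase figures above can be summed per model family. This is only a minimal sketch: the dictionary layout and the `total_tokens` helper are illustrative names, not anything from the Granite codebase, and the counts are simply the numbers quoted above, in trillions.

```python
# Token counts in trillions, copied from the figures quoted above.
# The structure and names here are hypothetical, for illustration only.
TOKENS_T = {
    "dense": {"phase_1": 10, "phase_2": 2},
    "moe": {"phase_1": 8, "phase_2": 2},
}


def total_tokens(family: str) -> int:
    """Return the combined phase 1 + phase 2 token count, in trillions."""
    return sum(TOKENS_T[family].values())


if __name__ == "__main__":
    for family in TOKENS_T:
        print(f"{family}: {total_tokens(family)}T tokens in total")
```

Under these figures, phase 2 is a smaller share of the dense run (2 of 12) than of the MoE run (2 of 10).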

1 reply

Thank you for your answer. I missed that detail earlier. I was deeply impressed by your paper!