Training tokens of Stage 1 and Stage 2 #16

Closed Answered by mayank31398
kjhoon7686 asked this question in Q&A
Phase 1 is 10T tokens and phase 2 is 2T tokens for the dense models.
Phase 1 is 8T tokens and phase 2 is 2T tokens for the MoE models.
You can find the details in our paper: https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf
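
As a quick sanity check on the numbers above, here is a minimal sketch that tallies the total training tokens per model family. It assumes the two phases are simply additive (no overlap between them); the dictionary and function names are hypothetical and do not come from the paper or the repository.

```python
# Token budgets in trillions (T), copied from the answer above.
# PHASE_TOKENS_T and total_tokens_t are hypothetical names used for illustration.
PHASE_TOKENS_T = {
    "dense": {"phase1": 10, "phase2": 2},
    "moe": {"phase1": 8, "phase2": 2},
}


def total_tokens_t(model_type: str) -> int:
    """Sum phase 1 and phase 2 tokens in trillions, assuming the phases are additive."""
    return sum(PHASE_TOKENS_T[model_type].values())


totals = {name: total_tokens_t(name) for name in PHASE_TOKENS_T}
print(totals)  # {'dense': 12, 'moe': 10}
```

Under that assumption the dense models see 12T tokens in total and the MoE models 10T; the paper linked above is the authoritative source for the exact breakdown.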

Answer selected by mayank31398