DeepSpeedCheckpoint: support custom final ln idx #5506
Conversation
Force-pushed from 97dec42 to 68984ee (compare):
Till today, only the last layer (idx=-1) was considered, using FINAL_LAYER_NORM_INDEX, which is set to -1. This commit allows the user to pass a custom value for models where this default does not apply.
Force-pushed from 68984ee to c5e5ade (compare).
@loadams can you please re-run the "nv-torch-latest-v100" validation? I think it failed on a setup issue.
Yes, re-running now and will work on getting this merged.
Sorry to disturb you here, but could you explain why FINAL_LAYER_NORM_INDEX is set to -2, not -1, for LLaMA? @nelyahu Thanks.
@jinyouzhi The previous code assumed that the model is built of embedding + transformer layers + projection layer. Also, this approach of fetching layers based on their indices is not good practice; more effort is needed to do it based on the layer type so that it is generic and robust.
@nelyahu Got it. Thank you very much!
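
For context on the exchange above, here is a minimal sketch of why the final layer norm index depends on the model's flat pipeline layout. The layer names below are invented for illustration and are not taken from the actual Megatron or LLaMA code.

```python
# Layout matching the old default assumption (FINAL_LAYER_NORM_INDEX = -1):
gpt_style_layers = [
    "embedding",
    "transformer_block_0",
    "transformer_block_1",
    "final_layer_norm",      # last layer -> index -1
]

# A LLaMA-style layout where a projection / lm-head follows the norm,
# so the norm sits at index -2 instead of -1:
llama_style_layers = [
    "embedding",
    "transformer_block_0",
    "transformer_block_1",
    "final_layer_norm",      # second-to-last layer -> index -2
    "lm_head_projection",
]

assert gpt_style_layers[-1] == "final_layer_norm"
assert llama_style_layers[-2] == "final_layer_norm"
```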
Till today, only the last layer (idx=-1) was considered, using FINAL_LAYER_NORM_INDEX, which is set to -1.
This PR allows the user to pass a custom value for models where this default does not apply.
See an example of usage in the HabanaAI/Megatron-DeepSpeed fork repository:
https://github.com/HabanaAI/Megatron-DeepSpeed/blob/c9feb8cacabc6dd4da4266cff08db555a21122e2/tools/verify_checkpoint_non_tp_consistency.py#L296
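
A minimal usage sketch, assuming the new constructor argument is named `final_layer_norm_idx` (the exact keyword name, the checkpoint path, and the parallelism degrees below are illustrative; see the linked verification script for real usage):

```python
from deepspeed.checkpoint import DeepSpeedCheckpoint

# Hypothetical checkpoint directory, for illustration only.
ckpt_dir = "/path/to/megatron_deepspeed_checkpoint"

# Default behavior: the final layer norm is taken to be the last layer (idx = -1).
ds_checkpoint = DeepSpeedCheckpoint(ckpt_dir, tp_degree=2, pp_degree=2)

# For a model whose last layer is not the norm (e.g. a LLaMA-style layout with a
# projection after the norm), pass the index explicitly.
ds_checkpoint_llama = DeepSpeedCheckpoint(ckpt_dir,
                                          tp_degree=2,
                                          pp_degree=2,
                                          final_layer_norm_idx=-2)
```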