Migrate GPT-2 to new tracer #875
Conversation
for i in range(1, gpt2.config.n_layer // decoders_per_rank):
    annotate_split_points(
        gpt2,
        {f'transformer.h.{i * decoders_per_rank}': PipeSplitWrapper.SplitPoint.BEGINNING},
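The loop above annotates a split point before every decoders_per_rank-th decoder block. As a toy sketch (plain Python, not PiPPy code; the function name and the example values are hypothetical), the FQNs it annotates would be:

```python
def split_point_fqns(n_layer, decoders_per_rank):
    """Mirror the loop in the diff: list the fully-qualified names of the
    submodules that receive a SplitPoint.BEGINNING annotation."""
    return [
        f"transformer.h.{i * decoders_per_rank}"
        for i in range(1, n_layer // decoders_per_rank)
    ]

# GPT-2 small has 12 decoder layers; with 3 decoders per rank the splits
# land before layers 3, 6 and 9, yielding 4 pipeline stages.
print(split_point_fqns(12, 3))
# → ['transformer.h.3', 'transformer.h.6', 'transformer.h.9']
```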
Does this mean the split point is right before/after this submodule (specified by FQN)? It's also not super clear to me what SplitPoint.BEGINNING means. Also curious: tracing will "flatten" the submodule into a bunch of aten ops, so there is no longer a concept of a submodule. So does this API find the first or last node of the submodule?
BEGINNING means right before; END would mean right after.
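To illustrate the "right before" semantics, here is a toy partitioner (plain Python, not PiPPy internals; the function and layer names are made up for illustration) that cuts an ordered list of layers right before each annotated name:

```python
def stages_from_splits(layers, split_before):
    """Partition an ordered list of layer names into pipeline stages,
    cutting right before each name in split_before (the BEGINNING
    semantics; END would cut right after instead)."""
    stages, current = [], []
    for name in layers:
        if name in split_before and current:
            stages.append(current)
            current = []
        current.append(name)
    stages.append(current)
    return stages

layers = [f"h.{i}" for i in range(6)]
print(stages_from_splits(layers, {"h.2", "h.4"}))
# → [['h.0', 'h.1'], ['h.2', 'h.3'], ['h.4', 'h.5']]
```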
Annotation occurs before tracing, so the module structure is still intact at that point.
How come there is no corresponding END?
Force-pushed from 3960481 to acafdfd
# Input configs
example_inputs = generate_inputs_for_model(
    model_class, gpt2, model_name, args.batch_size, args.device)
input_ids = example_inputs["input_ids"]
Is input_ids in this case just a single microbatch, or is it the entire minibatch?
Entire.
When PipelineStage actually runs, it splits the batch internally before feeding it to the scheduler.
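The internal splitting described above amounts to chunking the minibatch along the batch dimension. A minimal sketch of that idea (plain Python lists standing in for tensors; the function name is hypothetical, and the real splitting happens inside PipelineStage):

```python
def split_minibatch(minibatch, chunks):
    """Split a whole minibatch into `chunks` microbatches along the
    batch dimension, the way a pipeline runtime might before handing
    them to its scheduler."""
    per_chunk = (len(minibatch) + chunks - 1) // chunks
    return [minibatch[i:i + per_chunk]
            for i in range(0, len(minibatch), per_chunk)]

# A batch of 8 samples fed as 4 microbatches of 2:
print(split_minibatch(list(range(8)), 4))
# → [[0, 1], [2, 3], [4, 5], [6, 7]]
```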
Looks good, just a few questions.
Description
Migrated the GPT-2 example to work with the new tracer-based PiPPy.
examples/hf/hf_utils.py contains a utility to generate inputs for HuggingFace models.
Model architecture:
Run
Output
https://gist.github.com/kwen2501/b9ed6158d8d0dc90b16824aa6abd8d72