Fix how stable_train_samples is calculated. This is a ROCm/transformers-specific change that adds warmup before collecting perf numbers, but it currently does not work as expected. Specifically:
E.g. if batch_size is 10, total_steps is 150, first 10 steps take 2 seconds, next 140 steps take 1 second, then:
Instead, I added a stable_train_warmup_steps argument (default=10) so that the warmup works as intended.
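To illustrate the intended behavior, here is a minimal sketch of a throughput calculation that skips a fixed number of warmup steps. It is not the code in this PR: the helper `stable_samples_per_second` and the `step_times` list are hypothetical, and it assumes the example above means 2 seconds per step for the first 10 steps and 1 second per step for the remaining 140.

```python
from typing import Sequence


def stable_samples_per_second(
    step_times: Sequence[float],
    batch_size: int,
    warmup_steps: int = 10,
) -> float:
    """Hypothetical helper: throughput over the steps after warmup.

    step_times holds the wall-clock duration of each training step in seconds.
    The first `warmup_steps` entries are dropped from both the sample count
    and the elapsed time.
    """
    if warmup_steps < 0:
        raise ValueError("warmup_steps must be non-negative")
    stable = step_times[warmup_steps:]
    if not stable:
        raise ValueError("no steps left after warmup")
    elapsed = sum(stable)
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return len(stable) * batch_size / elapsed


# Assumed reading of the example: 10 slow steps (2 s each), then 140 steps (1 s each).
step_times = [2.0] * 10 + [1.0] * 140

with_warmup = stable_samples_per_second(step_times, batch_size=10, warmup_steps=10)
without_warmup = stable_samples_per_second(step_times, batch_size=10, warmup_steps=0)
print(with_warmup, without_warmup)
```

Under these assumptions, excluding the 10 warmup steps gives 1400 samples over 140 seconds, while including them gives 1500 samples over 160 seconds, so the slow startup steps no longer skew the reported number.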
With this change, the pyt_huggingface_gpt2 perf number changes from 559.092 to 529.131 stable_train_samples_per_second.
NOTE: This will affect HF perf for QA, execdb, etc.