
float8nocompile: add benchmark script #1454

Merged
1 commit merged into pytorch:main on Dec 20, 2024

Conversation

danielvegamyhre (Contributor)

Summary

Add a benchmark script (based on this script and modified for my purposes) to compare the performance of a single forward + backward pass for the three setups below (a rough sketch of the timing loop follows the list):

  • production float8 eager training
  • compiled float8 training
  • float8nocompile prototype training
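For context, here is a minimal sketch of the kind of forward + backward timing loop such a benchmark performs. This is not the actual script in this PR; it assumes CUDA-event timing and uses a plain `nn.Linear` stack as a stand-in workload. The eager float8 and float8nocompile variants would come from the corresponding torchao conversion helpers.

```python
import torch
import torch.nn as nn

def benchmark_fwd_bwd(model: nn.Module, x: torch.Tensor, n_iters: int = 10) -> float:
    """Return the mean time (ms) of one forward + backward pass on GPU."""
    # Warm up so one-time costs (compilation, autotuning) are excluded.
    for _ in range(3):
        model(x).sum().backward()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(n_iters):
        model(x).sum().backward()
    end.record()
    torch.cuda.synchronize()  # wait for all kernels before reading the timer
    return start.elapsed_time(end) / n_iters

# Example: time the compiled baseline on a placeholder model.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)).cuda()
x = torch.randn(1024, 4096, device="cuda")
print(benchmark_fwd_bwd(torch.compile(model), x))
```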

Example output:

| input_size | high_precision_dtype | eager_time | compiled_time | float8nocompile |
|-----------:|----------------------|-----------:|--------------:|----------------:|
| 65500      | torch.float32        | 599.299    | 298.101       | 94446           |
| 65500      | torch.bfloat16       | 649.674    | 394.535       | 94386.3         |
| 1.05e+06   | torch.float32        | 640.5      | 332.171       | 104449          |
| 1.05e+06   | torch.bfloat16       | 685.822    | 421.365       | 104372          |
| 1.68e+07   | torch.float32        | 1963.09    | 1214.32       | 280825          |
| 1.68e+07   | torch.bfloat16       | 1828.16    | 1051.67       | 261710          |
| 2.68e+08   | torch.float32        | 24129.8    | 16287.2       | 3.39791e+06     |
| 2.68e+08   | torch.bfloat16       | 21603.2    | 12389.9       | 3.39515e+06     |

As you can see, the initial float8nocompile prototype is orders of magnitude slower than both the eager and compiled paths, so I will need to do some profiling/debugging to figure out why and improve the performance.
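(As a first step for that profiling, a standard approach is a kernel-level breakdown with `torch.profiler`; the model and input below are placeholders, not the benchmark's actual workload.)

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Placeholder workload; substitute the float8nocompile-converted model here.
model = torch.nn.Linear(4096, 4096, device="cuda")
x = torch.randn(4096, 4096, device="cuda", requires_grad=True)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(x).sum().backward()

# Rank ops by total CUDA time to see where the prototype spends its time.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```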

danielvegamyhre added the `topic: not user facing` label on Dec 20, 2024
pytorch-bot (bot) commented Dec 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1454

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e74d63a with merge base 29de3e0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the `CLA Signed` label on Dec 20, 2024
vkuzo (Contributor) commented Dec 20, 2024

Landing this is fine. I'd also recommend adding a knob for your logic here (https://github.com/pytorch/ao/blob/main/benchmarks/float8/bench_linear_float8.py) so you can reuse all of the other benchmarks we have and easily compare with the prod path. It's fine to add prototype benchmarking to that folder, since it's not production facing.
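A sketch of what such a knob could look like, assuming an argparse-style flag; the flag name and choices are illustrative, not the actual interface of bench_linear_float8.py:

```python
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Hypothetical knob: selects which float8 recipe the benchmark exercises,
    # so the prototype can be compared against the prod path in one script.
    parser.add_argument(
        "--float8_path",
        choices=["eager", "compiled", "nocompile"],
        default="compiled",
        help="float8 recipe to benchmark ('nocompile' selects the prototype)",
    )
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"benchmarking float8 path: {args.float8_path}")
    # Dispatch to the matching model-conversion logic here.
```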

danielvegamyhre merged commit 3bac905 into pytorch:main on Dec 20, 2024 (18 checks passed)
danielvegamyhre (Contributor, Author) commented Dec 20, 2024

> Landing this is fine. I'd also recommend adding a knob for your logic here (https://github.com/pytorch/ao/blob/main/benchmarks/float8/bench_linear_float8.py) so you can reuse all of the other benchmarks we have and easily compare with the prod path. It's fine to add prototype benchmarking to that folder, since it's not production facing.

Sounds good, will do
