
float8nocompile: add benchmark script #1454

Merged
1 commit merged into pytorch:main on Dec 20, 2024

Conversation

danielvegamyhre (Contributor)

Summary

Add a benchmark script (based on this script and modified for my purposes) to compare the performance of a single forward + backward pass for the three setups below (a rough sketch of the timing loop follows the list):

  • production float8 eager training
  • compiled float8 training
  • float8nocompile prototype training
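For context, here is a minimal sketch of the kind of forward + backward timing loop such a benchmark performs. This is not the actual script in this PR; it assumes CUDA-event timing and uses a plain `nn.Linear` stack as a stand-in workload. The eager float8 and float8nocompile variants would come from the corresponding torchao conversion helpers.

```python
import torch
import torch.nn as nn

def benchmark_fwd_bwd(model: nn.Module, x: torch.Tensor, n_iters: int = 10) -> float:
    """Return the mean time (ms) of one forward + backward pass on GPU."""
    # Warm up so one-time costs (compilation, autotuning) are excluded.
    for _ in range(3):
        model(x).sum().backward()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(n_iters):
        model(x).sum().backward()
    end.record()
    torch.cuda.synchronize()  # wait for all kernels before reading the timer
    return start.elapsed_time(end) / n_iters

# Example: time the compiled baseline on a placeholder model.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)).cuda()
x = torch.randn(1024, 4096, device="cuda")
print(benchmark_fwd_bwd(torch.compile(model), x))
```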

Example output:

| input_size | high_precision_dtype | eager_time | compiled_time | float8nocompile |
|-----------:|----------------------|-----------:|--------------:|----------------:|
| 65500      | torch.float32        | 599.299    | 298.101       | 94446           |
| 65500      | torch.bfloat16       | 649.674    | 394.535       | 94386.3         |
| 1.05e+06   | torch.float32        | 640.5      | 332.171       | 104449          |
| 1.05e+06   | torch.bfloat16       | 685.822    | 421.365       | 104372          |
| 1.68e+07   | torch.float32        | 1963.09    | 1214.32       | 280825          |
| 1.68e+07   | torch.bfloat16       | 1828.16    | 1051.67       | 261710          |
| 2.68e+08   | torch.float32        | 24129.8    | 16287.2       | 3.39791e+06     |
| 2.68e+08   | torch.bfloat16       | 21603.2    | 12389.9       | 3.39515e+06     |

As you can see, the initial float8nocompile prototype is orders of magnitude slower than both the eager and compiled paths, so I will need to do some profiling/debugging to figure out why and improve the performance.
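(As a first step for that profiling, a standard approach is a kernel-level breakdown with `torch.profiler`; the model and input below are placeholders, not the benchmark's actual workload.)

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Placeholder workload; substitute the float8nocompile-converted model here.
model = torch.nn.Linear(4096, 4096, device="cuda")
x = torch.randn(4096, 4096, device="cuda", requires_grad=True)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(x).sum().backward()

# Rank ops by total CUDA time to see where the prototype spends its time.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```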

danielvegamyhre added the `topic: not user facing` label on Dec 20, 2024
pytorch-bot (bot) commented Dec 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1454

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e74d63a with merge base 29de3e0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the `CLA Signed` label on Dec 20, 2024
vkuzo (Contributor) commented Dec 20, 2024

Landing this is fine. I'd also recommend adding a knob for your logic here (https://github.com/pytorch/ao/blob/main/benchmarks/float8/bench_linear_float8.py) so you can reuse all of the other benchmarks we have and easily compare with the prod path. It's fine to add prototype benchmarking to that folder, since it's not production facing.
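A sketch of what such a knob could look like, assuming an argparse-style flag; the flag name and choices are illustrative, not the actual interface of bench_linear_float8.py:

```python
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Hypothetical knob: selects which float8 recipe the benchmark exercises,
    # so the prototype can be compared against the prod path in one script.
    parser.add_argument(
        "--float8_path",
        choices=["eager", "compiled", "nocompile"],
        default="compiled",
        help="float8 recipe to benchmark ('nocompile' selects the prototype)",
    )
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"benchmarking float8 path: {args.float8_path}")
    # Dispatch to the matching model-conversion logic here.
```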

danielvegamyhre merged commit 3bac905 into pytorch:main on Dec 20, 2024 (18 checks passed)
danielvegamyhre (Contributor, Author) commented Dec 20, 2024

> Landing this is fine. I'd also recommend adding a knob for your logic here (https://github.com/pytorch/ao/blob/main/benchmarks/float8/bench_linear_float8.py) so you can reuse all of the other benchmarks we have and easily compare with the prod path. It's fine to add prototype benchmarking to that folder, since it's not production facing.

Sounds good, will do
