-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove usage of torch.autograd.profiler_legacy
for benchmarks
#2149
base: main
Are you sure you want to change the base?
Conversation
4b8f9d2
to
e6e3bbc
Compare
UPD: Fixed, see the last run: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10760495839/job/29838516905 |
d9794bf
to
ef78aa4
Compare
Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
ef78aa4
to
b7ddbc9
Compare
Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any performance impact?
Let's take results (TFLOPS) from https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10760495839 and https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10763825651. Note: unfortunately the script from #2060 doesn't work for this PR Summary (old version vs new version):
|
Signed-off-by: Anatoly Myachev <[email protected]>
This reverts commit c5aefdc.
Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
We can try another method. It looks like the previous measurement method only included the sycl kernel time, and the new approach (elapsed_time) also includes everything that was done to prepare this kernel, for example, a large number of allocations in the case of "Attn" benchmark. If we remove them from the measurement, the average difference becomes only 16%!!! instead of 93% (for "Attn"). I believe that by working in this direction it is possible to achieve an acceptable deterioration in absolute numbers of performance while maintaining the ratio (within some acceptable limits). This could be an acceptable solution until bugs in the other method are fixed. This also unlocks the ability for us to benchmark on platforms that only work with Upstream PyTorch. New CI run on this PR with last changes (with moving allocations from the measured function): https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10783177036/job/29904547469. @etiotto @whitneywhtsang @alexbaden thoughts? |
Performance with the current approach remains [unchanged](https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10794302056/job/29938275434), but greatly improves the numbers in a situation where `elapsed_time` method is used. Part of #2149 Closes #2198 Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
Part of #1905
Closes #2150
Since #1905 is blocked for now it's probably good idea to change profiler to xpu elapsed_time in order to continue investigate performance degradation caused by removing IPEX import. I think we've come to terms with the increase in absolute performance numbers at this point (see #1905 (comment)).