Change default benchmark mode to upstream PyTorch #2298

anmyachev · 2024-09-19T22:32:57Z

Current state (https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10950264922 vs https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10949253321):

Softmax: triton geomean diff: 13%, xetla geomean diff: 8%, ratio geomean diff: 5%
- Performance degrades most on small inputs. For example, for N=256 the average time in milliseconds, from which teraflops are calculated, changes as follows: 0.0055 vs 0.0116. The difference is about 6 microseconds. This is approximately the time spent on the host to launch the kernel and cannot be avoided. One option is to increase the data volume, which would reduce the relative impact of this fixed overhead.
FA advanced: ~~triton geomean diff: 2%, xetla geomean diff: 3%, ratio geomean diff: 2%.~~ Correct numbers are: triton geomean diff: -2.3%, xetla geomean diff: -4%, ratio (triton/xetla) geomean diff: 1.7%
- This change shouldn't be a problem; part of it is certainly measurement noise.
- To check correctness, it seems we can use the CPU version: Use cpu version of torch sdpa until xpu version is fixed #2300
GEMM advanced: ~~triton geomean diff: 2%, xetla geomean diff: 2%, ratio geomean diff: 4%~~. Correct numbers are: triton geomean diff: -4.1%, xetla geomean diff: -2.7%, ratio (triton/xetla) geomean diff: -1.4%
Float conversion microbenchmark: BF16 geomean diff: << 1%, FP16 geomean diff: << 1%
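
The three figures reported per benchmark are tied together. If "geomean diff" is taken to be the geometric mean of per-configuration ratios between the two runs, minus one (an assumption about how these numbers were produced, not something the CI output states), then the ratio diff follows from the Triton and XeTLA diffs. A minimal sketch of that check; `geomean_diff` and `implied_ratio_diff` are hypothetical helper names, not part of the benchmark scripts:

```python
import math


def geomean_diff(new, old):
    """Geometric mean of per-configuration new/old ratios, minus 1.

    Assumed definition of "geomean diff"; 0.05 means +5%.
    """
    if len(new) != len(old) or not new:
        raise ValueError("need two non-empty sequences of equal length")
    log_sum = sum(math.log(n / o) for n, o in zip(new, old))
    return math.exp(log_sum / len(new)) - 1.0


def implied_ratio_diff(triton_diff, xetla_diff):
    """Diff of the triton/xetla ratio implied by the two separate diffs."""
    return (1.0 + triton_diff) / (1.0 + xetla_diff) - 1.0


# Corrected figures quoted above (already rounded, so expect small deviations).
fa_ratio = implied_ratio_diff(-0.023, -0.04)     # FA advanced
gemm_ratio = implied_ratio_diff(-0.041, -0.027)  # GEMM advanced
print(f"FA: {fa_ratio:+.2%}, GEMM: {gemm_ratio:+.2%}")
```

With the rounded inputs, the implied ratio diffs agree with the reported 1.7% (FA) and -1.4% (GEMM) to within rounding, which suggests the corrected numbers are internally consistent.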

Signed-off-by: Anatoly Myachev <[email protected]>

anmyachev · 2024-09-23T16:26:01Z

Let's move on. This change can be easily rolled back if necessary.

whitneywhtsang · 2024-09-24T19:23:26Z

@anmyachev Should the ratio diff be recalculated for FA as well? FA advanced: triton geomean diff: 2%, xetla geomean diff: 3%, ratio geomean diff: 2%.
Do you know why the degradation of Triton is much higher than XeTLA for Softmax?

anmyachev · 2024-09-24T21:34:22Z

> Do you know why the degradation of Triton is much higher than XeTLA for Softmax?

Most likely because of the difference in time spent on the host: Triton runs more Python code, which is slower than C++.

@whitneywhtsang since GEMM on Triton got worse than expected, I suppose it is worth rolling back this change? Or is this an acceptable change?

UPD: about overhead in Triton: triton-lang/triton#3166
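
To illustrate why a fixed host-side cost hits small problem sizes hardest, here is a rough model using the Softmax N=256 timings from the PR description. It assumes the overhead is a constant additive term independent of problem size, which is a simplification; the helper names and the longer kernel times in the loop are hypothetical:

```python
def relative_slowdown(kernel_ms, host_overhead_ms):
    """Fraction by which a fixed host overhead inflates the measured time."""
    if kernel_ms <= 0:
        raise ValueError("kernel_ms must be positive")
    return host_overhead_ms / kernel_ms


def throughput_ratio(fast_ms, slow_ms):
    """Reported throughput is inversely proportional to the measured time."""
    return fast_ms / slow_ms


# Softmax N=256 average times from the PR description (milliseconds).
fast_ms, slow_ms = 0.0055, 0.0116
overhead_ms = slow_ms - fast_ms  # about 0.0061 ms, i.e. roughly 6 microseconds

ratio = throughput_ratio(fast_ms, slow_ms)
print(f"throughput at the slower timing is {ratio:.0%} of the faster one")

# The same overhead applied to hypothetical longer kernels matters far less.
for kernel_ms in (0.0055, 0.1, 1.0):
    extra = relative_slowdown(kernel_ms, overhead_ms)
    print(f"{kernel_ms} ms kernel: +{extra:.1%} measured time")
```

Under this model the overhead more than doubles the measured time of the smallest case but adds well under 1% to a 1 ms kernel, which is the reasoning behind the suggestion to increase the data volume.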

etiotto · 2024-09-25T13:56:10Z

> > Do you know why the degradation of Triton is much higher than XeTLA for Softmax?
>
> Most likely because of the difference in time spent on the host: Triton runs more Python code, which is slower than C++.
>
> @whitneywhtsang since GEMM on Triton got worse than expected, I suppose it is worth rolling back this change? Or is this an acceptable change?
>
> UPD: about overhead in Triton: triton-lang/triton#3166

We should report performance numbers with IPEX enabled by default, because without IPEX the time measured by upstream PyTorch is not precise (it includes more than just the kernel time). IMHO we should revert this change.

…2342) Address #2298 (comment). This reverts commit 782aecf. CI status: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11034669496

Change default benchmark mode to upstream PyTorch

619bf25

Signed-off-by: Anatoly Myachev <[email protected]>

anmyachev marked this pull request as ready for review September 20, 2024 10:46

anmyachev requested review from whitneywhtsang, etiotto, chengjunlu and pbchekin September 20, 2024 10:46

vlad-penkin linked an issue Sep 20, 2024 that may be closed by this pull request

[Benchmarks][Upstream PyTorch 2.5] Triton and XeTLA softmax performance degrades in comparison with torch 2.1 / ipex 2.1 test proxies #2106

Closed

chengjunlu approved these changes Sep 23, 2024

anmyachev merged commit 782aecf into main Sep 23, 2024
4 checks passed

anmyachev deleted the amyachev/change-default-bench-mode branch September 23, 2024 16:26

anmyachev mentioned this pull request Sep 24, 2024

Move output tensor allocation out of benchmark function for GEMM #2328

Merged

anmyachev added a commit that referenced this pull request Sep 25, 2024

Revert "Change default benchmark mode to upstream PyTorch (#2298)"

94cd2e3

This reverts commit 782aecf.

anmyachev mentioned this pull request Sep 25, 2024

Revert "Change default benchmark mode to upstream PyTorch (#2298)" #2342

Merged
