Move output tensor allocation out of benchmark function for GEMM #2328

anmyachev · 2024-09-24T18:04:32Z

The geomean for GEMM was calculated incorrectly (in #2298). After recalculating, I saw a deterioration in the ratio, which may be due to allocations.
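For reference, a minimal sketch of a geometric mean over per-shape throughput numbers. The PR does not say what was wrong with the calculation in #2298, so this is not a reconstruction of that bug; the `geomean` helper and the sample values are made up for illustration.

```python
import math
import statistics


def geomean(values):
    """Geometric mean: exp of the arithmetic mean of the logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-shape throughput numbers, not measurements from this PR.
tflops = [1.0, 4.0, 16.0]

gm = geomean(tflops)            # geometric mean of the samples
am = sum(tflops) / len(tflops)  # arithmetic mean, shown for contrast
gm_stdlib = statistics.geometric_mean(tflops)  # stdlib equivalent (Python 3.8+)
```

The geometric mean is less dominated by a few very fast shapes than the arithmetic mean, which is why it is a common summary for benchmark ratios.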

Let's see: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11019414473

The Triton GEMM adv geomean increased by ~1 TFLOPS.
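The idea behind the change can be sketched without Triton or a GPU. This is a minimal illustration, not the benchmark code from this PR: `matmul_alloc`, `matmul_into`, and `time_ms` are hypothetical names, and plain Python lists stand in for tensors. It shows how allocating the output inside the timed callable adds allocation cost to every measured call, while a preallocated output keeps that cost out of the measurement.

```python
import time


def matmul_into(a, b, c):
    """Multiply two matrices, writing into a caller-provided output."""
    k = len(b)
    for i, row in enumerate(a):
        out = c[i]
        for j in range(len(out)):
            out[j] = sum(row[p] * b[p][j] for p in range(k))
    return c


def matmul_alloc(a, b):
    """Multiply two matrices, allocating the output on every call."""
    m, n = len(a), len(b[0])
    c = [[0.0] * n for _ in range(m)]  # allocation happens inside the timed call
    return matmul_into(a, b, c)


def time_ms(fn, rep=10):
    """Return the mean wall-clock time of fn() in milliseconds over rep calls."""
    start = time.perf_counter()
    for _ in range(rep):
        fn()
    return (time.perf_counter() - start) * 1e3 / rep


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]

# Before: the output is allocated inside the measured callable.
ms_alloc = time_ms(lambda: matmul_alloc(a, b))

# After: the output is allocated once, outside the measured callable.
c = [[0.0, 0.0], [0.0, 0.0]]
ms_prealloc = time_ms(lambda: matmul_into(a, b, c))
```

With inputs this small the timing difference is noise; the point is only which work sits inside the measured lambda.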

Signed-off-by: Anatoly Myachev <[email protected]>

whitneywhtsang · 2024-09-24T18:49:53Z

benchmarks/triton_kernels_benchmark/gemm_benchmark.py

@@ -256,7 +252,13 @@ def benchmark(B, M, N, K, provider):
        _, min_ms, max_ms, mean_ms, cv = benchmark_suit.do_bench(lambda: torch.matmul(a, b), warmup=10, rep=10,
                                                                 quantiles=quantiles, fast_flush=False)
    elif provider == 'triton':
-        triton_fn = lambda: matmul(a, b)
+        if len(a.shape) == 3 and len(b.shape) == 3:


how about assert len(a.shape) == len(b.shape)?

@whitneywhtsang could you elaborate a bit?

Should I add this assert before the `if`?

something like #2328 (comment)
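One way to read the suggestion above, as a hedged sketch rather than the code that was merged: check that both inputs have the same rank first, then pick the output shape from that rank. The `output_shape` helper is hypothetical, shapes are plain tuples, and the batched layout `(B, M, K) x (B, K, N)` is an assumption based on the `benchmark(B, M, N, K, provider)` signature.

```python
def output_shape(a_shape, b_shape):
    """Return the output shape for a 2D or batched 3D matmul."""
    # The rank check proposed in review: both operands must have the same rank.
    assert len(a_shape) == len(b_shape), "Inputs must have the same rank"
    if len(a_shape) == 3:
        assert a_shape[0] == b_shape[0], "Batch sizes must match"
        assert a_shape[2] == b_shape[1], "Inner dimensions must match"
        return (a_shape[0], a_shape[1], b_shape[2])
    assert len(a_shape) == 2, "Expected 2D or 3D inputs"
    assert a_shape[1] == b_shape[0], "Inner dimensions must match"
    return (a_shape[0], b_shape[1])
```

For example, `output_shape((2, 4, 8), (2, 8, 16))` gives the shape of the buffer that would be allocated once, before the timed call.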

Move output tensor allocation out of benchmark function for GEMM

8c2f80e

Signed-off-by: Anatoly Myachev <[email protected]>

anmyachev requested a review from whitneywhtsang September 24, 2024 18:11

anmyachev marked this pull request as ready for review September 24, 2024 18:12

whitneywhtsang reviewed Sep 24, 2024

anmyachev and others added 2 commits September 24, 2024 21:02

Update benchmarks/triton_kernels_benchmark/gemm_benchmark.py

5697d4d

Co-authored-by: Whitney Tsang <[email protected]>

address review comments

5ffd569

Signed-off-by: Anatoly Myachev <[email protected]>

whitneywhtsang approved these changes Sep 24, 2024

anmyachev enabled auto-merge (squash) September 24, 2024 19:38

anmyachev merged commit 768e5bb into main Sep 24, 2024
5 checks passed

anmyachev deleted the amyachev/gemm-allocations branch September 24, 2024 19:39

anmyachev restored the amyachev/gemm-allocations branch September 24, 2024 21:25

anmyachev deleted the amyachev/gemm-allocations branch October 18, 2024 09:31
