[WarpSpec] improve allocation for smem #7

manman-ren · 2024-12-09T19:56:50Z

Summary: Attempt to teach the Allocation analysis to be aware of warpspec regions. Add a list of regions to each buffer, and teach the interference graph to be aware of regions as well. Currently this allows the convert_layout buffers within one consumer to overlap.

Test Plan: Run JFA bwd

Summary: Attempt to teach the Allocation analysis to be aware of warpspec regions. Add a list of regions to each buffer, and teach the interference graph to be aware of regions as well. Currently this allows the convert_layout buffers within one consumer to overlap, and in the non-persistent case convert_layout can share memory with the private global buffer. For the persistent case, we need to make sure the producer doesn't reload the private global buffer for the outer loop (i.e., the persistent loop) before convert_layout happens in the consumer.

Test Plan: Run JFA bwd

htyu · 2024-12-10T06:04:38Z

lib/Analysis/Allocation.cpp

+            return true;
+        }
+      }
+      return false;


Can one buffer have a region id while the other doesn't, and should that be treated in different regions?

Yeah, we can be conservative. I am currently trying to handle the private buffer associated with channels; the check for "!= 0" (i.e., ignoring the producer warp group) is kind of hacky.
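For readers following the thread, here is a minimal sketch of what a conservative region check could look like. It is not the code from the patch: the `Buffer` struct, the `kProducerRegion` constant, and the exact return policy are assumptions made for illustration, based only on the comments above (region id 0 is taken to be the producer warp group, and a buffer without region ids is treated as being in a different region from one that has them).

```cpp
#include <vector>

// Hypothetical stand-in for the allocator's buffer record; only the
// region list matters for this sketch.
struct Buffer {
  std::vector<int> regionIds; // warpspec regions that use this buffer
};

// Assumption: region id 0 denotes the producer warp group, which the
// check skips (the "!= 0" test mentioned in the thread).
constexpr int kProducerRegion = 0;

// Returns true when the two buffers must be assumed to belong to
// different regions.
bool inDifferentRegion(const Buffer &a, const Buffer &b) {
  // Neither buffer carries region information: nothing to compare.
  if (a.regionIds.empty() && b.regionIds.empty())
    return false;
  // Conservative choice: exactly one buffer has region ids, so treat
  // the pair as being in different regions.
  if (a.regionIds.empty() != b.regionIds.empty())
    return true;
  // Compare every region of a against every region of b.
  for (int t1 : a.regionIds) {
    for (int t2 : b.regionIds) {
      if (t1 == kProducerRegion || t2 == kProducerRegion)
        continue; // ignore the producer region
      if (t1 != t2)
        return true;
    }
  }
  return false;
}
```

With this policy, a buffer tagged only with the producer region never reports a difference against a consumer buffer, which is the part the author describes as hacky.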

htyu · 2024-12-10T06:56:19Z

lib/Analysis/Allocation.cpp

+                                                      : maxId;
+            }
+            if (operationId[liveOp] < minId) {
+              minId = operationId[liveOp];


I guess isLocalForWS falls through here. How does it make sure the scratch buffer (convert layout) of the second consumer overlaps with that of the first consumer? Is it handled in buildInterferenceGraph?

Yeah, in buildInterferenceGraph, ops with different taskIds will interfere with each other.

htyu · 2024-12-10T16:40:23Z

lib/Analysis/Allocation.cpp

+            }
+            if (isPrivateGlobalForWS) {
+              minId = 0;
+              maxId = operationId[liveOp] + 1 > maxId ? operationId[liveOp] + 1


For buffers used inside a loop, should we use the operationId of the loop here which should cover the whole body of the loop? Perhaps the outermost loop body should be used which should handle the persistent loop case.
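The suggestion above can be sketched as follows. This illustrates the reviewer's idea only, not what the patch does (the diff shown sets `minId = 0` for private global buffers). The `LiveOp` record and its loop fields are hypothetical; the assumption is that each user of a buffer knows the operation-id range of its outermost enclosing loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical record for one operation that uses a buffer.
struct LiveOp {
  std::size_t id;        // operation id in program order
  bool hasLoop;          // true if the op sits inside at least one loop
  std::size_t loopBegin; // first operation id of the outermost loop
  std::size_t loopEnd;   // one past the last operation id of that loop
};

// Computes the half-open live interval [first, second) of a buffer.
// A user inside a loop contributes the whole outermost loop range, so
// the buffer stays live across every iteration, including the
// persistent loop.
std::pair<std::size_t, std::size_t>
liveInterval(const std::vector<LiveOp> &users) {
  if (users.empty())
    return {0, 0};
  std::size_t minId = std::numeric_limits<std::size_t>::max();
  std::size_t maxId = 0;
  for (const LiveOp &op : users) {
    std::size_t lo = op.hasLoop ? op.loopBegin : op.id;
    std::size_t hi = op.hasLoop ? op.loopEnd : op.id + 1;
    minId = std::min(minId, lo);
    maxId = std::max(maxId, hi);
  }
  return {minId, maxId};
}
```

Widening the interval this way trades some sharing opportunities for safety: a buffer touched once inside the persistent loop can no longer alias anything else that is live within that loop.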

manman-ren · 2024-12-11T21:33:44Z

lib/Analysis/Allocation.cpp

@@ -548,6 +630,9 @@ class AllocationAnalysis {
            xSizeRange.intersects(ySizeRange)) {
          interference[x].insert(y);
        }
+        // if x and y belong to different regions (ignore producer region).
+        if (inDifferentRegion(x, y) && xSizeRange.intersects(yOpRange))


Oh there is a typo here: xSizeRange.intersects(yOpRange)
-->
xSizeRange.intersects(ySizeRange)

While I was looking at that code, I wonder if we should always make x and y intersect, if they are from different regions.
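As a rough illustration of the corrected condition, the sketch below combines the existing liveness-based test with the region-based one. The `Interval` and `PlacedBuffer` types are hypothetical simplifications, and the region comparison is reduced to a boolean argument; the real analysis works on its own buffer and interval classes.

```cpp
#include <cstddef>

// Hypothetical half-open interval [start, end).
struct Interval {
  std::size_t start;
  std::size_t end;
  bool intersects(const Interval &other) const {
    return start < other.end && other.start < end;
  }
};

// Hypothetical view of a buffer after a tentative placement.
struct PlacedBuffer {
  Interval opRange;   // operation ids during which the buffer is live
  Interval sizeRange; // [offset, offset + size) in shared memory
};

// Two buffers interfere when their memory ranges overlap and either
// (a) their live ranges overlap, or
// (b) they belong to different regions, in which case program order
//     between them is not trusted and the live ranges are ignored.
bool interferes(const PlacedBuffer &x, const PlacedBuffer &y,
                bool differentRegion) {
  bool memoryOverlap = x.sizeRange.intersects(y.sizeRange);
  bool liveOverlap = x.opRange.intersects(y.opRange);
  return memoryOverlap && (liveOverlap || differentRegion);
}
```

Note that comparing `xSizeRange` with `yOpRange`, as in the typo, mixes byte offsets with operation ids, so the result would be meaningless.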

manman-ren · 2024-12-12T00:01:09Z

lib/Analysis/Allocation.cpp

+      auto tA = A->regionIds;
+      auto tB = B->regionIds;
+      for (auto t1 : tA) {
+        for (auto t2 : tA) {


Another typo here, should be tB. Will try to clean this up.

htyu · 2024-12-13T18:00:34Z

It seems that the patch increases SMEM usage for one of the GEMM kernels. I'll take a deeper look.

manman-ren · 2024-12-13T18:12:04Z

It seems that the patch increases SMEM usage for one of the GEMM kernels. I'll take a deeper look.

Oh that is weird.

htyu · 2024-12-13T23:43:13Z

It seems that the patch increases SMEM usage for one of the GEMM kernels. I'll take a deeper look.

Oh that is weird.

Fixed by #8

facebook-github-bot added the CLA Signed label Dec 9, 2024

fix

50c6be3

htyu reviewed Dec 10, 2024

manman-ren commented Dec 11, 2024

manman-ren commented Dec 12, 2024

manman-ren added 2 commits December 11, 2024 16:24

fix typos etc

74c714f

be conservative

14e46c8

htyu approved these changes Dec 12, 2024

manman-ren merged commit 05ee274 into ws Dec 13, 2024
1 check passed
