
compiler: add insert-sync-barrier pass #21

Merged
merged 19 commits into main on Nov 22, 2023
Conversation

jorendumoulin
Contributor

@jorendumoulin jorendumoulin commented Nov 20, 2023

This pull request implements an xDSL pass to insert SNAX synchronization barriers in a program. Synchronization barriers are required when data is shared between multiple cores in a cluster and the correct control flow of the program must be maintained. The PR also includes the first custom operation in the SNAX dialect: snax.cluster_sync_op.

Consider a function c1 running on core 1 and a function c2 running on core 2, both using operand a:

c1(a)
c2(a)

Because the two functions may have a data dependency on the variable a, c2 may only start after c1 has finished. Therefore, we need a synchronization barrier between these two functions:

c1(a)
sync()
c2(a)

The algorithm used for this pass investigates these dependencies across cores to insert synchronization barriers at the correct points. This is done by walking through every op in the IR. For every operand used by an operation, we check whether it is also used by an operation on another core. If so, we must insert a synchronization barrier between the two.

for op in module.walk():
    for operand in op.operands:
        for use in operand.uses:
            if core(op) != core(use):
                # data dependency across cores: insert a synchronization
                # barrier before the using op
                insert_sync_before(use)
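As a quick illustration, the walk above can be mimicked on a toy IR in which each op is a (name, core, operands) tuple. Everything here (the tuple model, the function names, the "sync" marker) is an illustrative stand-in for the real xDSL pass, and the sketch is written as a two-phase variant of the pseudocode: first mark the ops that need a barrier, then emit the program.

```python
# Toy model: an op is (name, core, operands). This is an illustrative
# stand-in for the real xDSL IR, not the actual pass implementation.
def insert_sync_naive(ops):
    # phase 1: mark every op that uses a value also used earlier
    # on a different core
    needs_sync = set()
    for i, (_, core, operands) in enumerate(ops):
        for operand in operands:
            for j in range(i + 1, len(ops)):
                if operand in ops[j][2] and ops[j][1] != core:
                    needs_sync.add(j)  # cross-core dependency: sync before op j
    # phase 2: emit the program with a sync() before every marked op
    output = []
    for i, op in enumerate(ops):
        if i in needs_sync:
            output.append(("sync", None, ()))
        output.append(op)
    return output

# the c1/c2 example from above: c2 uses a on another core than c1
program = [("c1", 1, ("a",)), ("c2", 2, ("a",))]
print([name for name, _, _ in insert_sync_naive(program)])
# → ['c1', 'sync', 'c2']
```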

A problem with this implementation is that it could lead to multiple superfluous synchronization barriers. Consider the following program:

c1(a)
c1(b)
c2(a)
c2(b)

The previous algorithm would result in the following output:

c1(a)
c1(b)
sync()
c2(a)
sync()
c2(b)

The first synchronization barrier before c2(a) makes the second one before c2(b) unnecessary. A modified version of the algorithm handles this special case as follows:

ops_to_sync = []
for op in module.walk():
    if op in ops_to_sync:
        insert_sync_before(op)
        ops_to_sync = []

    for operand in op.operands:
        for use in operand.uses:
            if core(op) != core(use):
                # data dependency across cores: remember to insert a
                # synchronization barrier before the using op
                ops_to_sync.append(use)

In this adaptation, when a cross-core dependency is detected, the using op is added to ops_to_sync, the list of operations that need a synchronization barrier before them. The barrier is only inserted once that op is visited by module.walk(). Inserting a barrier also resets the list with ops_to_sync = [], because that single barrier already covers every pending dependency; this avoids the unnecessary barriers produced by the first version.
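The deduplicated version can be sketched on the same kind of toy IR, where each op is a (name, core, operands) tuple. Again, the model and all names are illustrative stand-ins, not the actual xDSL pass:

```python
# Toy model of the deduplicated pass: an op is (name, core, operands),
# and ops_to_sync holds the indices of ops that still need a barrier
# inserted before them.
def insert_sync_barriers(ops):
    output = []
    ops_to_sync = set()
    for i, (name, core, operands) in enumerate(ops):
        if i in ops_to_sync:
            output.append(("sync", None, ()))
            ops_to_sync = set()  # one barrier covers all pending dependencies
        output.append((name, core, operands))
        # record every later use of this op's values on another core
        for operand in operands:
            for j in range(i + 1, len(ops)):
                if operand in ops[j][2] and ops[j][1] != core:
                    ops_to_sync.add(j)
    return output

# the four-op example: both c2 ops depend on values used by c1 ops
program = [
    ("c1", 1, ("a",)),
    ("c1", 1, ("b",)),
    ("c2", 2, ("a",)),
    ("c2", 2, ("b",)),
]
print([name for name, _, _ in insert_sync_barriers(program)])
# → ['c1', 'c1', 'sync', 'c2', 'c2']
```

Only a single sync is emitted, before the first c2 op, because inserting it clears the pending entry for the second c2 op.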

This algorithm can be found in the code almost exactly as it is written here, with the following adaptations:

  • due to the lack of a core(op) function, we need another way to determine which piece of code runs on which core. The simple rule used for now is that all memref ops (copy, alloc, ...) are handled by the DMA core, and all other ops run on the compute core
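That rule can be sketched as follows. The string-based dispatch on op names is an illustrative stand-in; the real pass checks against xDSL op classes (memref.MemRef.operations, per the review snippet below):

```python
# Illustrative sketch of the core-assignment rule: memref ops are
# handled by the DMA core, everything else by the compute core.
DMA_CORE = "dma"
COMPUTE_CORE = "compute"

def core(op_name: str) -> str:
    # memref.copy, memref.alloc, ... are DMA work; all other ops
    # (linalg, arith, ...) run on the compute core
    return DMA_CORE if op_name.startswith("memref.") else COMPUTE_CORE

print(core("memref.copy"))     # → dma
print(core("linalg.generic"))  # → compute
```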

@jorendumoulin jorendumoulin changed the title Joren/insert sync barrier compiler: add insert-sync-barrier pass Nov 20, 2023
Base automatically changed from Joren/raise-dma to main November 20, 2023 12:44
@jorendumoulin jorendumoulin force-pushed the Joren/insert-sync-barrier branch from f6420df to ac38e46 Compare November 20, 2023 12:50
@jorendumoulin jorendumoulin marked this pull request as ready for review November 20, 2023 13:48
@JosseVanDelm
Contributor

Hi, we discussed offline what this would look like.
I have a few concerns about this, but I think we both agree that running into problems fast now might be our best move.

There are some things that I like:

  • This is a nice way to get a first "raised" representation in MLIR besides C code!

There are some things that I don't like:

  • We are inserting barriers without looking at producers/consumers. In your example, c1(a) and c2(a) only need to be synced if they are altering a in place, not if they are only reading it, right? (Would tensor semantics solve this issue? --> Joren: "We are already past tensor semantics at this point, probably.") Anyhow, inserting more barriers rather than fewer is always safer, I guess.
  • We are not very explicit about which stuff we are running on which core, but maybe future work will make this more clear.

Contributor

@JosseVanDelm JosseVanDelm left a comment
@jorendumoulin very nice PR.

Right now you have a lot of comments explaining what you are doing, which is great, but in this PR I think we should be extra careful to also add comments that highlight the current assumptions. Other than that, it is very exciting to see where this is going!

compiler/dialects/snax.py (resolved)
Comment on lines 42 to 43
type(op) in memref.MemRef.operations,
type(op_use.operation) in memref.MemRef.operations,
Contributor
Could you maybe alias memref.MemRef.operations to a variable like DMACoreOps (and the remaining ops to ComputeCoreOps, or something similar)? Your comment is quite clear, but I think we can make this more explicit in the code itself, so that later we can more easily see what we mean by each of those.

compiler/transforms/insert_sync_barrier.py (resolved)
compiler/transforms/insert_sync_barrier.py (outdated, resolved)
Comment on lines 42 to 46
"linalg.generic"(%0, %1, %2) <{"indexing_maps" = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], "iterator_types" = [#linalg.iterator_type<parallel>], "operandSegmentSizes" = array<i32: 2, 1>}> ({
^2(%arg3_1 : i32, %arg4_1 : i32, %arg5_1 : i32):
%4 = "arith.muli"(%arg3_1, %arg4_1) : (i32, i32) -> i32
"linalg.yield"(%4) : (i32) -> ()
}) : (memref<64xi32, 1 : i32>, memref<64xi32, 1 : i32>, memref<64xi32, 1 : i32>) -> ()
Contributor

It doesn't make a lot of sense to me that this would operate on exactly the same data as before; can we put some different data in the linalg, please?

@jorendumoulin
Contributor Author

The tests are now suddenly failing, but this is unrelated to this commit (see #25)

Contributor

@JosseVanDelm JosseVanDelm left a comment

Very nice! Let's await a fix for #25 first?

@jorendumoulin jorendumoulin merged commit 5feba1d into main Nov 22, 2023
3 of 4 checks passed
@jorendumoulin jorendumoulin deleted the Joren/insert-sync-barrier branch November 22, 2023 10:58