[SYCL][Graph] Add specification for kernel binary update #378

fabiomestre · 2024-07-18T19:53:11Z

No description provided.

sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc

…rties (intel#14441)

This PR defines a new user-facing struct `launch_strategy`, and two new `launch` overloads (currently in `syclcompat::experimental`) which accept a `launch_strategy`.

## Extensions & Properties

This work builds on top of the [kernel_properties](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_properties.asciidoc) and [enqueue_functions](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc) extensions. The latter defines APIs for passing `launch_properties` as part of a `launch_config` object. These are the `parallel_for` and `nd_launch` overloads used by the new `launch`. See the [note](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc#launch-configuration) in the Launch configuration section which describes how `kernel_properties` must be passed via a `get(properties_tag)` method of a kernel functor.

## Local Memory

Note also that in order to properly handle local memory, we **must** construct the `KernelFunctor` object within the `cgh` lambda, passing in a `local_accessor` to the constructor. Then within `KernelFunctor::operator()` (the SYCL 'kernel') we can at last grab the local memory pointer with `local_acc.get_multi_ptr<sycl::access::decorated::no>()`, since CUDA-style device functions expect to receive their dynamic local memory as a `char *`.

---------

Signed-off-by: Joe Todd <[email protected]>

intel#14444) Rename related interop structs/funcs with "external" keyword over "interop" to align better with existing structs/funcs and other 3rd party APIs. Remove "handle" keyword from imported external memory/semaphore objects to distinguish between 3rd party external handles and imported external handles. --------- Co-authored-by: Sean Stirling <[email protected]> Co-authored-by: chedy.najjar <[email protected]>

`detail::memcpy`, even though in a different namespace, can cause ambiguity with libc's `memcpy`, due to argument dependent lookup (ADL). For example, the compiler throws a compilation error due to `memcpy` ambiguity in the following code:

```
#include <sycl/vector.hpp>

template <typename T>
void foo(T *dst, T *src, size_t count) {
   memcpy(dst, src, count * sizeof(T));
}

using T = sycl::vec<int, 1>;

SYCL_EXTERNAL void bar(T *dst, T *src, size_t count) {
   foo(dst, src, count * sizeof(T));
}
```

Compilation error:

```
memcpy_test.cpp:5:4: error: call to 'memcpy' is ambiguous
    5 |    memcpy(dst, src, count * sizeof(T));
      |    ^~~~~~
memcpy_test.cpp:11:4: note: in instantiation of function template specialization 'foo<sycl::vec<int, 1>>' requested here
   11 |    foo(dst, src, count * sizeof(T));
      |    ^
/usr/include/string.h:43:14: note: candidate function
   43 | extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
      |              ^
llvm/build/bin/../include/sycl/detail/memcpy.hpp:16:13: note: candidate function
   16 | inline void memcpy(void *Dst, const void *Src, size_t Size) {
      |             ^
1 error generated.
```

To fix this error, this PR renames `detail::memcpy` to `detail::memcpy_no_adl`.
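The lookup mechanism behind this can be reproduced without SYCL. The following is a minimal sketch in standard C++; the names `lib`, `vec`, `describe` and `which_describe` are hypothetical and do not come from the PR. It shows an unqualified call inside a template picking up a function from the argument type's namespace through ADL, even though the caller never names that namespace.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical library namespace; none of these names come from the PR.
namespace lib {
struct vec {
  int value;
};

// A helper that shares its name with a function callers use unqualified.
inline const char *describe(const vec &) { return "lib::describe"; }
} // namespace lib

// Generic fallback, visible through ordinary unqualified lookup.
template <typename T> const char *describe(const T &) { return "::describe"; }

// The call below never names `lib`, yet ADL adds lib::describe to the
// overload set whenever T is declared in namespace lib.
template <typename T> const char *which_describe(const T &value) {
  return describe(value);
}
```

In this sketch overload resolution has a clear winner, because a non-template function is preferred over a template. In the case described above the two `memcpy` candidates have identical signatures, so neither is preferred and the call is ambiguous. Giving the library helper a name that no caller uses unqualified removes it from the overload set.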

…ry (intel#14818) By default, the address sanitizer inlines the calls that set private shadow memory for small sizes. However, if the work-group size is too large, allocating the private shadow memory may fail. We need to check whether the shadow base is null before trying to poison it. --------- Co-authored-by: Yang Zhao <[email protected]>

When looking for the correct allocation, the upper bound check was inclusive (Ptr <= Alloc.first + Alloc.second.Length). If two allocations are back-to-back, a pointer to the beginning of the second allocation would incorrectly be determined as belonging to the first allocation. This caused false-positive errors about out-of-bounds memory operations.
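The difference between the two bound checks can be sketched in standard C++. This is an illustration only, not the code from the patch: `AllocMap`, `findAllocation` and the `InclusiveEnd` flag are hypothetical, and the map stores a plain length rather than the `Alloc.second.Length` record mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical bookkeeping: base address -> allocation length in bytes.
using AllocMap = std::map<std::uintptr_t, std::size_t>;

// Returns the base of the first allocation that contains Ptr, or 0 if none
// does. InclusiveEnd = true reproduces the old check (Ptr <= Base + Length);
// InclusiveEnd = false uses the half-open range [Base, Base + Length).
inline std::uintptr_t findAllocation(const AllocMap &Allocs,
                                     std::uintptr_t Ptr, bool InclusiveEnd) {
  for (const auto &Alloc : Allocs) {
    const std::uintptr_t Base = Alloc.first;
    const std::uintptr_t End = Base + Alloc.second;
    const bool BelowEnd = InclusiveEnd ? (Ptr <= End) : (Ptr < End);
    if (Ptr >= Base && BelowEnd)
      return Base;
  }
  return 0;
}
```

With two back-to-back allocations at `0x1000` and `0x1100`, each `0x100` bytes long, the inclusive check attributes `0x1100` to the allocation at `0x1000`, while the exclusive check attributes it to the allocation that actually starts there.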

This PR adds an algorithm for doing a GPU-wide barrier in the CUDA backend. Rough outline of the algorithm:

- The `0th` thread of each workgroup performs `atomic.add(1)`.
- The same thread checks the atomic result with `ld.acquire` in a loop until it is equal to the total number of workgroups.
- All threads call the group-wide `barrier.sync`.

One caveat is that the atomic's start value is never initialized. If the barrier is called several times in a kernel, then on the second call the start value already contains the result of the previous barrier. That is why the loop actually spins while `current value % totalWgroups != 0`.
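The modulo condition can be illustrated with a small sequential model in standard C++. This is a sketch of the counter arithmetic only: it is not the CUDA implementation, it does not model concurrency or memory ordering, and the names `BarrierCounter`, `arrive` and `mustWait` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sequential model of the shared counter; not the CUDA implementation.
struct BarrierCounter {
  std::uint32_t value = 0;    // never reset between barrier calls
  std::uint32_t totalWgroups; // number of workgroups taking part

  explicit BarrierCounter(std::uint32_t total) : totalWgroups(total) {}

  // Models the leader thread's atomic.add(1).
  void arrive() { ++value; }

  // Models the spin condition: leaders keep waiting while this is true.
  bool mustWait() const { return value % totalWgroups != 0; }
};
```

With four workgroups the counter reaches 4 at the end of the first barrier and 8 at the end of the second. A comparison against the workgroup count alone would only ever succeed for the first barrier, whereas the modulo test succeeds each time the last workgroup of a round arrives.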

EwanC

LGTM as a draft to refine during implementation

sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc

…ntel#14740) Adjust the spec constant pattern match for the base alloca + offset case in the device sanitizer. The address sanitizer merges static allocas into one large base alloca, and each original alloca is replaced with base + offset.

Initial public working draft for thread block cluster support in SYCL, intended to get feedback. Contains the proposal for:

1. Launching a kernel with cluster group
2. Accessing the various `ids` associated with the cluster_group from the kernel
3. Cluster level barrier
4. Accessing another workgroup's local memory

---------

Co-authored-by: Greg Lueck <[email protected]>
Co-authored-by: Gordon Brown <[email protected]>
Co-authored-by: John Pennycook <[email protected]>
Co-authored-by: Ruyman <[email protected]>

Adds the kernel binary update feature to the SYCL graph specification. This introduces a new dynamic_command_group class which can be used to update the command-group function of kernel nodes in graphs.

fabiomestre · 2024-08-05T14:27:36Z

Upstream PR: intel#14896

fabiomestre commented Jul 18, 2024

EwanC reviewed Jul 19, 2024

EwanC reviewed Jul 23, 2024

Bensuo reviewed Jul 23, 2024

frasercrmck and others added 5 commits July 29, 2024 19:06

[SYCL] Fix use of removed ArchType enum (intel#14833)

e664798

Of two concurrent PRs, one added a new use of this enum while the other removed the enum. Commit 63c61d8 added the new use, while dc37699 was trying to delete it.

[SYCL] Mark ASAN tests that are failing the nightly as unsupported. (i…

0f7b261

…ntel#14820) These are due to a known regression introduced by the PI removal patch. We have a fix, but for now it's more expedient to simply disable the tests and unblock the nightly workflow.

[SYCL][ESIMD] Move spirv global translation out of the function proce…

ff35d2f

…ssing to improve compilation time (intel#14786)

EwanC reviewed Jul 30, 2024

bb-sycl and others added 20 commits July 30, 2024 14:59

[GHA] Uplift Linux GPU RT version to 24.26.30049.6 (intel#14838)

1bc6751

Scheduled drivers uplift Co-authored-by: GitHub Actions <[email protected]>

[SYCL][ESIMD] Fix driver check in two tests (intel#14832)

70268e6

When using L0 we always see the x.y.zzzzz style version, even on Windows. These tests were incorrectly running on Windows because of this problem. Signed-off-by: Sarnie, Nick <[email protected]>

[SYCL][E2E] Add -Werror flag to sycl e2e tests (intel#14689)

e0a222f

This patch adds the `-Werror` flag to all SYCL e2e tests to stop the introduction of new warnings. Added `-Wno-error=` to existing tests that have warnings (Or made changes to resolve the warnings).

[SYCL][E2E] Fix deprecated warnings in WeakObject e2e tests (intel#14810

822d63e

)

[SYCL] Fix XPASS of Matrix test on new GPU driver (intel#14849)

f990a8a

It's passing in the new driver uplift: https://github.com/intel/llvm/actions/runs/10164787306/job/28112343861 Signed-off-by: Sarnie, Nick <[email protected]>

[SYCL][Doc] Mark spec for prefetch extension as supported (intel#14735)

c79c3df

According to release notes the extension was implemented by intel@e7139b0, intel@0229456 and intel@b5d69df --------- Co-authored-by: Greg Lueck <[email protected]>

[NFC] Reflow files in buildbot and sycl-fusion dirs (intel#14792)

300b1f8

Run `black` on python files in buildbot and fusion directories. Those files skipped the original formatting effort, so any change to them now would cause a formatting CI job to fail.

[SYCL][E2E] Fix bindless images tests to run on Level Zero devices (i…

1354ff2

…ntel#14779) Signed-off-by: Neil R. Spruit <[email protected]>

[NFC][Bindless] Add a test of DX12 interop without semaphore (intel#1…

20351f5

…4790) This allows testing of DX12 interop in L0 backend that doesn't support semaphore importing yet. Fix getDX12Adapter to increment index when software adapter is seen.

[SYCL][JM tests] Remove xfail from JM prefetch test (intel#14829)

a66958b

[SYCL][COMPAT] Disable memory_async.cpp tests (intel#14855)

7886c87

These are failing intermittently, possibly due to runtime race condition.

[SYCL] Turn off host_task2_multiple_roots.cpp (intel#14853)

d0415e0

[SYCL][E2E] Fix another matrix XPASS (intel#14866)

98beefd

Passing on Arc in postcommit https://github.com/intel/llvm/actions/runs/10180580875/job/28159431987 Signed-off-by: Sarnie, Nick <[email protected]>

[SYCL][ESIMD][E2E] Disable two tests hanging on Windows (intel#14869)

2dfdf1d

intel#14868 Signed-off-by: Sarnie, Nick <[email protected]>

[SYCL][E2E] Fix SG32/joint_matrix_prefetch.cpp (intel#14870)

ff7cf16

Feedback from intel#14866 Signed-off-by: Sarnie, Nick <[email protected]>

[SYCL][Doc] Fix ptrdiff_t type namespace qualifier (intel#14888)

6532637

ptrdiff_t is declared in std namespace. Define syclex namespace alias used in the usage examples.

[SYCL][AMD] Propagate metadata in createURProgram (intel#14831)

41d8977

SYCL properties weren't converted when calling createURProgram, leading to an issue in finalization during KernelFusion for AMD. Fixes intel#14841

EwanC approved these changes Aug 1, 2024

sergey-semenov and others added 7 commits August 1, 2024 10:43

[SYCL] Fix dependency compilation after PI removal (intel#14843)

914561a

[UR][DeviceSantizer] Enable Symoblizer for UR santizer layer (intel#1…

56b1410

…4513) UR Part: oneapi-src/unified-runtime#1844

Bump UR tag to include PrintTrace fix (intel#14728)

8411a7b

Tests oneapi-src/unified-runtime#1884 Fixes intel#14704 Fixes intel#14721

[SYCL] Fix XPTI/basic_event_collection_linux failure (intel#14857)

495645c

The test was expecting 'kernel_name' metadata on an edge_create event, which should not exist. It was still sometimes matching anyway, but that appears to have been a fluke and not intended behavior. Fixes intel#14744

[SYCL][Graph] Add specification for kernel binary updates

4637510

Adds the kernel binary update feature to the SYCL graph specification. This introduces a new dynamic_command_group class which can be used to update the command-group function of kernel nodes in graphs.

fabiomestre force-pushed the fabio/kernel_binary_update branch from 48f6566 to 4637510 Compare August 1, 2024 12:34

fabiomestre closed this Aug 1, 2024
