forked from intel/llvm
-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Graph] Add specification for kernel binary update #378
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
fabiomestre
commented
Jul 18, 2024
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
EwanC
reviewed
Jul 19, 2024
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
EwanC
reviewed
Jul 23, 2024
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
Bensuo
reviewed
Jul 23, 2024
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
…rties (intel#14441) This PR defines a new user-facing struct `launch_strategy`, and two new `launch` overloads (currently in `syclcompat::experimental`) which accept a `launch_strategy`. ## Extensions & Properties This work builds on top of the [kernel_properties](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_properties.asciidoc) and [enqueue_functions](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc) extensions. The latter defines APIs for passing `launch_properties` as part of a `launch_config` object. These are the `parallel_for` and `nd_launch` overloads used by the new `launch`. See the [note](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc#launch-configuration) in the Launch configuration section which describes how `kernel_properties` must be passed via a `get(properties_tag)` method of a kernel functor. ## Local Memory Note also that in order to properly handle local memory, we **must** construct the `KernelFunctor` object within the `cgh` lambda, passing in a `local_accessor` to the constructor. Then within `KernelFunctor::operator()` (the SYCL 'kernel') we can at last grab the local memory pointer with `local_acc.get_multi_ptr<sycl::access::decorated::no>()`, since CUDA-style device functions expect to receive their dynamic local memory as a `char *`. --------- Signed-off-by: Joe Todd <[email protected]>
…ntel#14820) These are due to a known regression introduced by the PI removal patch, we have a fix but for now it's more expedient to simply disable the tests and unblock the nightly workflow.
intel#14444) Rename related interop structs/funcs with "external" keyword over "interop" to align better with existing structs/funcs and other 3rd party APIs. Remove "handle" keyword from imported external memory/semaphore objects to distinguish between 3rd party external handles and imported external handles. --------- Co-authored-by: Sean Stirling <[email protected]> Co-authored-by: chedy.najjar <[email protected]>
…ssing to improve compilation time (intel#14786)
EwanC
reviewed
Jul 30, 2024
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
Scheduled drivers uplift Co-authored-by: GitHub Actions <[email protected]>
`detail::memcpy`, even though in a different namespace, can cause ambiguity with libc's `memcpy`, due to argument dependent lookup (ADL). For example, the compiler throws a compilation error due to `memcpy` ambiguity in the following code: ``` #include <sycl/vector.hpp> template <typename T> void foo(T *dst, T *src, size_t count) { memcpy(dst, src, count * sizeof(T)); } using T = sycl::vec<int, 1>; SYCL_EXTERNAL void bar(T *dst, T *src, size_t count) { foo(dst, src, count * sizeof(T)); } ``` Compilation error: ``` memcpy_test.cpp:5:4: error: call to 'memcpy' is ambiguous 5 | memcpy(dst, src, count * sizeof(T)); | ^~~~~~ memcpy_test.cpp:11:4: note: in instantiation of function template specialization 'foo<sycl::vec<int, 1>>' requested here 11 | foo(dst, src, count * sizeof(T)); | ^ /usr/include/string.h:43:14: note: candidate function 43 | extern void *memcpy (void *__restrict __dest, const void *__restrict __src, | ^ llvm/build/bin/../include/sycl/detail/memcpy.hpp:16:13: note: candidate function 16 | inline void memcpy(void *Dst, const void *Src, size_t Size) { | ^ 1 error generated. ``` To fix this error, this PR renames `detail::memcpy` to `detail::memcpy_no_adl`
When using L0 we always see the x.y.zzzzz style version, even on Windows. These tests were incorrectly running on Windows because of this problem. Signed-off-by: Sarnie, Nick <[email protected]>
This patch adds the `-Werror` flag to all SYCL e2e tests to stop the introduction of new warnings. Added `-Wno-error=` to existing tests that have warnings (Or made changes to resolve the warnings).
It's passing in the new driver uplift: https://github.com/intel/llvm/actions/runs/10164787306/job/28112343861 Signed-off-by: Sarnie, Nick <[email protected]>
According to release notes the extension was implemented by intel@e7139b0, intel@0229456 and intel@b5d69df --------- Co-authored-by: Greg Lueck <[email protected]>
Run `black` on python files in buildbot and fusion directories. Those files skipped the original formatting effort, so any change to them now would cause a formatting CI job to fail.
…ntel#14779) Signed-off-by: Neil R. Spruit <[email protected]>
…4790) This allows testing of DX12 interop in L0 backend that doesn't support semaphore importing yet. Fix getDX12Adapter to increment index when software adapter is seen.
…ry (intel#14818) By default, address sanitizer will inline call for setting private shadow memory with small size. However, if work group size is too large, the private shadow memory may allocate failed. We need to check if shadow base is null before trying to poison it. --------- Co-authored-by: Yang Zhao <[email protected]>
These are failing intermittently, possibly due to runtime race condition.
Passing on Arc in postcommit https://github.com/intel/llvm/actions/runs/10180580875/job/28159431987 Signed-off-by: Sarnie, Nick <[email protected]>
intel#14868 Signed-off-by: Sarnie, Nick <[email protected]>
Feedback from intel#14866 Signed-off-by: Sarnie, Nick <[email protected]>
When looking for the correct allocation, the upper bound check was inclusive (Ptr <= Alloc.first + Alloc.second.Length). If we have two allocations back-to-back, the pointer to the beginning of the second allocation would incorrectly be determined as belonging to the first allocation. This caused false-positives errors about out-of-bounds memory operations.
ptrdiff_t is declared in std namespace. Define syclex namespace alias used in the usage examples.
This PR adds an algorithm for doing a GPU wide barrier in CUDA backend. Rough outline of the algorithm: - Every `0th` thread from each workgroup performs `atomic.add(1)` - The same thread checks the atomic result with `ld.acquire` in a loop until it's equal to total amount of workgroups. - All threads call group-wide `barrier.sync` One caveat to this is that there is no initialization of the atomic start value. So if we call this barrier several times in a kernel, on the second iteration, the start value will already contain the result from previous barrier. That's why we actually spin the while loop while `current value % totalWgroups != 0`.
SYCL properties weren't converted when calling creatreURProgram, leading to issue in finalization during KernelFusion for AMD. Fixes intel#14841
EwanC
approved these changes
Aug 1, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as a draft to refine during implementation
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
…ntel#14740) Adjust spec constant pattern match for base alloca + offset case in device sanitizer. Address sanitizer merges static allocas into a large layout base alloca and original alloca is replaced with base + offset.
The test was expecting 'kernel_name' metadata on an edge_create event which should not exist It was still sometimes matching anyway but appears to have been a fluke and not intended behavior. Fixes intel#14744
Initial public working draft for thread block cluster support in SYCL, intended to get feedback. Contains the proposal for - 1. Launching a kernel with cluster group 2. Accessing the various `ids` associated with the cluster_group from the kernel 3. Cluster level barrier 4. Accessing another workgroup's local memory --------- Co-authored-by: Greg Lueck <[email protected]> Co-authored-by: Gordon Brown <[email protected]> Co-authored-by: John Pennycook <[email protected]> Co-authored-by: Ruyman <[email protected]>
Adds the kernel binary update feature to the sycl graph specification. This introduces a new dynamic_command_group class which can be used to update the command-group function of a kernel nodes in graphs.
fabiomestre
force-pushed
the
fabio/kernel_binary_update
branch
from
August 1, 2024 12:34
48f6566
to
4637510
Compare
Upstream PR: intel#14896 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.