Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Graph] Add specification for kernel binary update #378

Closed
wants to merge 33 commits into from

Conversation

fabiomestre
Copy link
Collaborator

No description provided.

frasercrmck and others added 5 commits July 29, 2024 19:06
Two concurrent PRs added a new use of and simultaneously removed this
enum. Commit 63c61d8 added a new use, while dc37699 was trying to
delete it.
…rties (intel#14441)

This PR defines a new user-facing struct `launch_strategy`, and two new
`launch` overloads (currently in `syclcompat::experimental`) which
accept a `launch_strategy`.

## Extensions & Properties

This work builds on top of the
[kernel_properties](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_properties.asciidoc)
and
[enqueue_functions](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc)
extensions. The latter defines APIs for passing `launch_properties` as
part of a `launch_config` object. These are the `parallel_for` and
`nd_launch` overloads used by the new `launch`.

See the
[note](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc#launch-configuration)
in the Launch configuration section which describes how
`kernel_properties` must be passed via a `get(properties_tag)` method of
a kernel functor.

## Local Memory

Note also that in order to properly handle local memory, we **must**
construct the `KernelFunctor` object within the `cgh` lambda, passing in
a `local_accessor` to the constructor. Then within
`KernelFunctor::operator()` (the SYCL 'kernel') we can at last grab the
local memory pointer with
`local_acc.get_multi_ptr<sycl::access::decorated::no>()`, since
CUDA-style device functions expect to receive their dynamic local memory
as a `char *`.

---------

Signed-off-by: Joe Todd <[email protected]>
…ntel#14820)

These are due to a known regression introduced by the PI removal patch,
we have a fix but for now it's more expedient to simply disable the
tests and unblock the nightly workflow.
intel#14444)

Rename related interop structs/funcs with "external" keyword over
"interop" to align better with existing structs/funcs and other 3rd
party APIs.

Remove "handle" keyword from imported external memory/semaphore objects
to distinguish between 3rd party external handles and imported external
handles.

---------

Co-authored-by: Sean Stirling <[email protected]>
Co-authored-by: chedy.najjar <[email protected]>
bb-sycl and others added 20 commits July 30, 2024 14:59
Scheduled drivers uplift

Co-authored-by: GitHub Actions <[email protected]>
`detail::memcpy`, even though in a different namespace, can cause
ambiguity with libc's `memcpy`, due to argument dependent lookup (ADL).
For example, the compiler throws a compilation error due to `memcpy`
ambiguity in the following code:
```
#include <sycl/vector.hpp>

template <typename T>
void foo(T *dst, T *src, size_t count) {
          memcpy(dst, src, count * sizeof(T));
}

using T = sycl::vec<int, 1>;

SYCL_EXTERNAL void bar(T *dst, T *src, size_t count) {
          foo(dst, src, count * sizeof(T));
}
```

Compilation error:
```
memcpy_test.cpp:5:4: error: call to 'memcpy' is ambiguous
    5 |           memcpy(dst, src, count * sizeof(T));
      |           ^~~~~~
memcpy_test.cpp:11:4: note: in instantiation of function template specialization 'foo<sycl::vec<int, 1>>' requested here
   11 |           foo(dst, src, count * sizeof(T));
      |           ^
/usr/include/string.h:43:14: note: candidate function
   43 | extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
      |              ^
llvm/build/bin/../include/sycl/detail/memcpy.hpp:16:13: note: candidate function
   16 | inline void memcpy(void *Dst, const void *Src, size_t Size) {
      |             ^
1 error generated.
```

To fix this error, this PR renames `detail::memcpy` to
`detail::memcpy_no_adl`
When using L0 we always see the x.y.zzzzz style version, even on
Windows. These tests were incorrectly running on Windows because of this
problem.

Signed-off-by: Sarnie, Nick <[email protected]>
This patch adds the `-Werror` flag to all SYCL e2e tests to stop the
introduction of new warnings.

Added `-Wno-error=` to existing tests that have warnings (Or made
changes to resolve the warnings).
According to release notes the extension was implemented by

intel@e7139b0,

intel@0229456
and

intel@b5d69df

---------

Co-authored-by: Greg Lueck <[email protected]>
Run `black` on python files in buildbot and fusion directories. Those
files skipped the original formatting effort, so any change to them now
would cause a formatting CI job to fail.
…4790)

This allows testing of DX12 interop in L0 backend that doesn't support
semaphore importing yet.
Fix getDX12Adapter to increment index when software adapter is seen.
…ry (intel#14818)

By default, address sanitizer will inline call for setting private
shadow memory with small size. However, if work group size is too large,
the private shadow memory may allocate failed. We need to check if
shadow base is null before trying to poison it.

---------

Co-authored-by: Yang Zhao <[email protected]>
These are failing intermittently, possibly due to runtime race
condition.
When looking for the correct allocation, the upper bound check was
inclusive (Ptr <= Alloc.first + Alloc.second.Length). If we have two
allocations back-to-back, the pointer to the beginning of the second
allocation would incorrectly be determined as belonging to the first
allocation. This caused false-positives errors about out-of-bounds
memory operations.
ptrdiff_t is declared in std namespace.

Define syclex namespace alias used in the usage examples.
This PR adds an algorithm for doing a GPU wide barrier in CUDA backend. 

Rough outline of the algorithm:
- Every `0th` thread from each workgroup performs `atomic.add(1)`
- The same thread checks the atomic result with `ld.acquire` in a loop
until it's equal to total amount of workgroups.
- All threads call group-wide `barrier.sync`

One caveat to this is that there is no initialization of the atomic
start value. So if we call this barrier several times in a kernel, on
the second iteration, the start value will already contain the result
from previous barrier. That's why we actually spin the while loop while
`current value % totalWgroups != 0`.
SYCL properties weren't converted when calling creatreURProgram, leading
to issue in finalization during KernelFusion for AMD.

Fixes intel#14841
Copy link
Collaborator

@EwanC EwanC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as a draft to refine during implementation

sergey-semenov and others added 7 commits August 1, 2024 10:43
…ntel#14740)

Adjust spec constant pattern match for base alloca + offset case in
device sanitizer.
Address sanitizer merges static allocas into a large layout base alloca
and original alloca is replaced with base + offset.
The test was expecting 'kernel_name' metadata on an edge_create event
which should not exist

It was still sometimes matching anyway but appears to have been a fluke
and not intended behavior.

Fixes intel#14744
Initial public working draft for thread block cluster support in SYCL,
intended to get feedback.

Contains the proposal for - 
1. Launching a kernel with cluster group
2. Accessing the various `ids` associated with the cluster_group from
the kernel
3. Cluster level barrier
4. Accessing another workgroup's local memory

---------

Co-authored-by: Greg Lueck <[email protected]>
Co-authored-by: Gordon Brown <[email protected]>
Co-authored-by: John Pennycook <[email protected]>
Co-authored-by: Ruyman <[email protected]>
Adds the kernel binary update feature to the sycl graph specification.
This introduces a new dynamic_command_group class which can be used
to update the command-group function of a kernel nodes in graphs.
@fabiomestre
Copy link
Collaborator Author

Upstream PR: intel#14896

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.