Introduce CUDA backend specification (revised) #420
Conversation
CUDA backend specification doc pulled in from KhronosGroup#197.
Since #197 I've made some additional changes based on feedback:
|====
| Function | Description

| [code]#bool cuda::has_native_event()# | Returns true if the SYCL [code]#event# has a native [code]#CUevent# associated with it.
|====
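A minimal usage sketch of this query (hypothetical: it assumes the function lives in namespace sycl::cuda, takes the sycl::event as its argument, and that the backend enumerator is backend::cuda as proposed in this backend spec):

#include <sycl/sycl.hpp>
#include <cuda.h>

// Wait on the native CUevent if one exists, otherwise fall back to a SYCL wait.
void wait_via_native_event(const sycl::event &e) {
  if (sycl::cuda::has_native_event(e)) {  // assumed: the query takes the SYCL event
    CUevent ce = sycl::get_native<sycl::backend::cuda>(e);
    cuEventSynchronize(ce);               // block the calling thread on the CUevent
  } else {
    e.wait();
  }
}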
How does this relate to sycl::error_category_for (see #164)? If the idea is that each backend interop spec will provide a get_error_code function like this, then can we remove the ill-defined sycl::error_category_for from the core spec?

It would be very good to resolve #164 because we deferred any CTS testing of sycl::error_category_for due to its unclear specification.
Sorry I missed your comment here. Are you referring to cuda::get_error_code(exception &) and how that relates to sycl::error_category_for?

Yeah, I think the consensus was that we should move towards per-backend get_error_code functions, so then we can remove sycl::error_category_for. Though I'm not sure it was decided exactly what we should do with sycl::error_category_for (whether it's removed immediately from SYCL 2020 or deprecated), I think there is justification for removing it immediately, as it's been found not to work as intended.
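For what it's worth, a rough sketch of what per-backend error querying could look like from user code (everything here is an assumption: the namespace, the exact signature cuda::get_error_code(const exception &), and a CUresult return type are not settled in this thread):

#include <sycl/sycl.hpp>
#include <cuda.h>

void report_native_error(sycl::queue &q) {
  try {
    q.single_task([] { /* ... */ }).wait();
  } catch (const sycl::exception &e) {
    // Assumed: returns the native CUDA error associated with this exception.
    CUresult native_err = sycl::cuda::get_error_code(e);
    if (native_err != CUDA_SUCCESS) {
      // Handle or log the backend-specific error code here.
    }
  }
}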
SYCL WG meeting 2023/06/15: No recent progress, need to come back in a couple of weeks.
Discussed in Oct/5/23 - @AerialMantis to say when to discuss again.
single thread is a SYCL work-item.

CUDA primarily synchronizes the threads through two functions,
[code]#cudaStreamSynchronize()# and [code]#\__syncthreads()#.
I'm pretty sure all of the mentions of cudaStreamSynchronize in this paragraph should be the more general cudaDeviceSynchronize / cuCtxSynchronize.
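For context, a short sketch of the difference in scope between these calls (this is just the CUDA APIs themselves, not proposed spec text):

#include <cuda_runtime.h>
#include <cuda.h>

void sync_scopes(cudaStream_t stream) {
  cudaStreamSynchronize(stream);  // waits only for work enqueued to this one stream
  cudaDeviceSynchronize();        // waits for all streams on the current device
  cuCtxSynchronize();             // driver-API form: waits for all work in the current context
  // __syncthreads() is different in kind: a device-side barrier across the threads
  // of one block inside a kernel, not a host-side wait.
}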
| [code]#device# | [code]#CUdevice# | [code]#CUdevice# | A SYCL [code]#device# created from a [code]#CUdevice# will return a copy of an existing device with that native [code]#CUdevice#. The [code]#CUdevice# retrieved from a SYCL [code]#device# will be the native [code]#CUdevice# associated with it.
| [code]#context# | [code]#CUcontext# | [code]#std::vector<CUcontext># | A SYCL [code]#context# created from a single [code]#CUcontext# will encapsulate that context. All [code]#CUcontext#s associated with a SYCL [code]#context# are retrieved from a SYCL [code]#context#. The native [code]#CUcontext#s may contain one or more CUDA primary contexts.
| [code]#queue# | [code]#CUstream# | [code]#CUstream# | A SYCL [code]#queue# created from a [code]#CUstream# will encapsulate that stream and will have the [code]#property::queue::in_order# property. A single [code]#CUstream# is retrieved from a SYCL [code]#queue#, and before it is returned all native [code]#CUstream#s associated with the SYCL [code]#queue# synchronize with the calling thread.
| [code]#event# | [code]#CUevent# | [code]#CUevent# | A SYCL [code]#event# created from a [code]#CUstream# will encapsulate that event. A single [code]#CUevent# is potentially retrieved from a SYCL [code]#event# if there is a valid native [code]#CUevent# associated with it, otherwise [code]#nullptr# is returned instead. The CUDA backend-specific free function [code]#cuda::has_native_event# can be used to query whether the SYCL [code]#event# has a valid native [code]#CUevent# associated with it.
Suggested change:
- | [code]#event# | [code]#CUevent# | [code]#CUevent# | A SYCL [code]#event# created from a [code]#CUstream# will encapsulate that event. A single [code]#CUevent# is potentially retrieved from a SYCL [code]#event# if there is a valid native [code]#CUevent# associated with it, otherwise [code]#nullptr# is returned instead. The CUDA backend-specific free function [code]#cuda::has_native_event# can be used to query whether the SYCL [code]#event# has a valid native [code]#CUevent# associated with it.
+ | [code]#event# | [code]#CUevent# | [code]#CUevent# | A SYCL [code]#event# created from a [code]#CUevent# will encapsulate that event. A single [code]#CUevent# is potentially retrieved from a SYCL [code]#event# if there is a valid native [code]#CUevent# associated with it, otherwise [code]#nullptr# is returned instead. The CUDA backend-specific free function [code]#cuda::has_native_event# can be used to query whether the SYCL [code]#event# has a valid native [code]#CUevent# associated with it.
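To make the mapping in the table above concrete, a hedged round-trip sketch for queue (assuming the backend::cuda enumerator and that make_queue/get_native follow the usual SYCL 2020 backend interop forms; an illustration, not normative text):

#include <sycl/sycl.hpp>
#include <cuda.h>

void round_trip_queue() {
  sycl::queue q{sycl::gpu_selector_v};

  // Per the table: returns a single CUstream, after synchronizing all native
  // CUstreams associated with the queue with the calling thread.
  CUstream s = sycl::get_native<sycl::backend::cuda>(q);

  // Per the table: the resulting SYCL queue encapsulates that stream and has
  // the property::queue::in_order property.
  sycl::queue q2 = sycl::make_queue<sycl::backend::cuda>(s, q.get_context());
  // q2.is_in_order() would be expected to return true here.
}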
==== Synchronization

When retireving a [code]#CUstream# from a SYCL [code]#queue# the SYCL runtime
Suggested change:
- When retireving a [code]#CUstream# from a SYCL [code]#queue# the SYCL runtime
+ When retrieving a [code]#CUstream# from a SYCL [code]#queue# the SYCL runtime
When retireving a [code]#CUstream# from a SYCL [code]#queue# the SYCL runtime
must synchronize all [code]#CUstream#s associated with the SYCL [code]#queue#
in order to guarnatee consistent ordering of commands previously enqueued to
Suggested change:
- in order to guarnatee consistent ordering of commands previously enqueued to
+ in order to guarantee consistent ordering of commands previously enqueued to
must synchronize all [code]#CUstream#s associated with the SYCL [code]#queue#
in order to guarnatee consistent ordering of commands previously enqueued to
the SYCL [code]#queue# in relation to any commands enqueued using the native
[code]#CUstream#. The native [code]#stream# must be synchornized with and
Suggested change:
- [code]#CUstream#. The native [code]#stream# must be synchornized with and
+ [code]#CUstream#. The native [code]#stream# must be synchronized with and
Some things missing here:
Why the preference for the same cuda stream? I think for performance reasons it is better to leave this unspecified to allow different streams to be returned.

I think if a secondary stream is needed for a queue then it can be created within a HT, or the user can use in order SYCL queues as CUstreams, which seems to be what a lot of users do anyway.

There is no need to check that if it is not specified. And I think this is the way to go, letting implementations optimize by providing different streams when they can, but not requiring it. I agree returning all underlying streams has little value.

The primary motivation is that something like AdaptiveCPP's. This is a small step towards being able to submit async work in the HT lambda. At the minimum I think we should define this for in order queues, whatever about out of order.
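A minimal sketch of the "secondary stream created within a HT" idea mentioned above (hypothetical: error checking and stream lifetime management are omitted, and the backend::cuda enumerator is assumed):

#include <sycl/sycl.hpp>
#include <cuda.h>

void use_secondary_stream(sycl::queue &q) {
  q.submit([&](sycl::handler &cgh) {
    cgh.host_task([=](sycl::interop_handle ih) {
      // Stream associated with the queue by the implementation.
      CUstream primary = ih.get_native_queue<sycl::backend::cuda>();
      // Secondary stream created by the user inside the host task.
      CUstream secondary;
      cuStreamCreate(&secondary, CU_STREAM_NON_BLOCKING);
      // ... enqueue native work to primary and/or secondary ...
      cuStreamSynchronize(secondary);
      cuStreamDestroy(secondary);
    });
  });
}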
@t4c1, @jle-quel and I were talking about this offline. We think it's important to guarantee that for an in order queue the same native stream is always returned. So for out of order queues:

q.host_task(...
  bool sameStream = ih.get_native_queue() == ih.get_native_queue();
});

might be false, whereas for in order queues this is always true. And

q.host_task(...
  stream_acc[0] = ih.get_native_queue();
}).wait();
q.host_task(...
  bool sameStream = ih.get_native_queue() == stream_acc[0];
  // Using the stream stored in the previous HT lambda through an accessor
});

will always be true for in order queues. This may or may not evaluate to true for out of order queues. Maybe also worth mentioning that when dispatching to the same CUstream across different HTs it is important to make sure the correct command dependencies are used, so async work is dispatched to the CUstreams in the correct order.
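On that last point, a hedged sketch of how the command dependency between two HTs could be expressed so native work reaches the CUstream in the intended order (standard depends_on, with the backend::cuda enumerator assumed):

#include <sycl/sycl.hpp>
#include <cuda.h>

void ordered_host_tasks(sycl::queue &q) {
  sycl::event e1 = q.submit([&](sycl::handler &cgh) {
    cgh.host_task([=](sycl::interop_handle ih) {
      CUstream s = ih.get_native_queue<sycl::backend::cuda>();
      // ... enqueue the first batch of native work to s ...
    });
  });

  q.submit([&](sycl::handler &cgh) {
    cgh.depends_on(e1);  // this host task runs only after the first one has completed
    cgh.host_task([=](sycl::interop_handle ih) {
      CUstream s = ih.get_native_queue<sycl::backend::cuda>();
      // ... the second batch is therefore enqueued to s after the first batch ...
    });
  });
}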
This pull request is a revised version of the original pull request for introducing the CUDA backend (#197). I created a new PR as the original PR was becoming very difficult to continue rebasing.
Introduces the CUDA backend specification covering a mapping of the platform, execution, and memory models and interoperability with the CUDA API for SYCL implementations targeting a CUDA backend.
The backend specification has been added as an appendix to the SYCL 2020 specification.
A pull request for introducing a test plan to the SYCL CTS for the CUDA backend interoperability defined in this backend specification is here - KhronosGroup/SYCL-CTS#207. This will still need to be updated with recent changes.
Contributors: @AerialMantis, @AidanBeltonS, @npmiller, @hdelan, @t4c1, @jchlanda, @JackAKirk, @gmlueck, @illuhad.