-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Graph] Add initial draft of malloc/free nodes ... #352
base: sycl
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1710,45 +1710,79 @@ Exceptions: | |
|
||
|=== | ||
|
||
=== Features Still in Development | ||
|
||
==== Memory Allocation Nodes | ||
|
||
There is no provided interface for users to define a USM allocation/free | ||
operation belonging to the scope of the graph. It would be error prone and | ||
non-performant to allocate or free memory as a node executed during graph | ||
submission. Instead, such a memory allocation API needs to provide a way to | ||
return a pointer which won't be valid until the allocation is made on graph | ||
finalization, as allocating at finalization is the only way to benefit from | ||
the known graph scope for optimal memory allocation, and even optimize to | ||
eliminate some allocations entirely. | ||
Support depends on the availablity of backend support for deferred allocation: | ||
link:../experimental/sycl_ext_oneapi_virtual_mem.asciidoc[sycl_ext_oneapi_virtual_mem] | ||
|
||
Such a deferred allocation strategy presents challenges however, and as a result | ||
we recommend instead that prior to graph construction users perform core SYCL | ||
USM allocations to be used in the graph submission. Before to coming to this | ||
recommendation we considered the following explicit graph building interfaces | ||
for adding a memory allocation owned by the graph: | ||
The following interfaces enables users to define a memory allocation/free operation | ||
belonging to the scope of the graph. It would be error prone and non-performant | ||
to allocate or free memory as a node executed during graph submission. Instead, | ||
such a memory allocation API needs to provide a way to return a pointer which | ||
won't be valid until the allocation is made on graph finalization, as allocating | ||
at finalization is the only way to benefit from the known graph scope for optimal | ||
memory allocation, and even optimize to eliminate some allocations entirely. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think people will have the question "if I have two executable graphs created from the same modifiable graph, will that node be different memory allocations?", so we should state something about that. |
||
1. Allocation function returning a reference to the raw pointer, i.e. `void*&`, | ||
which will be instantiated on graph finalization with the location of the | ||
allocated USM memory. | ||
Table {counter: tableNumber}. Member functions of the `command_graph` class (memory allocation). | ||
[cols="2a,a"] | ||
|=== | ||
|Member function|Description | ||
|
||
2. Allocation function returning a handle to the allocation. Applications use | ||
the handle in node command-group functions to access memory when allocated. | ||
| | ||
[source, c++] | ||
---- | ||
std::pair<void*,node> | ||
EwanC marked this conversation as resolved.
Show resolved
Hide resolved
|
||
add_malloc_device(size_t num_bytes, const property_list& propList = {}); | ||
Comment on lines
+1734
to
+1735
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we have this templated so that users can call this similarly to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, will add. My only reason to start with a single version of this API was not having to update too many functions, in case we change the basic design of this interface. |
||
---- | ||
|
||
3. Allocation function returning a pointer to a virtual allocation, only backed | ||
with an actual allocation when graph is finalized or submitted. | ||
|
||
Design 1) has the drawback of forcing users to keep the user pointer variable | ||
alive so that the reference is valid, which is unintuitive and is likely to | ||
result in bugs. | ||
| | ||
Returns a pair of a pointer to memory and a node. The pointer is allocated on the `device` | ||
that is associated with current graph by first execution of the `command_graph`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Question about whether we want to tighten this more to say something like "allocated during the finalization of command-graph to the returned executable graph". The current wording leaves open the suggestion that if you never execute the graph nothing will be allocated, but we probably will do the allocation on finalization - which means less latency when submitting the graph. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. how about "may be allocated during the finalization ... "? That way we can keep our options, depending on the backend. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. shall we align with line 1723 here? in line 1723, it says |
||
All nodes that depend on this node and are thereby executed after have access to the allocated memory. | ||
The allocation size is specified in bytes. | ||
|
||
Design 2) introduces a handle object which has the advantages of being a less | ||
error prone way to provide the pointer to the deferred allocation. However, it | ||
requires kernel changes and introduces an overhead above the raw pointers that | ||
are the advantage of USM. | ||
Preconditions: | ||
|
||
Design 3) needs specific backend support for deferred allocation. | ||
* This member function is only available when the `command_graph` state is | ||
`graph_state::modifiable`. | ||
|
||
Parameters: | ||
|
||
* `num_bytes` - allocation size in bytes. | ||
|
||
* `propList` - Zero or more properties can be provided to the constructed node | ||
via an instance of `property_list`. The `property::node::depends_on` property | ||
can be passed here with a list of nodes to create dependency edges on. | ||
|
||
Exceptions: | ||
|
||
* Throws synchronously with error code `feature_not_supported` if any device associated | ||
with the command graph does not have `aspect::usm_device_allocations`. | ||
| | ||
|
||
[source, c++] | ||
---- | ||
node | ||
add_free(void* ptr, const property_list& propList = {}); | ||
---- | ||
|
||
|
||
| | ||
Returns a free node that has been added to the graph. Accesses of nodes that depend of this node | ||
(predecessors) to the allocated memory are undefined behavior. | ||
|
||
Parameters: | ||
|
||
* `ptr` - memory pointed to by. Must be allocated by `add_malloc_device`. | ||
|
||
* `propList` - Zero or more properties can be provided to the constructed node | ||
via an instance of `property_list`. The `property::node::depends_on` property | ||
can be passed here with a list of nodes to create dependency edges on. | ||
|
||
|=== | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Heads up that once intel#12366 merges i think we'll need to extend the |
||
=== Features Still in Development | ||
|
||
reble marked this conversation as resolved.
Show resolved
Hide resolved
|
||
==== Device Specific Graph | ||
|
||
|
@@ -1779,16 +1813,6 @@ Allow an executable graph to contain nodes targeting different devices. | |
introducing into the extension in later revisions. It has been planned for to the | ||
extent that the definition of a graph node is device specific. | ||
|
||
=== Memory Allocation API | ||
|
||
We would like to provide an API that allows graph scope memory to be | ||
allocated and used in nodes, such that optimizations can be done on | ||
the allocation. No mechanism is currently provided, but see the | ||
section on <<memory-allocation-nodes, Memory Allocation Nodes>> for | ||
some designs being considered. | ||
|
||
**UNRESOLVED:** Trending "yes". Design is under consideration. | ||
|
||
=== Device Agnostic Graph | ||
|
||
Explicit API could support device-agnostic graphs that can be submitted | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there another plan/draft to record the memory allocation/free with record&replay APIs?
The current functions such as sycl::malloc_device are not recorded into a graph with record&replay APIs, we may need new async functions such as sycl::malloc_device_async