[SYCL][Graph] Add initial draft of malloc/free nodes ... #352
Conversation
I tried not to nitpick this too much until we agree on the high-level design, but I think this looks good. The API is simple, there is a clear mapping from CUDA Graph entry points for porting code, and the dependency on sycl_ext_oneapi_virtual_mem hopefully means we can hand off the low-level details to that extension.
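For context, a minimal sketch of how add-time reservation and finalize-time mapping might lean on that extension. The `vmem` names below are from the sycl_ext_oneapi_virtual_mem proposal; the split between add time and finalize time is an assumption of this thread, not something this PR specifies.

```c++
#include <sycl/sycl.hpp>
#include <cstdint>
namespace vmem = sycl::ext::oneapi::experimental;

int main() {
  sycl::queue q;
  sycl::context ctx = q.get_context();
  sycl::device dev = q.get_device();

  // Round the request up to the device's mapping granularity.
  size_t gran = vmem::get_mem_granularity(dev, ctx);
  size_t bytes = ((1024 * sizeof(float) + gran - 1) / gran) * gran;

  // add_malloc_device() time: reserve virtual address space only.
  uintptr_t va = vmem::reserve_virtual_mem(bytes, ctx);

  // finalize() time: create physical memory and map the reservation.
  vmem::physical_mem pm{dev, ctx, bytes};
  void *ptr = pm.map(va, bytes, vmem::address_access_mode::read_write);

  // add_free() / executable graph teardown: unmap and release.
  vmem::unmap(ptr, bytes, ctx);
  vmem::free_virtual_mem(va, bytes, ctx);
}
```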
> won't be valid until the allocation is made on graph finalization, as allocating
> at finalization is the only way to benefit from the known graph scope for optimal
> memory allocation, and even optimize to eliminate some allocations entirely.
I think people will have the question "if I have two executable graphs created from the same modifiable graph, will that node result in two different memory allocations?", so we should state something about that.
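Purely to frame the ambiguity (names follow the example later in this thread; the answer is not specified anywhere in the current draft):

```c++
auto [ptr, alloc_node] = my_graph.add_malloc_device<float>(N);
auto exec_a = my_graph.finalize();
auto exec_b = my_graph.finalize();
// Under allocate-at-finalization semantics, do exec_a and exec_b each
// bind ptr to their own physical allocation, and if so, what does the
// single ptr value point to when both are submitted?
```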
> Returns a pair of a pointer to memory and a node. The pointer is allocated on the `device`
> that is associated with the current graph by first execution of the `command_graph`.
> first execution of the `command_graph`

Question about whether we want to tighten this to say something like "allocated during the finalization of the command-graph to the returned executable graph". The current wording leaves open the suggestion that if you never execute the graph, nothing will be allocated, but we will probably do the allocation on finalization, which means less latency when submitting the graph.
How about "may be allocated during the finalization ..."? That way we keep our options open, depending on the backend.
Shall we align with line 1723 here? Line 1723 says allocating at finalization is the only way.
> Exceptions:
>
> * Throws synchronously with error code `feature_not_supported` if any devices in `context`
We don't have a context parameter, so something like "if the device associated with the command graph does not have ...".
Thanks, I meant: "any device in the context associated with the graph ..." but let's be more accurate!
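Just to illustrate how this would surface to users, assuming the error arrives as a standard `sycl::exception` (the handling below is a sketch, not part of the PR):

```c++
try {
  auto [ptr, alloc_node] = my_graph.add_malloc_device<float>(N);
} catch (const sycl::exception &e) {
  // feature_not_supported here means some device in the context
  // associated with the graph lacks the required support.
  if (e.code() == sycl::errc::feature_not_supported) {
    // Fall back to a regular USM allocation outside the graph.
  }
}
```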
> can be passed here with a list of nodes to create dependency edges on.
Heads up that once intel#12366 merges, I think we'll need to extend the `node_type` enum to include `malloc_device` and `free`.
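Something along these lines; the existing enumerators are my reading of the `node_type` enum proposed in intel#12366, and the last two are the suggested additions:

```c++
enum class node_type {
  empty,
  subgraph,
  kernel,
  memcpy,
  memset,
  memfill,
  prefetch,
  memadvise,
  ext_oneapi_barrier,
  host_task,
  malloc_device, // proposed addition for this extension
  free           // proposed addition for this extension
};
```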
An example with structured bindings could look like:

```c++
sycl_ext::command_graph my_graph(my_queue);

float scalar = 2.0f;
size_t N = 1024;
float *ptrX = malloc_device<float>(N, my_queue); // allocate device memory

// Only reserves virtual memory; no physical allocation yet.
auto [ptrY, node_a] = my_graph.add_malloc_device<float>(N);

auto node_b = my_graph.add(
    [=](handler &cgh) {
      cgh.parallel_for(N, [=](id<1> it) { ptrX[it] += scalar * ptrY[it]; });
    },
    {sycl_ext::property::node::depends_on(node_a)});

auto node_c = my_graph.add_free(ptrY);

auto my_exec = my_graph.finalize(); // may allocate physical memory and map
my_queue.ext_oneapi_graph(my_exec).wait();
```
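Note that `ptrY` can be captured by value in the kernel lambda even though it is not yet backed by a physical allocation at graph construction time: the captured value is just the reserved virtual address, which only becomes valid once the allocation is made for the executable graph.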
```c++
std::pair<void*, node>
add_malloc_device(size_t num_bytes, const property_list& propList = {});
```
Could we have this templated so that users can call it similarly to `sycl::malloc_device<T>`? It just saves users having to do a cast after the call.
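For instance (a suggested signature only, mirroring the `sycl::malloc_device<T>` free function):

```c++
// count is in elements rather than bytes, matching sycl::malloc_device<T>,
// and the returned pointer is already typed so no cast is needed.
template <typename T>
std::pair<T*, node>
add_malloc_device(size_t count, const property_list& propList = {});
```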
Yes, will add. My only reason to start with a single version of this API was not having to update too many functions, in case we change the basic design of this interface.
> USM allocations to be used in the graph submission. Before coming to this
> recommendation we considered the following explicit graph building interfaces
> for adding a memory allocation owned by the graph:
>
> The following interfaces enable users to define a memory allocation/free operation
Is there another plan/draft to record memory allocation/free with the record & replay APIs? Current functions such as `sycl::malloc_device` are not recorded into a graph by the record & replay APIs; we may need new async functions such as `sycl::malloc_device_async` with backend support for deferred memory allocation, as proposed in intel#8954.
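For illustration, the kind of usage that question is pointing at; `malloc_device_async` and `free_async` are hypothetical names from this thread and intel#8954, not existing API:

```c++
// Hypothetical record & replay flow with deferred allocation.
my_graph.begin_recording(my_queue);
float *tmp = sycl_ext::malloc_device_async<float>(N, my_queue); // hypothetical
my_queue.parallel_for(N, [=](id<1> it) { tmp[it] = 0.0f; });
sycl_ext::free_async(tmp, my_queue); // hypothetical
my_graph.end_recording(my_queue);
```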