From f2b2887594112fe225b1e347c443fa93123bec7f Mon Sep 17 00:00:00 2001
From: Pablo Reble
Date: Wed, 24 Jan 2024 16:31:14 -0600
Subject: [PATCH 1/3] [SYCL][Graph] Add initial draft of malloc/free nodes

---
 .../sycl_ext_oneapi_graph.asciidoc | 108 ++++++++++++------
 1 file changed, 71 insertions(+), 37 deletions(-)

diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
index 4ed2abdf0e88..6d791adac50b 100644
--- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
+++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
@@ -1691,45 +1691,79 @@ Exceptions:
 
 |===
 
-=== Features Still in Development
-
 ==== Memory Allocation Nodes
 
-There is no provided interface for users to define a USM allocation/free
-operation belonging to the scope of the graph. It would be error prone and
-non-performant to allocate or free memory as a node executed during graph
-submission. Instead, such a memory allocation API needs to provide a way to
-return a pointer which won't be valid until the allocation is made on graph
-finalization, as allocating at finalization is the only way to benefit from
-the known graph scope for optimal memory allocation, and even optimize to
-eliminate some allocations entirely.
-
-Such a deferred allocation strategy presents challenges however, and as a result
-we recommend instead that prior to graph construction users perform core SYCL
-USM allocations to be used in the graph submission. Before to coming to this
-recommendation we considered the following explicit graph building interfaces
-for adding a memory allocation owned by the graph:
-
-1. Allocation function returning a reference to the raw pointer, i.e. `void*&`,
-   which will be instantiated on graph finalization with the location of the
-   allocated USM memory.
-
-2. Allocation function returning a handle to the allocation.
-   Applications use the handle in node command-group functions to access memory when allocated.
-
-3. Allocation function returning a pointer to a virtual allocation, only backed
-   with an actual allocation when graph is finalized or submitted.
-
-Design 1) has the drawback of forcing users to keep the user pointer variable
-alive so that the reference is valid, which is unintuitive and is likely to
-result in bugs.
-
-Design 2) introduces a handle object which has the advantages of being a less
-error prone way to provide the pointer to the deferred allocation. However, it
-requires kernel changes and introduces an overhead above the raw pointers that
-are the advantage of USM.
-
-Design 3) needs specific backend support for deferred allocation.
+Support depends on the availability of backend support for deferred allocation:
+link:../experimental/sycl_ext_oneapi_virtual_mem.asciidoc[sycl_ext_oneapi_virtual_mem]
+
+The following interfaces enable users to define a memory allocation/free operation
+belonging to the scope of the graph. It would be error prone and non-performant
+to allocate or free memory as a node executed during graph submission. Instead,
+such a memory allocation API needs to provide a way to return a pointer which
+won't be valid until the allocation is made on graph finalization, as allocating
+at finalization is the only way to benefit from the known graph scope for optimal
+memory allocation, and even optimize to eliminate some allocations entirely.
+
+Table {counter: tableNumber}. Member functions of the `command_graph` class (memory allocation).
+[cols="2a,a"]
+|===
+|Member function|Description
+
+|
+[source, c++]
+----
+std::pair<void*, node>
+add_malloc_device(size_t num_bytes, const property_list& propList = {});
+----
+
+
+|
+Returns a pair of a pointer to memory and a node. The memory is allocated on the `device`
+that is associated with the current graph by the first execution of the `command_graph`.
+All nodes that depend on this node, and are thereby executed after it, have access to the allocated memory.
+The allocation size is specified in bytes.
+
+Preconditions:
+
+* This member function is only available when the `command_graph` state is
+  `graph_state::modifiable`.
+
+Parameters:
+
+* `num_bytes` - allocation size in bytes.
+
+* `propList` - Zero or more properties can be provided to the constructed node
+  via an instance of `property_list`. The `property::node::depends_on` property
+  can be passed here with a list of nodes to create dependency edges on.
+
+Exceptions:
+
+* Throws synchronously with error code `feature_not_supported` if any devices in `context`
+does not have `aspect::usm_device_allocations`.
+|
+
+[source, c++]
+----
+node
+add_free(void* ptr, const property_list& propList = {});
+----
+
+
+|
+Returns a free node that has been added to the graph. Accesses to the allocated memory by nodes
+that depend on this node (successors) are undefined behavior.
+
+Parameters:
+
+* `ptr` - pointer to the memory to free. Must be allocated by `add_malloc_device`.
+
+* `propList` - Zero or more properties can be provided to the constructed node
+  via an instance of `property_list`. The `property::node::depends_on` property
+  can be passed here with a list of nodes to create dependency edges on.
+
+|===
+
+=== Features Still in Development
 
 ==== Device Specific Graph
 
From 842cf2c7545ab9efacd789afff7f5d94f57b349f Mon Sep 17 00:00:00 2001
From: Pablo Reble
Date: Thu, 25 Jan 2024 16:05:33 -0600
Subject: [PATCH 2/3] Apply suggestions from code review

---
 .../extensions/experimental/sycl_ext_oneapi_graph.asciidoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
index 6d791adac50b..b35ecf1027b7 100644
--- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
+++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
@@ -1738,8 +1738,8 @@ Parameters:
 
 Exceptions:
 
-* Throws synchronously with error code `feature_not_supported` if any devices in `context`
-does not have `aspect::usm_device_allocations`.
+* Throws synchronously with error code `feature_not_supported` if any device associated
+with the command graph does not have `aspect::usm_device_allocations`.
 |
 
 [source, c++]
From 46ff96fd96ee81f0e2f629406d22215fa0480c15 Mon Sep 17 00:00:00 2001
From: Pablo Reble
Date: Mon, 29 Jan 2024 14:24:03 -0600
Subject: [PATCH 3/3] remove issue about memory nodes

---
 .../experimental/sycl_ext_oneapi_graph.asciidoc | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
index b35ecf1027b7..94db5e4510ad 100644
--- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
+++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
@@ -1794,16 +1794,6 @@ Allow an executable graph to contain nodes targeting different devices.
 introducing into the extension in later revisions. It has been planned for to the
 extent that the definition of a graph node is device specific.
 
-=== Memory Allocation API
-
-We would like to provide an API that allows graph scope memory to be
-allocated and used in nodes, such that optimizations can be done on
-the allocation. No mechanism is currently provided, but see the
-section on <> for
-some designs being considered.
-
-**UNRESOLVED:** Trending "yes". Design is under consideration.
-
 === Device Agnostic Graph
 
 Explicit API could support device-agnostic graphs that can be submitted