
Update Device Split Semantics #104

Open
eleon opened this issue Apr 9, 2024 · 0 comments

eleon commented Apr 9, 2024

Assume the following scenario:

  • A compute node with a NUMA domain
  • The NUMA domain has n cores and two devices (GPUs)
  • There are two tasks that want to split the hardware (user scope) on a GPU basis:
qv_scope_split_at(ctx, user_scope, QV_HW_OBJ_GPU, rank%ngpus, &gpu_scope);

Currently, the split operation results in two sub-scopes:

  1. GPU 0 and all n cores
  2. GPU 1 and all n cores

Note that all n cores are shared between the two scopes.
Splitting these sub-scopes further over the tasks to obtain exclusive cores is not possible, because a collective split operation cannot be applied across different sub-scopes.

This issue can be addressed by maintaining a list of exclusive resources (e.g., a cpuset) associated with each device. In this case, GPU 0 would hold half of the cores in its resource list and GPU 1 the other half. With such an internal distribution of resources, the call to qv_scope_split_at above would instead produce the following two sub-scopes:

  1. GPU 0 and n/2 cores
  2. GPU 1 and the other n/2 cores

At this point there is no need for an additional split operation to get exclusive cores associated with a GPU scope.

@samuelkgutierrez samuelkgutierrez changed the title Device split semantics Update Device Split Semantics Apr 11, 2024