
Update Device Split Semantics #104

Open
eleon opened this issue Apr 9, 2024 · 0 comments

eleon commented Apr 9, 2024

Assume the following scenario:

  • A compute node with a NUMA domain
  • The NUMA domain has n cores and two devices (GPUs)
  • There are two tasks that want to split the hardware (user scope) on a GPU basis:
qv_scope_split_at(ctx, user_scope, QV_HW_OBJ_GPU, rank%ngpus, &gpu_scope);

Currently, the split operation results in two sub-scopes:

  1. GPU 0 and all n cores
  2. GPU 1 and all n cores

Note that all n cores are shared between the two scopes.
Splitting these sub-scopes further over the tasks to obtain exclusive cores is not possible, because a collective split operation cannot be applied across different sub-scopes.

This issue can be addressed by maintaining a list of exclusive resources (e.g., a cpuset) associated with each device. In this case, GPU 0 would hold half of the cores in its resource list and GPU 1 the other half. With such an internal distribution of resources, the call to qv_scope_split_at above would instead produce the following two sub-scopes:

  1. GPU 0 and n/2 cores
  2. GPU 1 and the other n/2 cores

At this point there is no need for an additional split operation to get exclusive cores associated with a GPU scope.

@samuelkgutierrez samuelkgutierrez changed the title Device split semantics Update Device Split Semantics Apr 11, 2024