Proposal of dynamic GPU slice plugin #3820
Hi, please squash to one commit and sign off.
e3ffd7e to 0000b26
I have reviewed it, please take a look.
Where is the logic of this AddPod? Is it in the mig-agent of nos? I'm wondering whether our dynamic GPU slice plugin depends strongly on the nos project. The annotation carries the nos watermark, and the nos project is not updated frequently.
- AddPod is in volcano/pkg/scheduler/api/node_info.go addResource.
- Three components can be reused from the nos project: the MIG agent, the MPS agent, and the MPS device plugin. They are not the most important part; if needed, we could rewrite them.
/cc @Monokaix, I think we'd better rewrite them as part of volcano so that they evolve with us.
9c752ad to 7e85873
A nice feature, but I have a few recommendations:
Refine as JesseStutler's comments
Address the comments by archlitchi.
Signed-off-by: sailorvii <[email protected]>
Signed-off-by: chenw66 <[email protected]>
Thanks for your time and review.
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: dynamicgpuslice
How about using the deviceshare plugin?
We should clarify which device plugin (dp) the user should deploy, and how dynamic MIG slicing relates to vGPU. The semantics of vGPU and dynamic MIG slicing are not completely consistent. Whether to use the NVIDIA device plugin or HAMi needs to be discussed again.
Shall we discuss how to evolve this feature at the weekly meeting? Currently there seem to be three repos involved: volcano does the scheduling, hami provides the device plugin, and nos provides the MIG/MPS agents. That is too fragmented. @sailorvii @archlitchi @Monokaix
Thank you all for your time. It's good to discuss the details in the meeting.
NVIDIA's official GPU sharing mechanisms are time-slicing, MPS, and MIG. Dynamic MPS and MIG partitioning is currently not supported, so we want to add it as a volcano scheduler plugin.