What would you like to be added:
LeaderWorkerSet should support heterogeneous resource requirements across Workers.
Why is this needed:
In disaggregated serving, a single top-level inference server controller may orchestrate work across the various WorkerSets. Workers doing one type of work are likely to be less resource intensive than workers doing another type of work.
For example, with LLM workloads on GPUs, the prefill/context workers will likely require less overall GPU memory, and possibly fewer GPUs or a smaller type of GPU, than the workers doing generation.
I can think of at least one inference server that is designed to orchestrate this kind of work and would not fit well into the paradigm of one LWS per resource spec (which was discussed as the current design pattern during the LWS KubeCon talk this past week).
Completion requirements:
This enhancement requires the following artifacts:
Design doc
API change
Docs update