LeaderWorkerSet should support heterogenous resource requirements across Workers #259

supertetelman · 2024-11-19T00:03:30Z

What would you like to be added:
LeaderWorkerSet should support heterogenous resource requirements across Workers.

Why is this needed:
In the disaggregated serving use case, a single top-level inference server controller may orchestrate work across the various WorkerSets. Workers doing one type of work are likely to be less resource intensive than workers doing another type of work.

For example, for LLM workloads on GPUs, the prefill/context workers will likely require less overall GPU memory, and possibly fewer or smaller GPUs, than the workers doing generation.

I can think of at least one inference server that is designed to properly orchestrate this work and would not fit well into the paradigm of one LWS per resource spec (which was discussed as the current design pattern during the LWS KubeCon talk this past week).

Completion requirements:

This enhancement requires the following artifacts:

Design doc
API change
Docs update

ahg-g · 2024-11-24T03:05:53Z

To support a heterogenous setup, we recommend creating two LWS deployments and using the group index to link the replicas.

For example, create two LWS named lws-prefill and lws-decode; lws-prefill-0 and lws-decode-0 then form heterogenous replica 0.
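
A minimal sketch of that pattern in Python, building the two manifests as plain dicts and pairing groups by index. The container image names, GPU counts, and group sizes are hypothetical placeholders, and the helper names `lws_manifest` and `paired_groups` are made up for illustration. The `<lws-name>-<group-index>` naming follows the lws-prefill-0 / lws-decode-0 example above; treat this as an illustration under those assumptions, not as an official recipe.

```python
def lws_manifest(name, replicas, size, image, gpus):
    """Build a minimal LeaderWorkerSet manifest as a plain dict.

    image and gpus are illustrative placeholders, not recommended values.
    """
    pod_template = {
        "spec": {
            "containers": [
                {
                    "name": "server",
                    "image": image,
                    "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
                }
            ]
        }
    }
    return {
        "apiVersion": "leaderworkerset.x-k8s.io/v1",
        "kind": "LeaderWorkerSet",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "leaderWorkerTemplate": {
                "size": size,  # pods per group (leader included)
                "workerTemplate": pod_template,
            },
        },
    }


def paired_groups(prefill, decode):
    """Pair group i of one LWS with group i of the other, by group index.

    Only indices present in both sets can be paired, so the smaller
    replica count bounds the number of pairs.
    """
    p_name = prefill["metadata"]["name"]
    d_name = decode["metadata"]["name"]
    n = min(prefill["spec"]["replicas"], decode["spec"]["replicas"])
    return [(f"{p_name}-{i}", f"{d_name}-{i}") for i in range(n)]


# Two LWS with different resource specs: lighter prefill, heavier decode.
prefill = lws_manifest("lws-prefill", replicas=2, size=2,
                       image="example.com/prefill:latest", gpus=1)
decode = lws_manifest("lws-decode", replicas=2, size=4,
                      image="example.com/decode:latest", gpus=4)

pairs = paired_groups(prefill, decode)
for p, d in pairs:
    print(f"{p} <-> {d}")
```

Keeping the replica counts of the two sets equal means every group has a counterpart; scaling one set without the other would leave unpaired groups, which the `min` in `paired_groups` makes explicit.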

supertetelman added the kind/feature label (categorizes issue or PR as related to a new feature) on Nov 19, 2024
