LeaderWorkerSet should support heterogenous resource requirements across Workers #259

Open
3 tasks
supertetelman opened this issue Nov 19, 2024 · 1 comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

supertetelman commented Nov 19, 2024

What would you like to be added:
LeaderWorkerSet should support heterogeneous resource requirements across Workers.

Why is this needed:
In disaggregated serving, a single top-level inference server controller may orchestrate work across the various WorkerSets. Workers doing one type of work are likely to be less resource intensive than workers doing another type.

For example, for LLM workloads on GPUs, the prefill/context workers will likely require less overall GPU memory, and possibly fewer or smaller GPUs, than the workers doing generation.

For this use case I can think of at least one inference server that is designed to orchestrate this work and would not fit well into the paradigm of one LWS per resource spec (which was discussed as the current design pattern during the LWS KubeCon talk this past week).

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

supertetelman added the kind/feature label Nov 19, 2024
ahg-g (Contributor) commented Nov 24, 2024

To support a heterogeneous setup, we recommend deploying two LWS deployments and using the group index to link the replicas.

For example, create two LWS named lws-prefill and lws-decode; lws-prefill-0 and lws-decode-0 then form heterogeneous replica 0.
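
To illustrate the linking idea, here is a minimal sketch of how a controller might pair the two sets by group index. It assumes group names follow the `<lws-name>-<group-index>` pattern from the example above. The helper names `group_index` and `pair_replicas` are hypothetical and not part of the LWS API.

```python
import re
from typing import Dict, Iterable, Optional, Tuple


def group_index(name: str, lws_name: str) -> Optional[int]:
    """Return the trailing group index of `name`, or None if it does not
    match the assumed `<lws-name>-<group-index>` pattern."""
    match = re.fullmatch(re.escape(lws_name) + r"-(\d+)", name)
    return int(match.group(1)) if match else None


def pair_replicas(
    prefill_names: Iterable[str],
    decode_names: Iterable[str],
    prefill_lws: str = "lws-prefill",  # name taken from the example above
    decode_lws: str = "lws-decode",    # name taken from the example above
) -> Dict[int, Tuple[str, str]]:
    """Link prefill and decode groups that share the same group index.

    Groups with no counterpart in the other LWS are left out.
    """
    prefill = {group_index(n, prefill_lws): n for n in prefill_names}
    decode = {group_index(n, decode_lws): n for n in decode_names}
    shared = sorted(i for i in prefill.keys() & decode.keys() if i is not None)
    return {i: (prefill[i], decode[i]) for i in shared}


if __name__ == "__main__":
    pairs = pair_replicas(
        ["lws-prefill-0", "lws-prefill-1"],
        ["lws-decode-1", "lws-decode-0", "lws-decode-2"],
    )
    for index, (prefill_group, decode_group) in pairs.items():
        print(f"replica {index}: {prefill_group} + {decode_group}")
```

Here lws-decode-2 has no prefill counterpart, so it is not part of any heterogeneous replica until lws-prefill is scaled to match.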
