
[Misc] Minimum requirements for SageMaker compatibility #11575

Closed

Conversation


@nathan-az nathan-az commented Dec 28, 2024

Fixes #11557

Implements /ping and /invocations, and adds an alternate Dockerfile target, identical to vllm-openai except that its entrypoint sets the port to 8080.
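As a rough illustration of what the two endpoints do (a stdlib-only sketch, not vLLM's actual FastAPI implementation; the handler class and the response shape are invented for this example), SageMaker expects GET /ping to return 200 for health checks and POST /invocations to serve inference on port 8080:

```python
# Stdlib-only sketch of the two SageMaker-required endpoints.
# Not vLLM's implementation; names and response shape are illustrative.
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SageMakerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ping":
            # SageMaker treats any 200 response as "container healthy".
            self.send_response(200)
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

    def do_POST(self):
        if self.path == "/invocations":
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length) or b"{}")
            # Chat requests carry a "messages" field; others are
            # treated as plain completion requests.
            kind = "chat" if "messages" in body else "completion"
            payload = json.dumps({"handled_as": kind}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(payload)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Silence per-request logging for this sketch.
        pass


# Usage: ThreadingHTTPServer(("0.0.0.0", 8080), SageMakerHandler).serve_forever()
# (8080 is the port SageMaker expects the container to listen on.)
```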

Since the OpenAI server is the more "production-ready" entrypoint, we use its functionality and handlers as the base.

Considerations:

Dockerfile

The Dockerfile stage order has changed: vllm-sagemaker is defined first, and vllm-openai is then built from it.

This avoids repeating the additional dependencies, and still defines vllm-openai last so that it remains the default target for docker build. If we would rather not use vllm-sagemaker as the base for vllm-openai, we can simply duplicate the additional requirements in both stages and revert vllm-openai to building from vllm-base.
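The stage reordering might look roughly like this (an illustrative sketch only; the stage names follow the PR description, while the dependency installation and entrypoint contents are placeholders, not vLLM's actual Dockerfile):

```dockerfile
# Sketch of the stage ordering; contents are placeholders.
FROM vllm-base AS vllm-sagemaker
# Install the additional (OpenAI-server) dependencies once, here.
# SageMaker expects the container to listen on port 8080.
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server", "--port", "8080"]

# Defined last, so it stays the default target for a plain `docker build`.
FROM vllm-sagemaker AS vllm-openai
# Inherits the extra dependencies; only the entrypoint differs.
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
```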

Routing

  • The app state now includes the model task, to help infer the correct handler for a request
  • We lose FastAPI's automatic casting of the incoming request to the correct Pydantic model, so we do this explicitly with model_validate
  • The presence of messages in the request determines whether it is treated as a chat input
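
Concretely, the dispatch rule above might be sketched as follows (a hypothetical illustration; the return values stand in for vLLM's Pydantic request models, on which model_validate would actually be called):

```python
# Hypothetical sketch of the /invocations routing rule. With FastAPI no
# longer binding the body to a specific Pydantic model, the raw dict is
# inspected first and then validated explicitly.
def classify_invocation(body: dict) -> str:
    """Decide which request model an /invocations payload maps to:
    the presence of "messages" marks it as a chat request."""
    if "messages" in body:
        # In the real code: ChatCompletionRequest.model_validate(body)
        return "chat"
    # Everything else is treated as a plain completion request.
    # In the real code: CompletionRequest.model_validate(body)
    return "completion"
```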

Note that these changes do not affect other images or APIs. IMO it is reasonable to integrate them to support SageMaker use cases, without offering the full flexibility of making requests to every endpoint.

I have tested the new endpoints locally. I will be able to test building and deploying on SageMaker some time in the next couple of weeks, but welcome feedback.

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of these by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can do one of the following:

  • Add the ready label to the PR
  • Enable auto-merge

🚀

Nathan Azrak and others added 12 commits December 28, 2024 12:28
@mergify mergify bot added the documentation Improvements or additions to documentation label Dec 28, 2024
@nathan-az nathan-az closed this Dec 28, 2024
@nathan-az (Author)

Will remake this PR to clean up the git history and sign commits properly.

Labels
ci/build, documentation (Improvements or additions to documentation), frontend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: Support for SageMaker-required endpoints
9 participants