
[Misc] Minimum requirements for SageMaker compatibility #11575

Closed

Conversation


@nathan-az nathan-az commented Dec 28, 2024

Fixes #11557

Implements /ping and /invocations, and adds an alternate Dockerfile target, identical to vllm-openai except that its entrypoint sets the port to 8080.
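As a rough illustration of what the two endpoints do (a stdlib-only sketch, not vLLM's actual FastAPI implementation; the handler class and the response shape are invented for this example), SageMaker expects GET /ping to return 200 for health checks and POST /invocations to serve inference on port 8080:

```python
# Stdlib-only sketch of the two SageMaker-required endpoints.
# Not vLLM's implementation; names and response shape are illustrative.
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SageMakerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ping":
            # SageMaker treats any 200 response as "container healthy".
            self.send_response(200)
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

    def do_POST(self):
        if self.path == "/invocations":
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length) or b"{}")
            # Chat requests carry a "messages" field; others are
            # treated as plain completion requests.
            kind = "chat" if "messages" in body else "completion"
            payload = json.dumps({"handled_as": kind}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(payload)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Silence per-request logging for this sketch.
        pass


# Usage: ThreadingHTTPServer(("0.0.0.0", 8080), SageMakerHandler).serve_forever()
# (8080 is the port SageMaker expects the container to listen on.)
```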

Since the OpenAI server is the more "production-ready" entrypoint, we use its functionality and handlers as the base.

Considerations:

Dockerfile

The Dockerfile stage order has changed: vllm-sagemaker is defined first, and vllm-openai is then built from it.

This avoids repeating the additional dependencies, and still defines vllm-openai last so that it remains the default target for docker build. If we would rather not use vllm-sagemaker as the base for vllm-openai, we can simply duplicate the additional requirements in both stages and revert vllm-openai to building from vllm-base.
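The stage reordering might look roughly like this (an illustrative sketch only; the stage names follow the PR description, while the dependency installation and entrypoint contents are placeholders, not vLLM's actual Dockerfile):

```dockerfile
# Sketch of the stage ordering; contents are placeholders.
FROM vllm-base AS vllm-sagemaker
# Install the additional (OpenAI-server) dependencies once, here.
# SageMaker expects the container to listen on port 8080.
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server", "--port", "8080"]

# Defined last, so it stays the default target for a plain `docker build`.
FROM vllm-sagemaker AS vllm-openai
# Inherits the extra dependencies; only the entrypoint differs.
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
```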

Routing

  • The app state now includes the model task, to help infer the correct handler for a request
  • We lose FastAPI's automatic casting of the incoming request to the correct Pydantic model, so we do this explicitly with model_validate
  • The presence of messages in the request determines whether it is treated as a chat input
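
Concretely, the dispatch rule above might be sketched as follows (a hypothetical illustration; the return values stand in for vLLM's Pydantic request models, on which model_validate would actually be called):

```python
# Hypothetical sketch of the /invocations routing rule. With FastAPI no
# longer binding the body to a specific Pydantic model, the raw dict is
# inspected first and then validated explicitly.
def classify_invocation(body: dict) -> str:
    """Decide which request model an /invocations payload maps to:
    the presence of "messages" marks it as a chat request."""
    if "messages" in body:
        # In the real code: ChatCompletionRequest.model_validate(body)
        return "chat"
    # Everything else is treated as a plain completion request.
    # In the real code: CompletionRequest.model_validate(body)
    return "completion"
```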

Note that these changes do not affect other images or APIs. IMO it is reasonable to integrate them to support SageMaker use cases, without offering the full flexibility of making requests to every endpoint.

I have tested the new endpoints locally. I will be able to test building and deploying on SageMaker some time in the next couple of weeks, but welcome feedback.

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of these by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can do one of the following:

  • Add the ready label to the PR
  • Enable auto-merge

🚀

Nathan Azrak and others added 12 commits December 28, 2024 12:28
@mergify mergify bot added the documentation Improvements or additions to documentation label Dec 28, 2024
@nathan-az nathan-az closed this Dec 28, 2024
@nathan-az (Author)

Will remake this PR to clean up the git history and sign commits properly.

Labels
ci/build, documentation (Improvements or additions to documentation), frontend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: Support for SageMaker-required endpoints
9 participants