
server : add OAI compat for /v1/completions #10974

Merged
4 commits merged into ggerganov:master on Dec 31, 2024

Conversation

ngxson (Collaborator) commented Dec 25, 2024

Supersedes #10645

Ref documentation: https://platform.openai.com/docs/api-reference/completions/object

The /v1/completions endpoint is now OAI-compatible (not to be confused with the /completion endpoint, which has no /v1 prefix).

Also regrouped the docs into two dedicated sections: one for the OAI-compatible API and one for the non-OAI API.
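
For reference, here is a minimal sketch of what an OAI-style request to this endpoint could look like, using the openai Python client. The base URL, port, and model name are assumptions for illustration, not taken from this PR:

```python
# Minimal sketch: calling llama-server's /v1/completions through the openai Python client.
# Assumes llama-server is listening on localhost:8080; the model name is illustrative only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

resp = client.completions.create(
    model="placeholder-model",  # llama-server serves whichever model it was launched with
    prompt="Building a website can be done in 10 simple steps:",
    max_tokens=64,
    temperature=0.7,
)
print(resp.choices[0].text)
```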

TODO:

  • add test
  • add docs

@github-actions github-actions bot added the python python script changes label Dec 25, 2024
@ngxson ngxson added the breaking change Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility. label Dec 25, 2024
@ngxson ngxson marked this pull request as ready for review December 25, 2024 16:05
@ngxson ngxson requested a review from ggerganov December 25, 2024 16:05
ericcurtin (Contributor)

Would this make llama-server compatible with this client?

https://github.com/open-webui/open-webui

If yes, can we please get this in? 😄

I'm also curious, for anyone in the know: it seems like a lot of OpenAI clients (like open-webui) expect to be able to switch models per request. Does llama-server support this, and if not, roughly what would the effort be to add it?

ggerganov (Owner)

Does llama-server support this, and if not, roughly what would the effort be to add it?

This is not supported atm. But this logic seems more suitable for a proxy/routing layer than for implementing it in llama-server.

ngxson (Collaborator, Author) commented Dec 31, 2024

@ericcurtin I have no idea if they support 3rd-party OpenAI-compatible servers or not. Judging from their README, they kinda support it via the :ollama docker image tag, but I'm not sure if that means "image with ollama built-in" or "bring your own ollama server".

In either case, I think they rely on /v1/chat/completions, which we already have in llama.cpp. So it's not related to the current PR.
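
For context, a client like open-webui would typically call the chat endpoint roughly like this. This is a hedged sketch; the base URL, port, and model name are placeholders, and the exact requests such clients send may differ:

```python
# Sketch of the /v1/chat/completions call that most OAI clients use,
# as opposed to the plain /v1/completions endpoint added in this PR.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

resp = client.chat.completions.create(
    model="placeholder-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```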

@ngxson ngxson merged commit 5896c65 into ggerganov:master Dec 31, 2024
50 checks passed
mostlygeek (Contributor)

This is not supported atm. But this logic seems more suitable for a proxy/routing layer than for implementing it in llama-server.

I wrote llama-swap for just this purpose. It's a transparent proxy that swaps llama-server instances based on the model name in the API call. It's a single Go binary with no dependencies, so it's easy to deploy.
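
The routing half of the idea can be sketched in a few lines of Python. This is purely an illustration of the concept, not llama-swap's implementation (which is written in Go and also starts and stops llama-server processes on demand); the port numbers and model names below are made up:

```python
# Illustration of routing by model name to different llama-server instances.
# Error handling and process management are omitted for brevity.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Hypothetical mapping from model name to a running llama-server instance.
UPSTREAMS = {
    "llama-3.1-8b": "http://127.0.0.1:8081",
    "qwen2.5-7b":   "http://127.0.0.1:8082",
}

class Router(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        model = json.loads(body or b"{}").get("model", "")
        upstream = UPSTREAMS.get(model)
        if upstream is None:
            self.send_error(404, f"unknown model: {model}")
            return
        # Forward the request unchanged to the chosen llama-server.
        req = Request(upstream + self.path, data=body,
                      headers={"Content-Type": "application/json"})
        with urlopen(req) as resp:
            payload = resp.read()
            self.send_response(resp.status)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Router).serve_forever()
```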
