
server : add OAI compat for /v1/completions #10974

Merged
4 commits merged into ggerganov:master on Dec 31, 2024

Conversation

ngxson (Collaborator) commented Dec 25, 2024

Supersedes #10645

Ref documentation: https://platform.openai.com/docs/api-reference/completions/object

The /v1/completions endpoint is now OAI-compatible (not to be confused with the /completion endpoint, which has no /v1 prefix).

Also regrouped the docs into two dedicated sections: one for the OAI-compatible API and one for the non-OAI API.
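
For reference, here is a minimal sketch of what an OAI-style request to this endpoint could look like, using the openai Python client. The base URL, port, and model name are assumptions for illustration, not taken from this PR:

```python
# Minimal sketch: calling llama-server's /v1/completions through the openai Python client.
# Assumes llama-server is listening on localhost:8080; the model name is illustrative only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

resp = client.completions.create(
    model="placeholder-model",  # llama-server serves whichever model it was launched with
    prompt="Building a website can be done in 10 simple steps:",
    max_tokens=64,
    temperature=0.7,
)
print(resp.choices[0].text)
```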

TODO:

  • add test
  • add docs

@github-actions github-actions bot added the python python script changes label Dec 25, 2024
@ngxson ngxson added the breaking change Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility. label Dec 25, 2024
@ngxson ngxson marked this pull request as ready for review December 25, 2024 16:05
@ngxson ngxson requested a review from ggerganov December 25, 2024 16:05
ericcurtin (Contributor)

Would this make llama-server compatible with this client?

https://github.com/open-webui/open-webui

If yes, can we please get this in? 😄

I'm also curious, for anyone in the know: it seems like a lot of OpenAI clients (like open-webui) expect to be able to switch models per request. Does llama-server support this, and if not, roughly what would the effort be to add it?

ggerganov (Owner)

Does llama-server support this, and if not, roughly what would the effort be to add it?

This is not supported atm. But this logic seems more suitable for a proxy/routing layer than for implementing it in llama-server.

ngxson (Collaborator, Author) commented Dec 31, 2024

@ericcurtin I have no idea if they support 3rd-party OpenAI-compatible servers or not. Judging from their README, they kinda support it via the :ollama docker image tag, but I'm not sure if that means "image with ollama built-in" or "bring your own ollama server".

In either case, I think they rely on /v1/chat/completions, which we already have in llama.cpp. So it's not related to the current PR.
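
For context, a client like open-webui would typically call the chat endpoint roughly like this. This is a hedged sketch; the base URL, port, and model name are placeholders, and the exact requests such clients send may differ:

```python
# Sketch of the /v1/chat/completions call that most OAI clients use,
# as opposed to the plain /v1/completions endpoint added in this PR.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

resp = client.chat.completions.create(
    model="placeholder-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```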

@ngxson ngxson merged commit 5896c65 into ggerganov:master Dec 31, 2024
50 checks passed
mostlygeek (Contributor)

This is not supported atm. But this logic seems more suitable for a proxy/routing layer than for implementing it in llama-server.

I wrote llama-swap for just this purpose. It's a transparent proxy that swaps llama-server instances based on the model name in the API call. It's a single Go binary with no dependencies, so it's easy to deploy.
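
The routing half of the idea can be sketched in a few lines of Python. This is purely an illustration of the concept, not llama-swap's implementation (which is written in Go and also starts and stops llama-server processes on demand); the port numbers and model names below are made up:

```python
# Illustration of routing by model name to different llama-server instances.
# Error handling and process management are omitted for brevity.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Hypothetical mapping from model name to a running llama-server instance.
UPSTREAMS = {
    "llama-3.1-8b": "http://127.0.0.1:8081",
    "qwen2.5-7b":   "http://127.0.0.1:8082",
}

class Router(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        model = json.loads(body or b"{}").get("model", "")
        upstream = UPSTREAMS.get(model)
        if upstream is None:
            self.send_error(404, f"unknown model: {model}")
            return
        # Forward the request unchanged to the chosen llama-server.
        req = Request(upstream + self.path, data=body,
                      headers={"Content-Type": "application/json"})
        with urlopen(req) as resp:
            payload = resp.read()
            self.send_response(resp.status)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Router).serve_forever()
```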
