changelog : `llama-server` REST API #9291

ggerganov · 2024-09-03T06:56:11Z

Overview

This is a list of changes to the public HTTP interface of the llama-server example. Collaborators are encouraged to edit this post in order to reflect important changes to the API that end up merged into the master branch.

If you are building a 3rd party project that relies on llama-server, it is recommended to follow this issue and check it carefully before upgrading to new versions.

Recent API changes (most recent at the top)

| version | PR | desc |
| --- | --- | --- |
| TBD. | #10783 | `logprobs` is now OAI-compat, default to pre-sampling probs |
| TBD. | #10861 | `/embeddings` supports pooling type `none` |
| TBD. | #10853 | Add optional `"tokens"` output to `/completions` endpoint |
| b4337 | #10803 | Remove `penalize_nl` |
| b4265 | #10626 | CPU docker images working directory changed to /app |
| b4285 | #10691 | (Again) Change `/slots` and `/props` responses |
| b4283 | #10704 | Change `/slots` and `/props` responses |
| b4027 | #10162 | `/slots` endpoint: remove `slot[i].state`, add `slot[i].is_processing` |
| b3912 | #9865 | Add option to time limit the generation phase |
| b3911 | #9860 | Remove self-extend support |
| b3910 | #9857 | Remove legacy system prompt support |
| b3897 | #9776 | Change default security settings, `/slots` is now disabled by default. Endpoints now check for API key if it's set |
| b3887 | #9510 | Add `/rerank` endpoint |
| b3754 | #9459 | Add `[DONE]\n\n` in OAI stream response to match spec |
| b3721 | #9398 | Add `seed_cur` to completion response |
| b3683 | #9308 | Environment variable updated |
| b3599 | #9056 | Change `/health` and `/slots` |

For older changes, use:

git log --oneline -p b3599 -- examples/server/README.md

Upcoming API changes

TBD

ngxson · 2024-09-07T22:58:40Z

Not a breaking REST API change, but server-related: some environment variables were changed in #9308.

slaren · 2024-09-13T01:15:58Z

After #9398, `seed` in the completion response contains the seed requested by the user, while `seed_cur` contains the seed actually used to generate the completion. The two values can differ if `seed` is `LLAMA_DEFAULT_SEED` (or -1), in which case a random seed is generated and returned in `seed_cur`.
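A minimal sketch of how a client might pick the seed to record for reproducibility. The helper name `effective_seed` is hypothetical, and the thread does not say where these fields sit inside the response JSON, so the function simply takes whichever object holds them.

```python
def effective_seed(fields: dict):
    """Return the seed actually used for generation, if it was reported.

    `fields` is assumed to be the JSON object that carries "seed" and
    "seed_cur"; its exact location in the response is not specified here.
    """
    # Prefer seed_cur: it is the seed the server really used, even when the
    # requested seed was the "random" sentinel.
    if "seed_cur" in fields:
        return fields["seed_cur"]
    # Fall back to the requested seed for servers that predate #9398.
    return fields.get("seed")
```

Replaying a request with the returned value as `seed` is the intended use; whether that yields identical output also depends on the other sampling settings and the backend.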

ngxson · 2024-10-08T11:27:13Z

Breaking change #9776 : better security control for public deployments

The /slots endpoint is now disabled by default; start the server with --slots to enable it.
If an API key is set, all endpoints (including /slots and /props) require a correct API key to access.
Note: Only /health and /models are always publicly accessible.
Setting "system_prompt" through the /completions endpoint is removed; it has moved to POST /props (see documentation).

Please note that GET /props is always enabled to avoid breaking the web UI.
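For client authors, the rule above could be sketched as follows. This is an illustration only: the helper name `build_request` is hypothetical, and the `Authorization: Bearer <key>` header format is an assumption, since the thread does not spell out how the key is transmitted.

```python
import urllib.request

# Endpoints the comment above lists as always publicly accessible.
PUBLIC_PATHS = {"/health", "/models"}

def build_request(base_url, path, api_key=None):
    """Build a GET request, attaching the API key when one is configured."""
    headers = {"Accept": "application/json"}
    # Assumption: the key is sent as a Bearer token. Sending it to public
    # endpoints is harmless, so it is attached whenever it is available.
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(base_url.rstrip("/") + path, headers=headers)

def needs_api_key(path):
    """True if the path is protected once the server has an API key set."""
    return path not in PUBLIC_PATHS
```

A monitoring client could call `needs_api_key("/slots")` before polling and fail early with a clear message if no key is configured, rather than surfacing an HTTP error later.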

ngxson · 2024-11-04T15:35:22Z

Breaking change for /slots endpoint #10162

slot[i].state is removed and replaced by slot[i].is_processing

slot[i].is_processing === false means the slot is idle
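A small sketch of how a client might adapt to this change. It assumes the `/slots` response is a JSON array of slot objects, as the `slot[i]` notation suggests; the helper name `idle_slot_indices` is hypothetical.

```python
def idle_slot_indices(slots):
    """Return the indices of slots that report is_processing == False."""
    # Only an explicit False counts as idle; a missing field (for example
    # from a server older than #10162) is not treated as idle.
    return [i for i, slot in enumerate(slots) if slot.get("is_processing") is False]
```

Code that previously compared `slot[i].state` against a numeric value would switch to this boolean check.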

isaac-mcfadyen · 2024-11-04T23:37:30Z

> Breaking change for /slots endpoint #10162
>
> slot[i].state is removed and replaced by slot[i].is_processing
>
> slot[i].is_processing === false means the slot is idle

Was the slots endpoint also disabled by default? (or maybe just a documentation change?)
https://github.com/ggerganov/llama.cpp/pull/10162/files#diff-42ce5869652f266b01a5b5bc95f4d945db304ce54545e2d0c017886a7f1cee1aR698

ngxson · 2024-11-05T10:00:18Z

For security reasons, "/slots" has been disabled by default since #9776, which was mentioned in the breaking changes table. I just forgot to update the docs.

ngxson · 2024-11-07T21:33:44Z

Not an API change, but maybe good to know that the default web UI for llama-server changed in #10175

If you want to use the old completion UI, please follow the instructions in the PR.

ggerganov · 2024-11-25T19:51:10Z

cache_prompt: true is now used by default (#10501)
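Clients that relied on the previous default may want to set the field explicitly. A minimal sketch, assuming `cache_prompt` is passed as a top-level field of the completion request body next to `prompt`; the helper name `completion_payload` is hypothetical.

```python
import json

def completion_payload(prompt, cache_prompt=None):
    """Serialize a completion request body, pinning cache_prompt if asked."""
    payload = {"prompt": prompt}
    # Leave the field out to accept the server default (true after #10501);
    # pass False to explicitly opt out of prompt caching.
    if cache_prompt is not None:
        payload["cache_prompt"] = cache_prompt
    return json.dumps(payload)
```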

ngxson · 2024-12-07T19:18:33Z

The /props and /slots endpoints changed in #10691 and #10704; see server/README.md for more details.

ngxson · 2024-12-18T11:24:23Z

/embeddings will NOT be OAI-compat after #10861

For clarification, we will maintain OAI-compat for all APIs under the /v1 prefix, including:

/v1/embeddings
/v1/chat/completions

NOTE: OAI support for /v1/completions will come in the near future

ngxson · 2024-12-19T14:41:03Z

The behavior of n_probs changed in #10783; we now provide an OAI-compatible logprobs option.
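As an illustration of what "OAI-compatible" means for a client, the sketch below reads per-token log probabilities from a chat completion response in the shape documented by OpenAI (`choices[0].logprobs.content[]` with `token` and `logprob`). That llama-server returns exactly this shape is an assumption here, and the helper name `token_probabilities` is hypothetical; check server/README.md for the actual fields.

```python
import math

def token_probabilities(chat_response):
    """Convert OpenAI-style logprobs into (token, probability) pairs."""
    # Assumed shape: choices[0].logprobs.content is a list of objects that
    # each carry the sampled "token" and its natural-log probability.
    content = chat_response["choices"][0]["logprobs"]["content"]
    return [(item["token"], math.exp(item["logprob"])) for item in content]
```

Per the changelog table, the reported values default to pre-sampling probabilities, so they describe the model's distribution before samplers are applied.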
