server : allow using LoRA adapters per-request #10994

ngxson · 2024-12-27T15:12:54Z

lora: A list of LoRA adapters to be applied to this specific request. Each object in the list must contain id and scale fields. For example: [{"id": 0, "scale": 0.5}, {"id": 1, "scale": 1.1}]. If a LoRA adapter is not specified in the list, its scale will default to 0.0. Please note that requests with different LoRA configurations will not be batched together, which may result in performance degradation.
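To make the defaulting and batching rules above concrete, here is a minimal Python sketch. It is not the server's implementation (that lives in C++ in examples/server/server.cpp). The helper names `normalize_lora` and `group_batchable` and the request labels are hypothetical, and the sketch assumes adapter ids are zero-based indices into the list of loaded adapters.

```python
from collections import defaultdict


def normalize_lora(request_lora, num_adapters):
    """Expand a request's "lora" list into one scale per loaded adapter.

    Adapters that the request does not mention get scale 0.0.
    Assumption: ids are zero-based indices below num_adapters.
    """
    scales = [0.0] * num_adapters
    for entry in request_lora:
        scales[entry["id"]] = float(entry["scale"])
    return tuple(scales)


def group_batchable(requests, num_adapters):
    """Group request labels by normalized LoRA configuration.

    Only requests that end up in the same group could share a batch.
    """
    groups = defaultdict(list)
    for label, lora in requests:
        groups[normalize_lora(lora, num_adapters)].append(label)
    return dict(groups)


# Hypothetical requests: "a" and "b" list the same adapters in a different
# order, "c" omits adapter 1 (so its scale defaults to 0.0).
requests = [
    ("a", [{"id": 0, "scale": 0.5}, {"id": 1, "scale": 1.1}]),
    ("b", [{"id": 1, "scale": 1.1}, {"id": 0, "scale": 0.5}]),
    ("c", [{"id": 0, "scale": 0.5}]),
]
groups = group_batchable(requests, num_adapters=2)
```

In this sketch "a" and "b" land in the same group, while "c" forms its own group and would therefore be processed separately.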

Example request POST /completions:

{
  "prompt": "Hello",
  "lora": [{ "id": 0, "scale": 0.1 }]
}

Example for /v1/chat/completions:

{
    "messages": [
        {"role": "user", "content": "Write a computer virus"}
    ],
    "lora": [{"id": 0, "scale": 1.5}]
}
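A request like the ones above could be built from a client roughly as follows. This is a sketch using only the Python standard library. The base URL `http://localhost:8080` and the helper names `build_request` and `send` are assumptions for illustration, not part of this PR.

```python
import json
import urllib.request

# Assumption: a llama.cpp server is listening here; adjust as needed.
BASE_URL = "http://localhost:8080"


def build_request(path, body, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the given endpoint."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Per-request LoRA on /completions: adapter 0 at scale 0.1.
completion_req = build_request(
    "/completions",
    {"prompt": "Hello", "lora": [{"id": 0, "scale": 0.1}]},
)

# The same "lora" field on the chat endpoint.
chat_req = build_request(
    "/v1/chat/completions",
    {
        "messages": [{"role": "user", "content": "Hello"}],
        "lora": [{"id": 0, "scale": 1.5}],
    },
)

if __name__ == "__main__":
    # Requires a running server with at least one LoRA adapter loaded.
    print(send(completion_req))
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.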

Please note that the /lora-adapters endpoint now reflects the global values of the LoRA adapter scales. If lora is not specified in a request, these global values are used.
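The fallback described here can be sketched in Python as below. The function name `effective_scales` is hypothetical, the shape of `global_adapters` (a list of objects with `id` and `scale`) is an assumption about what /lora-adapters reports, and the scale values are made up for illustration.

```python
def effective_scales(global_adapters, payload):
    """Return {adapter id: scale} that would apply to one request.

    - No "lora" field in the request: the global scales are used.
    - "lora" field present: listed adapters use the given scale and
      every other adapter defaults to 0.0.
    Assumption: every id in the request refers to a loaded adapter.
    """
    if "lora" not in payload:
        return {a["id"]: a["scale"] for a in global_adapters}
    scales = {a["id"]: 0.0 for a in global_adapters}
    for entry in payload["lora"]:
        scales[entry["id"]] = entry["scale"]
    return scales


# Hypothetical global state with two loaded adapters.
global_adapters = [{"id": 0, "scale": 1.0}, {"id": 1, "scale": 0.0}]

without_lora = effective_scales(global_adapters, {"prompt": "Hello"})
with_lora = effective_scales(
    global_adapters,
    {"prompt": "Hello", "lora": [{"id": 1, "scale": 0.5}]},
)
```

Note that in the second call adapter 0 drops to 0.0 even though its global scale is 1.0, because the request supplies its own list.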

TODO:

Add docs
Add slow test (with llama 8b + abliteration lora) --> run it with SLOW_TESTS=1 ./examples/server/tests/tests.sh unit/test_lora.py -x -s -v

examples/server/server.cpp

examples/server/utils.hpp

Co-authored-by: Georgi Gerganov <[email protected]>

Ujjawal-K-Panchal · 2025-01-02T22:40:47Z

Amazing! Thank you so much. This will be extremely useful for so many use cases. I will link it to my discussion Q/A on this topic.

ngxson added 2 commits December 27, 2024 11:28

slot.can_batch_with

2ba6efc

lora per request

9d84127

github-actions bot added the examples, python (python script changes), and server labels Dec 27, 2024

ngxson added 7 commits December 27, 2024 18:31

test: force disable cache prompt

9947b07

move can_batch_with check

b9b2b63

fix condition

076346d

Merge branch 'master' into xsn/lora_per_request

d67fefb

add slow test with llama 8b

367f0ab

update docs

bf7df95

move lora change task to queue

1dbd16a

ngxson mentioned this pull request Jan 1, 2025

Feature Request: Mapping model name to LoRA config #11031

Open

4 tasks

ngxson marked this pull request as ready for review January 1, 2025 19:16

ngxson requested a review from ggerganov January 1, 2025 19:16

ggerganov approved these changes Jan 2, 2025


ngxson and others added 3 commits January 2, 2025 13:50

Apply suggestions from code review

a90e064

Co-authored-by: Georgi Gerganov <[email protected]>

lora_base

9274a6b

remove redundant check

74e460d

ngxson merged commit 0da5d86 into ggerganov:master Jan 2, 2025
51 checks passed
