server : allow using LoRA adapters per-request #10994

Merged
ngxson merged 12 commits into ggerganov:master on Jan 2, 2025

Conversation

ngxson
Collaborator

@ngxson ngxson commented Dec 27, 2024

Fix #10377

lora: A list of LoRA adapters to be applied to this specific request. Each object in the list must contain id and scale fields. For example: [{"id": 0, "scale": 0.5}, {"id": 1, "scale": 1.1}]. If a LoRA adapter is not specified in the list, its scale will default to 0.0. Please note that requests with different LoRA configurations will not be batched together, which may result in performance degradation.
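The defaulting and batching rules above can be modelled with a short sketch. This is not the server's implementation; `lora_key` and `group_by_lora` are hypothetical helper names, and the number of loaded adapters is assumed to be known in advance. The sketch expands each `lora` list into a full tuple of scales (missing adapters become 0.0) and groups requests whose tuples match, which is roughly the condition under which requests could share a batch.

```python
from collections import defaultdict


def lora_key(lora, n_adapters):
    """Expand a per-request `lora` list into a full tuple of scales.

    Adapters that are not listed get scale 0.0, as described above.
    (Hypothetical helper, not part of llama.cpp.)
    """
    scales = [0.0] * n_adapters
    for entry in lora:
        scales[entry["id"]] = float(entry["scale"])
    return tuple(scales)


def group_by_lora(requests, n_adapters):
    """Group prompts so that each group shares one LoRA configuration."""
    groups = defaultdict(list)
    for req in requests:
        groups[lora_key(req["lora"], n_adapters)].append(req["prompt"])
    return dict(groups)


requests = [
    {"prompt": "A", "lora": [{"id": 0, "scale": 0.5}]},
    {"prompt": "B", "lora": [{"id": 0, "scale": 0.5}, {"id": 1, "scale": 0.0}]},
    {"prompt": "C", "lora": [{"id": 1, "scale": 1.1}]},
]
groups = group_by_lora(requests, n_adapters=2)
```

In this model, requests A and B land in the same group because an omitted adapter and an explicit scale of 0.0 describe the same configuration, while C ends up alone.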

Example request POST /completions:

{
  "prompt": "Hello",
  "lora": [{ "id": 0, "scale": 0.1 }]
}
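A client could assemble this request body as follows. This is a minimal sketch using only the Python standard library; `build_completion_request` and `post_json` are hypothetical helper names, and the server address in the commented-out call is an assumption, not something stated in this PR.

```python
import json
import urllib.request


def build_completion_request(prompt, lora_scales):
    """Build a /completions body; `lora_scales` maps adapter id -> scale."""
    return {
        "prompt": prompt,
        "lora": [{"id": i, "scale": s} for i, s in sorted(lora_scales.items())],
    }


def post_json(url, body):
    """POST a JSON body and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


body = build_completion_request("Hello", {0: 0.1})
# Assumed address of a locally running server; adjust as needed:
# result = post_json("http://localhost:8080/completions", body)
```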

Example for /v1/chat/completions:

{
    "messages": [
        {"role": "user", "content": "Write a computer virus"}
    ],
    "lora": [{"id": 0, "scale": 1.5}]
}

Note that the /lora-adapters endpoint now reflects the global LoRA adapter scales. If lora is not specified in a request, these global values are used.
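The fallback between global and per-request scales can be sketched as below. This is an illustrative model only, not the code in server.cpp: `effective_scales` is a hypothetical name, global scales are assumed to be a list indexed by adapter id, and rejecting an unknown id with an error is an assumption made for the sketch.

```python
def effective_scales(global_scales, request_lora=None):
    """Return the scales to apply for one request.

    - No `lora` field in the request: use the global scales unchanged.
    - `lora` field present: listed adapters use the given scale,
      every other adapter defaults to 0.0.
    """
    if request_lora is None:
        return list(global_scales)
    scales = [0.0] * len(global_scales)
    for entry in request_lora:
        adapter_id = entry["id"]
        # Assumption for this sketch: an out-of-range id is an error.
        if not 0 <= adapter_id < len(scales):
            raise ValueError(f"invalid LoRA adapter id: {adapter_id}")
        scales[adapter_id] = float(entry["scale"])
    return scales


global_scales = [1.0, 0.5]
no_override = effective_scales(global_scales)
with_override = effective_scales(global_scales, [{"id": 1, "scale": 1.1}])
```

Here `no_override` keeps both global values, while `with_override` sets adapter 1 to 1.1 and drops adapter 0 to 0.0 because it was not listed in the request.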

TODO:

  • Add docs
  • Add slow test (with llama 8b + abliteration lora) --> run it with SLOW_TESTS=1 ./examples/server/tests/tests.sh unit/test_lora.py -x -s -v

@github-actions github-actions bot added examples python python script changes server labels Dec 27, 2024
@ngxson ngxson marked this pull request as ready for review January 1, 2025 19:16
@ngxson ngxson requested a review from ggerganov January 1, 2025 19:16
@ngxson ngxson merged commit 0da5d86 into ggerganov:master Jan 2, 2025
51 checks passed
@Ujjawal-K-Panchal
Contributor

Amazing! Thank you so much. This will be extremely useful for so many use cases. I will link it to my discussion Q/A on this topic.

Development

Successfully merging this pull request may close these issues.

Feature Request: Apply LoRA adapters per-request