examples : add configuration presets #10932

Open
1 of 6 tasks
ggerganov opened this issue Dec 21, 2024 · 2 comments
Labels
documentation, enhancement, examples, good first issue, help wanted

Comments

@ggerganov (Owner) commented Dec 21, 2024

Description

I was recently looking for ways to demonstrate some of the functionality of the llama.cpp examples, and some of the commands can become very cumbersome. For example, here is what I use for the llama.vim FIM server:

llama-server \
    -m ./models/qwen2.5-7b-coder/ggml-model-q8_0.gguf \
    --log-file ./service-vim.log \
    --host 0.0.0.0 --port 8012 \
    --ctx-size 0 \
    --cache-reuse 256 \
    -ub 1024 -b 1024 -ngl 99 -fa -dt 0.1

It would be much cleaner if I could just run, for example:

llama-server --cfg-fim-7b

Or if I could turn this embedding server command into something simpler:

# llama-server \
#     --hf-repo ggml-org/bert-base-uncased \
#     --hf-file          bert-base-uncased-Q8_0.gguf \
#     --port 8033 -c 512 --embeddings --pooling mean

llama-server --cfg-embd-bert --port 8033

Implementation

There is already an initial example of how we can create such configuration presets:

llama-tts --tts-oute-default -p "This is a TTS preset"

# equivalent to
# 
# llama-tts \
#    --hf-repo   OuteAI/OuteTTS-0.2-500M-GGUF \
#    --hf-file          OuteTTS-0.2-500M-Q8_0.gguf \
#    --hf-repo-v ggml-org/WavTokenizer \
#    --hf-file-v          WavTokenizer-Large-75-F16.gguf -p "This is a TTS preset"

llama.cpp/common/arg.cpp

Lines 2208 to 2220 in 5cd85b5

// model-specific
add_opt(common_arg(
    {"--tts-oute-default"},
    string_format("use default OuteTTS models (note: can download weights from the internet)"),
    [](common_params & params) {
        params.hf_repo = "OuteAI/OuteTTS-0.2-500M-GGUF";
        params.hf_file = "OuteTTS-0.2-500M-Q8_0.gguf";
        params.vocoder.hf_repo = "ggml-org/WavTokenizer";
        params.vocoder.hf_file = "WavTokenizer-Large-75-F16.gguf";
    }
).set_examples({LLAMA_EXAMPLE_TTS}));

This preset configures the model URLs so that the models are automatically downloaded from HF when the example runs, which simplifies the command significantly. It can additionally set various default values, such as context size, batch size, pooling type, etc.
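
For illustration, the --cfg-fim-7b idea from the description could be defined the same way. The sketch below is hypothetical: the flag name, the Hugging Face repo/file, and the particular parameter values are assumptions for the sake of the example, not code that exists in the repository:

// hypothetical FIM preset mirroring the llama-server command above;
// the flag name and the model repo/file are assumptions
add_opt(common_arg(
    {"--cfg-fim-7b"},
    string_format("use a 7B coder model for FIM (note: can download weights from the internet)"),
    [](common_params & params) {
        params.hf_repo       = "ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF"; // assumed repo/file
        params.hf_file       = "qwen2.5-coder-7b-q8_0.gguf";
        params.n_ctx         = 0;     // --ctx-size 0: use the model's training context
        params.n_batch       = 1024;  // -b 1024
        params.n_ubatch      = 1024;  // -ub 1024
        params.n_gpu_layers  = 99;    // -ngl 99
        params.n_cache_reuse = 256;   // --cache-reuse 256
        params.flash_attn    = true;  // -fa
        params.defrag_thold  = 0.1f;  // -dt 0.1
        params.port          = 8012;  // --port 8012
    }
).set_examples({LLAMA_EXAMPLE_SERVER}));

With something like this in place, llama-server --cfg-fim-7b would expand to the long command at the top of the issue (minus the host/logging options, which could also be set here).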

Goal

The goal of this issue is to create such presets for various common tasks:

  • [x] Run a basic TTS generation (see above)
  • [ ] Start a chat server with a commonly used model
  • [ ] Start a speculative-decoding-enabled chat server with a commonly used model
  • [ ] Start a FIM server for plugins such as llama.vim
  • [ ] Start an embedding server with a commonly used embedding model
  • [ ] Start a reranking server with a commonly used reranking model
  • And many more ...

The list of configuration presets would require curation and proper documentation.

I think this is a great task for new contributors to get involved in the project.

@ggerganov added the documentation, enhancement, help wanted, good first issue, and examples labels Dec 21, 2024
@ggerganov pinned this issue Dec 21, 2024
@sramichetty20019 commented Dec 22, 2024

Hi! I'm interested in contributing to this issue as a first-time contributor. I'd like to work on implementing the chat server preset for commonly used models.

@ngxson (Collaborator) commented Dec 22, 2024

IMO having a --preset flag would be much more intuitive for most users. For example:

llama-server --preset qwen-fim-7b
llama-server --preset embd-bert
...

Or we could even introduce positional parameters (an idea stolen from ollama):

llama-server launch qwen-fim-7b
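
To make the idea concrete, here is a minimal sketch of how a --preset flag could dispatch to named preset definitions in the style of common/arg.cpp. Everything below (the dispatch map, the preset names, and their contents) is an illustrative assumption, not existing llama.cpp code; it additionally needs <map>, <functional>, and <stdexcept>:

// hypothetical table mapping preset names to parameter overrides
static const std::map<std::string, std::function<void(common_params &)>> presets = {
    {"qwen-fim-7b", [](common_params & p) {
        p.hf_repo = "ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF"; // assumed repo/file
        p.hf_file = "qwen2.5-coder-7b-q8_0.gguf";
        p.n_ctx   = 0;
    }},
    {"embd-bert", [](common_params & p) {
        p.hf_repo      = "ggml-org/bert-base-uncased";
        p.hf_file      = "bert-base-uncased-Q8_0.gguf";
        p.n_ctx        = 512;
        p.embedding    = true;
        p.pooling_type = LLAMA_POOLING_TYPE_MEAN;
    }},
};

add_opt(common_arg(
    {"--preset"}, "NAME",
    "apply a named configuration preset (qwen-fim-7b, embd-bert, ...)",
    [](common_params & params, const std::string & name) {
        auto it = presets.find(name);
        if (it == presets.end()) {
            throw std::invalid_argument("unknown preset: " + name);
        }
        it->second(params); // apply the preset's overrides
    }
).set_examples({LLAMA_EXAMPLE_SERVER}));

Since argument handlers run in the order the flags appear, an explicit flag passed after --preset (e.g. llama-server --preset embd-bert --port 8033) would still override the preset's value.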
