
server: (web UI) Add custom chat formatting that uses input_prefix and input_suffix #10425

Closed

Conversation


@MaggotHATE MaggotHATE commented Nov 20, 2024

This PR adds a new, simple chat template that uses input_prefix and input_suffix to format messages. This is enough to work with custom templates on the server, and it is a more built-in (though less flexible) solution than changing the server's .js parts.

Currently, input_prefix and input_suffix are not used by the server at all, and main uses both in its own way. As such, we can reuse the prefix and suffix for the server alone without extra steps. This also makes it easy to add template customization to the UI (I will add that a bit later in this PR).

What this currently does:

  1. Adds a "custom" template definition to the server, selected automatically when no template is specified but input_prefix and input_suffix are defined.
  2. Passes input_prefix and input_suffix to format_chat (in utils.hpp, used by the server), where they are used to format the messages from the JSON payload (a minimal sketch follows this list).
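
As a rough illustration (not the exact diff from this PR), the "custom" path boils down to wrapping each payload message as {prefix}{content}{suffix}; the chat_msg struct and function name below are simplified stand-ins, not the PR's actual types:

    #include <string>
    #include <vector>

    // Simplified sketch of the "custom" formatting branch.
    struct chat_msg { std::string role, content; };

    static std::string format_custom(const std::vector<chat_msg> & chat,
                                     const std::string & prefix,
                                     const std::string & suffix) {
        std::string formatted;
        for (const auto & msg : chat) {
            formatted += prefix + msg.content + suffix; // {prefix}{content}{suffix}
        }
        return formatted;
    }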

It is worth mentioning that main doesn't always apply the prefix/suffix correctly (for example, with the "chatml" format), so it may be worth adding the prefix/suffix manually to each existing prompt format. I can do that later in a separate PR. @ngxson @slaren

* llama, common, server
* main handles prefix and suffix separately, only adjust for example display

@ngxson ngxson left a comment


This is quite a simple feature to add, so I prefer having less code for it. It would also be better to keep it more isolated; for example, we could implement this logic only in server.cpp instead of common.cpp.

src/llama.cpp (outdated)

    @@ -22107,6 +22119,8 @@ static int32_t llama_chat_apply_template_internal(
     int32_t llama_chat_apply_template(
         const struct llama_model * model,
         const char * tmpl,
    +    const char * prefix,
    +    const char * suffix,
@ngxson (Collaborator):

I don't think we need to change llama_chat_apply_template. This feature can be implemented outside of the llama.h library.

@MaggotHATE (Contributor, Author):

I suppose it's better to implement it in utils.hpp then? A separate formatting function that is then used in the format_chat function? Sorry for the questions, I'm not quite familiar with the server's structure.

@ngxson (Collaborator):

Yes, you can add it in server/utils.hpp, ideally as a branch inside format_chat (or split it into its own function if needed).

@MaggotHATE (Contributor, Author), Nov 20, 2024:

Thank you, I already did that. I will check that all tests pass and then add it to the UI.

Looks like it will have to be a list of chat templates in the end, with "custom" as the choice for this new simple solution.

common/common.h: two outdated review threads (resolved)
@MaggotHATE MaggotHATE changed the title llama, common, server: Add "custom" chat template that uses input_prefix and input_suffix server: Add "custom" chat template that uses input_prefix and input_suffix Nov 20, 2024

@ngxson ngxson left a comment


Btw, I think you may also need to add something like a prompt_prefix. A template like this may not be possible without a "global" prefix:

    You are a helpful assistant

    ### Question:...

    ### Response:...
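
A minimal sketch of how such a global prefix could compose with the per-message wrapping; prompt_prefix is hypothetical here, not an existing server argument:

    #include <string>
    #include <vector>

    // Hypothetical composition: prompt_prefix is emitted once, then each user
    // message is wrapped with input_prefix/input_suffix.
    std::string format_with_global_prefix(const std::vector<std::string> & user_msgs,
                                          const std::string & prompt_prefix,  // "You are a helpful assistant\n\n"
                                          const std::string & input_prefix,   // "### Question: "
                                          const std::string & input_suffix) { // "\n\n### Response: "
        std::string out = prompt_prefix;
        for (const auto & m : user_msgs) {
            out += input_prefix + m + input_suffix;
        }
        return out;
    }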

    @@ -325,10 +326,16 @@ inline std::string format_chat(const struct llama_model * model, const std::stri
                 throw std::runtime_error("Missing 'content' (ref: https://github.com/ggerganov/llama.cpp/issues/8367)");
             }

             chat.push_back({role, content});
    +        if (tmpl == "custom") {
@ngxson (Collaborator):

Maybe we can do this too?

Suggested change:

    -    if (tmpl == "custom") {
    +    bool is_custom = !prefix.empty() || !suffix.empty();
    +    if (is_custom) {

@MaggotHATE (Contributor, Author):

I'm not sure about this: using "custom" as a defined template name also helps with selection in the UI. Is there a reason against it that I'm missing?

@ngxson (Collaborator):

The UI logic and the underlying logic should be decoupled. If you have "custom" here, it becomes redundant: you already have bool is_custom = !prefix.empty() || !suffix.empty(), so two variables control the same state.

         }

    -    const auto formatted_chat = common_chat_apply_template(model, tmpl, chat, true);
    +    if (tmpl != "custom") formatted_chat = common_chat_apply_template(model, tmpl, chat, true);
@ngxson (Collaborator):

Then:

Suggested change:

    -    if (tmpl != "custom") formatted_chat = common_chat_apply_template(model, tmpl, chat, true);
    +    if (!is_custom) formatted_chat = common_chat_apply_template(model, tmpl, chat, true);

    @@ -300,8 +300,9 @@ static llama_tokens format_infill(
     }

     // Format given chat. If tmpl is empty, we take the template from model metadata
@ngxson (Collaborator):

It would be nice to leave a comment on how and when the prefix/suffix are used.
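
For illustration, such a comment could read roughly like this (suggested wording, not taken from the PR):

    // Format given chat. If tmpl is empty, we take the template from model metadata.
    // If input_prefix/input_suffix are non-empty, the model template is bypassed and
    // each message is formatted as {prefix}{content}{suffix} instead.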

examples/server/server.cpp: review thread (resolved)
@MaggotHATE

@ngxson I cannot figure out why prefix and suffix are not passed from the UI to the server. I tried storing chat_template, input_prefix and input_suffix locally per slot in slot_params, but that didn't help either (though it's probably a good idea?), so I reverted to the original. What am I missing?

@ngxson

ngxson commented Nov 20, 2024

> @ngxson I cannot figure out why prefix and suffix are not passed from the UI to the server. I tried storing chat_template, input_prefix and input_suffix locally per slot in slot_params, but that didn't help either (though it's probably a good idea?), so I reverted to the original. What am I missing?

input_prefix/suffix are arguments, not parameters.


On second thought, I don't think this feature is well planned, and it could be quite messy to maintain in the future. The problems are:

  • The list of chat templates is hard-coded into index.html, which is hard to maintain
  • This implementation won't work if the list of messages does not strictly follow the user-assistant-user-assistant... order
  • A system message is currently not allowed

So the feature as it stands is only half-working, which makes me not very comfortable, especially when not many users would actually use it.

Also, we're discussing the possibility of adding minimal Jinja support in #9639.

So overall, my conclusion is that we should wait a bit more until we have more visibility on this subject.

@MaggotHATE

MaggotHATE commented Nov 20, 2024

> input_prefix/suffix are arguments, not parameters.

I tried adding parameters that copy the values from these arguments on initial loading, but that didn't work either. It seems the problem is that launch_slot_with_task happens after handle_chat_completions, so no related parameters would be loaded in time for formatting anyway.

> • This implementation won't work if the list of messages does not strictly follow the user-assistant-user-assistant... order
> • A system message is currently not allowed

Both are expected of a simple chat format. For example, Mistral models work perfectly fine with this setup, even though it doesn't match the official template.
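
As a rough illustration of that setup (simplified; the official Mistral template also handles BOS/EOS tokens and system prompts differently), the mapping onto prefix/suffix would be something like:

    // Approximate Mistral-instruct-style wrapping via prefix/suffix:
    const std::string input_prefix = "[INST] ";
    const std::string input_suffix = " [/INST]";
    // "hello" -> "[INST] hello [/INST]"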

> Also, we're discussing the possibility of adding minimal Jinja support

Yes, I saw that PR, but it's a completely different level of solution: much more complex, and it will solve formatting issues overall. For the time being, I will try simplifying this one: without the list of templates, but at least with prefix/suffix.

Edit: never mind, I didn't think the server through enough; oaicompat_completion_params_parse or format_chat needs to read the prefix and suffix from the payload.
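
A minimal sketch of what that could look like, assuming the server's json_value helper and the existing input_prefix/input_suffix arguments (the payload field names and surrounding variables are assumptions):

    // Read prefix/suffix from the request body, falling back to the CLI
    // arguments when the payload fields are empty or absent.
    std::string prefix = json_value(body, "input_prefix", std::string());
    std::string suffix = json_value(body, "input_suffix", std::string());
    if (prefix.empty()) prefix = params.input_prefix; // --in-prefix
    if (suffix.empty()) suffix = params.input_suffix; // --in-suffix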

* removed "custom" template
* fixed reading template, prefix and suffix from payload
* removed `chat_template` from UI
* if payload contains empty prefix or suffix, use arguments instead
@MaggotHATE

@ngxson I have simplified the logic according to your recommendations. The only thing is that I've left chat_template as a parameter that can be passed to the server (I tested it before removing the templates list from the UI).

So far it should be quite simple to use: if the Prefix and Suffix input fields are empty, either the command-line arguments input_prefix and input_suffix or the template will be used. If the prefix and suffix are not empty, they provide the custom formatting.

I saw your PR and noticed that you read the prefix and suffix per message; this is mainly for the system prompt, right? I'm not sure how to fit that into the UI, unless full formatting is used (like in the previous version of the server UI). Would that be better?

* if no `chat_template` is passed, we can rely on `common_chat_apply_template` function
@MaggotHATE MaggotHATE changed the title server: Add "custom" chat template that uses input_prefix and input_suffix server: (web UI) Add custom chat formatting that uses input_prefix and input_suffix Nov 22, 2024
@MaggotHATE MaggotHATE marked this pull request as ready for review November 22, 2024 06:21
@ngxson

ngxson commented Nov 24, 2024

> I saw your PR and noticed that you read the prefix and suffix per message; this is mainly for the system prompt, right? I'm not sure how to fit that into the UI, unless full formatting is used (like in the previous version of the server UI). Would that be better?

No, it's not just that. If you take a real-world example, you will see why.

Suppose you have a formatted chat:

    <user>hello</user><assistant>how are you</assistant><user>who are you</user><assistant>

Then what is your prefix/suffix?

One proposal would be:

  • prefix: </assistant><user>
  • suffix: </user><assistant>

So how do you correctly format the first message? If you apply the {prefix}{message}{suffix} logic to the user messages, the first one becomes </assistant><user>hello</user><assistant>, with a stray closing tag at the start.
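
A small self-contained sketch of this failure mode, using the example tags above (illustrative only, wrapping just the user messages):

    #include <iostream>
    #include <string>

    int main() {
        const std::string prefix = "</assistant><user>";
        const std::string suffix = "</user><assistant>";

        std::string out;
        for (const std::string & msg : {"hello", "who are you"}) {
            out += prefix + msg + suffix; // {prefix}{message}{suffix}
        }
        // Prints "</assistant><user>hello</user><assistant>...": the chat
        // begins with a spurious </assistant>, because the first user message
        // has no preceding assistant turn to close.
        std::cout << out << "\n";
        return 0;
    }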

As I said, I think the current proposal is not very well planned (e.g. no examples are given, and there is no way to test how it really works or which cases users may actually need), so I don't really see the benefit of merging this feature. My proposal #10438 will be much more flexible and cover all possible use cases (as a bonus, even the "thinking" token used by recent "reflection" models can be supported).

@MaggotHATE

MaggotHATE commented Nov 24, 2024

> My proposal #10438 will be much more flexible and cover all possible use cases (as a bonus, even the "thinking" token used by recent "reflection" models can be supported)

I agree, but that example doesn't need a prefix/suffix system; in fact, it would benefit from a fully customized template. I'm closing this PR then, and I can either try continuing from your PR or wait for your version of the UI additions.

@MaggotHATE MaggotHATE closed this Nov 24, 2024