server: (web UI) Add custom chat formatting that uses input_prefix and input_suffix #10425
Conversation
* llama, common, server
* main handles prefix and suffix separately, only adjust for example display
* `enable_chat_template` is not used by `server`
* …gotHATE/llama.cpp-greedy-rework into server-chat-templates-custom
This is quite a simple feature to add, so I prefer having less code for it. It's also better if we can make it more isolated - for example, we could implement this logic only in server.cpp instead of common.cpp.
src/llama.cpp (outdated)

```diff
@@ -22107,6 +22119,8 @@ static int32_t llama_chat_apply_template_internal(
 int32_t llama_chat_apply_template(
             const struct llama_model * model,
             const char * tmpl,
+            const char * prefix,
+            const char * suffix,
```
I don't think we need to change `llama_chat_apply_template`. This feature can be implemented outside of the llama.h library.
I suppose it's better to implement it in `utils.hpp` then? As a separate formatting function that is then used in the `format_chat` function? Sorry for the questions, I'm not quite familiar with `server`'s structure.
Yes, you can add it in `server/utils.hpp`, ideally as a branch inside `format_chat` (or split it into its own function if needed).
Thank you, already did that. I will see if all checks are good and then add it into the UI.
Looks like it will have to be a list of chat templates in the end, with "custom" as this new simple option.
"custom"
chat template that uses input_prefix
and input_suffix
"custom"
chat template that uses input_prefix
and input_suffix
Btw I think you may need to add a kind of `prompt_prefix` too. A template like this may not be possible without using a "global" prefix:

```
You are a helpful assistant
### Question:...
### Response:...
```
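The point above can be sketched in a few lines: a per-message prefix/suffix pair cannot emit a one-time system header on its own, so a separate "global" prefix would be needed. This is a minimal illustration only - the struct and function names below are hypothetical, not llama.cpp code.

```cpp
#include <string>
#include <vector>

struct msg {
    std::string role;
    std::string content;
};

// Hypothetical sketch: prompt_prefix is emitted once before all messages,
// while input_prefix/input_suffix wrap each individual message.
std::string format_with_global_prefix(const std::string & prompt_prefix,
                                      const std::string & input_prefix,
                                      const std::string & input_suffix,
                                      const std::vector<msg> & chat) {
    std::string out = prompt_prefix;                     // emitted once
    for (const auto & m : chat) {
        out += input_prefix + m.content + input_suffix;  // emitted per message
    }
    return out;
}
```

With only `input_prefix`/`input_suffix`, the "You are a helpful assistant" line would have to be glued onto the first message's prefix, which then repeats on every message.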
examples/server/utils.hpp
Outdated
@@ -325,10 +326,16 @@ inline std::string format_chat(const struct llama_model * model, const std::stri | |||
throw std::runtime_error("Missing 'content' (ref: https://github.com/ggerganov/llama.cpp/issues/8367)"); | |||
} | |||
|
|||
chat.push_back({role, content}); | |||
if (tmpl == "custom") { |
Maybe we can do this too?

```diff
-    if (tmpl == "custom") {
+    bool is_custom = !prefix.empty() || !suffix.empty();
+    if (is_custom) {
```
I'm not sure about this - using "custom" as a defined template name helps with selection in the UI too. Is there a reason against it that I'm missing?
The UI logic and the underlying logic should be decoupled. If you have "custom" here, it becomes redundant because you already have `bool is_custom = !prefix.empty() || !suffix.empty()`, so two variables control the same state.
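The suggested decoupling boils down to deriving the mode from the data rather than from a sentinel template name. A minimal sketch (the function name `is_custom_format` is hypothetical, not actual server code):

```cpp
#include <string>

// Whether the custom formatting path is taken is inferred from the presence
// of a prefix or suffix - no magic "custom" template name is needed, so the
// UI and the formatting logic stay decoupled.
bool is_custom_format(const std::string & input_prefix,
                      const std::string & input_suffix) {
    return !input_prefix.empty() || !input_suffix.empty();
}
```

The UI can still present a "custom" entry in its template list; it just maps that choice to non-empty prefix/suffix values instead of passing a special name down to the server.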
examples/server/utils.hpp (outdated)

```diff
     }

-    const auto formatted_chat = common_chat_apply_template(model, tmpl, chat, true);
+    if (tmpl != "custom") formatted_chat = common_chat_apply_template(model, tmpl, chat, true);
```
Then:

```diff
-    if (tmpl != "custom") formatted_chat = common_chat_apply_template(model, tmpl, chat, true);
+    if (!is_custom) formatted_chat = common_chat_apply_template(model, tmpl, chat, true);
```
examples/server/utils.hpp (outdated)

```diff
@@ -300,8 +300,9 @@ static llama_tokens format_infill(
     }

 // Format given chat. If tmpl is empty, we take the template from model metadata
```
It would be nice if we could leave a comment on how and when prefix/suffix is used.
@ngxson I cannot figure out why prefix and suffix are not passed from the UI to the server. I tried storing …
`input_prefix`/`suffix` are arguments, not parameters. On second thought, I don't think this feature is well-planned, and it can be quite messy to maintain in the future. Problems are: …

So adding this feature for now is kinda half-working, which makes me feel not very comfortable, especially when not many users actually use it. Also, we're discussing the possibility of adding minimal Jinja support in #9639. So overall, my conclusion is that we should wait a bit more until we have more visibility on this subject.
I tried to add parameters that copy values from these arguments on the initial loading, but it didn't work either. Seems like the problem is that
Both are expected from a simple chat format. For example, Mistral models work perfectly fine with this setup, even though it doesn't align with the official template.
Yes, I saw that PR, but it's a completely different level of solution - much more complex, and it will solve formatting issues overall. For the time being, I will try simplifying this one: without the list of templates, but at least with prefix/suffix. Edit: nevermind, I didn't think the server through enough, …
* removed "custom" template * fixed reading template, prefix and suffix from payload * removed `chat_template` from UI
* if payload contains empty prefix or suffix, use arguments instead
@ngxson I have simplified the logic according to your recommendations. The only thing is that I've left …

So far it should be quite simple to use: if the Prefix and Suffix input fields are empty, either the console arguments … are used.

I saw your PR and noticed that you used per-message prefix and suffix reading - this is mainly for the system prompt, right? I'm not sure how to place it into the UI - unless full formatting is used (like it was in the previous version of the server UI). Would that be better?
* if no `chat_template` is passed, we can rely on `common_chat_apply_template` function
"custom"
chat template that uses input_prefix
and input_suffix
input_prefix
and input_suffix
No, it's not just that. If you take a real-world example, you will see why. Suppose you have a formatted chat: … Then what is your prefix/suffix? One proposal would be: …

So now how do you format the first message correctly? If you apply the …

As said, I think the current proposal is not very well-planned (i.e. there are no examples, or tests of how it really works / what cases users may need), so I don't really see the benefit of merging this feature. My proposal #10438 will be much more flexible and cover all possible use cases (as a bonus, even the "thinking" token used by recent "reflection" models can be supported).
I agree, but that example doesn't need a prefix/suffix system - in fact, it would benefit from a fully customized template. I'm closing this PR then - I can either try continuing from your PR or wait for your version of the UI additions.
This PR adds a new simple chat template that makes use of `input_prefix` and `input_suffix` to format messages. This is enough to be able to work with custom templates on server and is a more inline solution as opposed to changing the .js parts of server (but also a less flexible one). Currently, `input_prefix` and `input_suffix` are not used in server at all, and `main` uses both in its own way. As such, we can reuse prefix and suffix for server only without extra steps. It would also make adding template customization into the UI quite easy (will add a bit later in this PR).

What this currently does:
* a "custom" template definition in `server`, automatically selected if no template is specified but `input_prefix` and `input_suffix` are defined;
* `input_prefix` and `input_suffix` handling in `format_chat` (`utils.hpp`, used by `server`), where they are used to simply format messages from the JSON payload.

It is worth mentioning that `main` doesn't always apply prefix/suffix correctly (for example, with the "chatml" format), so maybe it is worth adding prefix/suffix manually into each existing prompt format. I can do that later in a separate PR.

@ngxson @slaren
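The per-message wrapping described in the PR can be sketched as follows. This is a simplified illustration under assumptions - the struct and function names are hypothetical stand-ins, not the actual `utils.hpp` types, and the real code falls back to the model's chat template when no prefix/suffix is set.

```cpp
#include <string>
#include <vector>

struct chat_msg {
    std::string role;
    std::string content;
};

// Illustrative sketch of the "custom" branch: each message from the JSON
// payload is wrapped in input_prefix/input_suffix directly, bypassing the
// model's built-in chat template.
std::string format_chat_custom(const std::string & input_prefix,
                               const std::string & input_suffix,
                               const std::vector<chat_msg> & chat) {
    std::string formatted;
    for (const auto & m : chat) {
        formatted += input_prefix + m.content + input_suffix;
    }
    return formatted;
}
```

For example, with an Alpaca-style prefix `"### Question:\n"` and suffix `"\n### Response:\n"`, a single user message is rendered as a question/response block, which is the kind of simple format the Mistral example in the discussion relies on.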