
Add chat template for RWKV-World #9968

Merged
merged 4 commits into ggerganov:master on Oct 22, 2024

Conversation

@MollySophia (Contributor) commented Oct 21, 2024

  • Add a chat-template entry named "rwkv-world" for RWKV GGUF models. This makes it convenient for downstream projects like ollama to decide which template to use.
  • Add a simple chat template that matches the official chat usage (a rendered example is sketched after this description).

Edit2:

  • Added RWKV-6 to the supported model list in README.md, which had been forgotten for some time :D

  • I have read the contributing guidelines

  • Self-reported review complexity:

    • Low
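
For reference, the prompt rendered for a single user turn looks roughly like this. This is an editorial reconstruction from the template code reviewed later in this thread; each completed assistant turn ends with the "\n\n" that this PR also registers as EOT, and whether a space follows "Assistant:" is discussed in the review comments below.

```
User: <user message>

Assistant:
```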

@github-actions github-actions bot added testing Everything test related python python script changes labels Oct 21, 2024
@MollySophia MollySophia marked this pull request as draft October 21, 2024 07:10
@MollySophia (Contributor Author)

Sorry for converting this PR to draft after it's been approved @ggerganov
There's another problem that I want to address at the same time.
RWKV HF models specify eos_token = 0, but when chatting these models don't use EOS to end a turn; they use another token instead.
After digging into the code, it seems that setting the EOT token would make it work?
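
A minimal sketch of the idea being proposed here; this is not llama.cpp's actual stopping code, and the type and struct names are illustrative placeholders:

```cpp
// Sketch only: generation for a chat turn should stop when the sampled token
// matches either the EOS id or the EOT id. The RWKV HF configs report
// eos_token = 0, which the model never emits in chat, so without a separate
// EOT the turn never ends (a related issue is linked later in this thread).
#include <cstdint>

using llama_token = std::int32_t;

struct vocab_ids {
    llama_token eos; // 0 for the RWKV HF models mentioned above
    llama_token eot; // id of the "\n\n" token, as set by this PR
};

static bool is_end_of_turn(const vocab_ids & v, llama_token tok) {
    return tok == v.eos || tok == v.eot;
}
```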

@ggerganov (Owner)

Simply exporting the EOS token through the GGUF KV meta data should be enough. Can you point me to a model on HF so I can take a look at the meta data and see what it is missing?

@MollySophia (Contributor Author) commented Oct 21, 2024

> Simply exporting the EOS token through the GGUF KV meta data should be enough.

The model doesn't use the EOS token when chatting.
Here's a related issue: #9892.
I've tried setting EOT to the correct token and the problem goes away.

> Can you point me to a model on HF so I can take a look at the meta data and see what it is missing?

https://huggingface.co/RWKV/rwkv-6-world-1b6
Thanks!

@MollySophia MollySophia marked this pull request as ready for review October 21, 2024 08:31
@ggerganov ggerganov merged commit 4ff7fe1 into ggerganov:master Oct 22, 2024
54 checks passed
@ggerganov (Owner) commented Oct 22, 2024

I think that old RWKV GGUF models would have to be re-generated after this change.

@MollySophia (Contributor Author)

> I think that old RWKV GGUF models would have to be re-generated after this change.

Sure, I'll get in touch with the people involved.

@ngxson (Collaborator) commented Oct 22, 2024

To be honest, I wouldn't accept models without an EOS/EOT token in llama_chat_apply_template, as this is very confusing and frustrating for end users.

This template does not have any value for now. It will not be compatible with llama-cli or llama-server, as they require an EOS/EOT to stop generation - you have been warned.

@MollySophia (Contributor Author)

> To be honest, I wouldn't accept models without an EOS/EOT token in llama_chat_apply_template, as this is very confusing and frustrating for end users.
>
> This template does not have any value for now, as it will not be compatible with llama-cli or llama-server, which require an EOS/EOT to stop generation - you have been warned.

I think an EOT token has been added in this PR, so this should be okay, I guess?

@ngxson (Collaborator) commented Oct 22, 2024

Hmm, OK, adding \n\n is quite a hacky solution IMO. This will effectively prevent the model from producing any double newline (useful when generating markdown content, I guess?)

Still not convinced that this is a good way to do it; up to you to decide, @ggerganov

In any case, I think llama_chat_apply_template should only support templates that use special tokens. It must not reuse non-special tokens that also appear in normal text generation.

@MollySophia (Contributor Author)

> Hmm, OK, adding \n\n is quite a hacky solution IMO. This will effectively prevent the model from producing any double newline (useful when generating markdown content, I guess?)

Yes, that makes sense. I've discussed this with @BlinkDL before. It seems it would be better to train the model with special tokens like <|im_start|> in the future. However, current pretrained models use '\n\n' to end a chat turn, so this hack is currently my best solution :P

@ngxson (Collaborator) commented Oct 22, 2024

Yeah, I mean it's better not to add this kind of template to llama_chat_apply_template, because once added, we rarely remove a template from the list.

Also, now that this is merged, some models using the same kind of template but without a proper EOT/EOS will also be marked as "supported".

@MollySophia (Contributor Author)

> Yeah, I mean it's better not to add this kind of template to llama_chat_apply_template, because once added, we rarely remove a template from the list.
>
> Also, now that this is merged, some models using the same kind of template but without a proper EOT/EOS will also be marked as "supported".

That's a good point too. The matching method for this template is not specific enough. Then I think it's better to keep only tmpl == "rwkv-world" || tmpl_contains("rwkv-world"), since only RWKV models set this string for tokenizer.chat_template?

@ngxson (Collaborator) commented Oct 22, 2024

Yes, it would be better to leave only tmpl_contains("rwkv-world"). Please also add a comment noting that this model needs the specific EOT \n\n in order to work, so future contributors won't copy the same approach without noticing that there is no EOT in their model.

Btw, do you have an example of a GGUF having the correct template? I tried this tokenizer_config.json and this gguf but cannot find the template anywhere.

@MollySophia (Contributor Author)

> Yes, it would be better to leave only tmpl_contains("rwkv-world").

Okay, I'll open another PR for this.

> Please also add a comment noting that this model needs the specific EOT \n\n in order to work, so future contributors won't copy the same approach without noticing that there is no EOT in their model.

May I ask where this should be put?

> Btw, do you have an example of a GGUF having the correct template? I tried this tokenizer_config.json

AFAIK there isn't. There's a chat example on the HF model page, and the code here is written to match it.

> and this gguf but cannot find the template anywhere.

These GGUF models haven't been re-generated yet. For the tokenizer.chat_template metadata in GGUF, the code in this PR simply sets it to the string rwkv-world for identification.
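
To make that concrete, here is a minimal self-contained sketch of the matching described above; tmpl_contains is a simplified stand-in for the helper in llama.cpp, not the exact implementation:

```cpp
// Sketch: the conversion script stores the literal string "rwkv-world" in the
// tokenizer.chat_template GGUF metadata, so a plain substring match is enough
// to pick the RWKV-World formatting; there is no Jinja template body to parse.
#include <string>

static bool tmpl_contains(const std::string & tmpl, const std::string & needle) {
    return tmpl.find(needle) != std::string::npos;
}

static bool is_rwkv_world_template(const std::string & tmpl) {
    return tmpl == "rwkv-world" || tmpl_contains(tmpl, "rwkv-world");
}
```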

@MollySophia (Contributor Author)

> AFAIK there isn't. There's a chat example on the HF model page, and the code here is written to match it.

Yeah, I just realized that since there's no existing template here, matching on the template string content has no value XD
Thanks a lot for your help!

@MollySophia MollySophia mentioned this pull request Oct 22, 2024
for (auto message : chat) {
    std::string role(message->role);
    if (role == "user") {
        ss << "User: " << message->content << "\n\nAssistant:";
Collaborator:

But do we need to add a space after "Assistant:"? We currently have a space after "User:".

Contributor Author:

From what I know, it's not required, according to @BlinkDL's demo code.
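
For completeness, a self-contained sketch of the formatting loop under review, using a simplified message type rather than llama.cpp's internal structs; the handling of assistant turns is an assumption based on this discussion, not a quote of the merged code:

```cpp
#include <sstream>
#include <string>
#include <vector>

struct chat_msg {
    std::string role;    // "user" or "assistant"
    std::string content;
};

// Builds the RWKV-World prompt: user turns are prefixed with "User: " and
// followed by "\n\nAssistant:"; assistant turns (assumed here) are emitted
// as-is and terminated by the "\n\n" that this PR also registers as EOT.
static std::string format_rwkv_world(const std::vector<chat_msg> & chat) {
    std::ostringstream ss;
    for (const auto & msg : chat) {
        if (msg.role == "user") {
            ss << "User: " << msg.content << "\n\nAssistant:";
        } else {
            // No extra space after "Assistant:"; per the discussion above it
            // is not required by @BlinkDL's demo code.
            ss << msg.content << "\n\n";
        }
    }
    return ss.str();
}
```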


dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
* Add chat template for RWKV-World

Signed-off-by: Molly Sophia <[email protected]>

* RWKV: Fix the chat template not being used

Signed-off-by: Molly Sophia <[email protected]>

* RWKV v6: Set EOT token to ``\n\n``

Signed-off-by: Molly Sophia <[email protected]>

* readme: add rwkv into supported model list

Signed-off-by: Molly Sophia <[email protected]>

---------

Signed-off-by: Molly Sophia <[email protected]>
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Labels
python (python script changes), testing (Everything test related)
Development

Successfully merging this pull request may close these issues.

Bug: When inferring with RWKV, an uncontrolled dialogue between 'user' and 'assistant' appears.
3 participants