
Differences in template application #127

Open
flatsiedatsie opened this issue Oct 8, 2024 · 6 comments

Comments

@flatsiedatsie (Contributor)

After switching to the Jinja templating engine, I got the feeling that my default model (Danube 3 500M) wasn't giving the same answers.

So I ran a comparison between the old and new versions, and to my surprise there is a difference:

Transformers.js:

  <|prompt|>What's the difference between red and green apples?</s><|answer|>

Jinja:

  <|prompt|>What's the difference between red and green apples?<|answer|> 

The only difference is the missing </s> (the eos_token) after the user prompt in the Jinja output. I then tried the latest Q4_K_M .gguf supplied by h2o.ai (for Jinja) and also set their repo's latest config files (for Transformers.js). The result was the same.

I then compared the template that's embedded in the .gguf with the one in the config files. They were the same:

From the .gguf:

  {% for message in messages %}{% if message['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% if ((message['role'] == 'user') != (loop.index0 % 2 == 0)) or ((message['role'] == 'assistant') != (loop.index0 % 2 == 1)) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '<|prompt|>' + message['content'].strip() + eos_token }}{% elif message['role'] == 'assistant' %}{{ '<|answer|>' + message['content'].strip() + eos_token }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|answer|>' }}{% endif %}

From the config files:

  {% for message in messages %}{% if message['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% if ((message['role'] == 'user') != (loop.index0 % 2 == 0)) or ((message['role'] == 'assistant') != (loop.index0 % 2 == 1)) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '<|prompt|>' + message['content'].strip() + eos_token }}{% elif message['role'] == 'assistant' %}{{ '<|answer|>' + message['content'].strip() + eos_token }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|answer|>' }}{% endif %}
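
As a cross-check, rendering that template with the @huggingface/jinja package (the same templating engine Transformers.js uses) and plain-string special tokens reproduces the Transformers.js output. A minimal sketch; the "<s>" bos_token value is an assumption, and "</s>" is taken from the output above:

    import { Template } from "@huggingface/jinja";

    // The chat template from the .gguf / config files, verbatim:
    const chatTemplate = `{% for message in messages %}{% if message['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% if ((message['role'] == 'user') != (loop.index0 % 2 == 0)) or ((message['role'] == 'assistant') != (loop.index0 % 2 == 1)) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '<|prompt|>' + message['content'].strip() + eos_token }}{% elif message['role'] == 'assistant' %}{{ '<|answer|>' + message['content'].strip() + eos_token }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|answer|>' }}{% endif %}`;

    const template = new Template(chatTemplate);
    const rendered = template.render({
        messages: [{ role: "user", content: "What's the difference between red and green apples?" }],
        bos_token: "<s>",   // assumed; not actually referenced by this template
        eos_token: "</s>",  // as seen in the Transformers.js output
        add_generation_prompt: true,
    });

    console.log(rendered);
    // -> <|prompt|>What's the difference between red and green apples?</s><|answer|>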

I then dove into the Jinja template code, and realised I had noticed something odd earlier - [object Map] appearing in the rendered template - and had added some code to filter that out:

  <|prompt|>Is the government of China a repressive regime?[object Map]<|answer|>
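
That [object Map] is what plain JavaScript string coercion produces for a non-string value. A minimal sketch of the mechanism (the Map here is just a stand-in for whatever non-string value reached the template):

    // String concatenation coerces non-string values via String().
    const eosToken = new Map(); // stand-in for a non-string template variable
    console.log("<|prompt|>question" + eosToken + "<|answer|>");
    // -> <|prompt|>question[object Map]<|answer|>

    // A Uint8Array coerces differently, to comma-separated byte values:
    console.log("" + new Uint8Array([60, 47, 115, 62]));
    // -> 60,47,115,62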

Is the example code (from utils.js) I was using correct?

    // Template is @huggingface/jinja's Template class, as imported in Wllama's utils.js
    const defaultChatTemplate = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}";

    const template = new Template(
        window.llama_cpp_app.getChatTemplate() ?? defaultChatTemplate,
    );

    let rendered = template.render({
        messages,
        bos_token: await window.llama_cpp_app.detokenize([window.llama_cpp_app.getBOS()]),
        eos_token: await window.llama_cpp_app.detokenize([window.llama_cpp_app.getEOS()]),
        add_generation_prompt: true,
    });

    console.log("jinja: rendered: ", rendered);
@flatsiedatsie (Contributor, Author)

Here's a console log with the variables split out first:

[Screenshot, 2024-10-08: console log with the variables split out]

@flatsiedatsie (Contributor, Author) commented Oct 8, 2024

I've solved it by decoding the detokenize result with a new TextDecoder():

    const pre_bos_token = window.llama_cpp_app.getBOS();
    const pre_eos_token = window.llama_cpp_app.getEOS();
    console.log("jinja: pre_bos_token: ", pre_bos_token);
    console.log("jinja: pre_eos_token: ", pre_eos_token);

    // detokenize() returns a Uint8Array, so decode it into a string
    // before handing it to the template.
    let bos_token = await window.llama_cpp_app.detokenize([pre_bos_token]);
    let eos_token = await window.llama_cpp_app.detokenize([pre_eos_token]);
    bos_token = new TextDecoder().decode(bos_token);
    eos_token = new TextDecoder().decode(eos_token);
    console.log("jinja: bos_token: ", bos_token);
    console.log("jinja: eos_token: ", eos_token);

    let rendered = template.render({
        messages,
        bos_token: bos_token,
        eos_token: eos_token,
        add_generation_prompt: true,
    });
    console.log("jinja: rendered: ", rendered);

(this issue can now be closed)

@felladrin (Contributor) commented Oct 8, 2024

That was a great investigation!

@ngxson (Owner) commented Oct 8, 2024

    bos_token: await window.llama_cpp_app.detokenize([window.llama_cpp_app.getBOS()]),

This is one of the reasons why switching to TypeScript will save you some headaches.


The detokenize function always returns a Uint8Array, not a string. If you try to pass a Uint8Array into a function that expects a string, TypeScript will tell you not to do that.

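A sketch of how the typed boundary catches this, with assumed, simplified signatures (the real ones live in Wllama's typings):

    // Assumed, simplified signatures for illustration only.
    declare function detokenize(tokens: number[]): Promise<Uint8Array>;
    declare function render(ctx: { eos_token: string }): string;

    async function main(): Promise<void> {
        // Compile-time error: Type 'Uint8Array' is not assignable to type 'string'.
        // render({ eos_token: await detokenize([2]) });

        // OK once the bytes are decoded:
        render({ eos_token: new TextDecoder().decode(await detokenize([2])) });
    }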

@flatsiedatsie (Contributor, Author)

You do realize that code is from Wllama, right? :-D

    return template.render({

@ngxson (Owner) commented Oct 10, 2024

Hmm, right, the object passed into template.render is not typed.

In TypeScript, we can also force the check using satisfies; for example, this will cause the error I mentioned above:

    bos_token: await window.llama_cpp_app.detokenize([window.llama_cpp_app.getBOS()]) satisfies string,
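
A self-contained sketch of the same check (the byte values are just an assumed encoding of "</s>"):

    const bytes = new Uint8Array([60, 47, 115, 62]); // "</s>" as UTF-8 bytes

    // Compile-time error: Type 'Uint8Array' does not satisfy the expected type 'string'.
    // const bad = bytes satisfies string;

    // Passes, because decode() returns a string:
    const eos_token = new TextDecoder().decode(bytes) satisfies string;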
