
Differences in template application #127

Open
flatsiedatsie opened this issue Oct 8, 2024 · 6 comments

Comments

@flatsiedatsie (Contributor)

After switching to the Jinja templating engine, I got the feeling that my default model (Danube 3 500M) wasn't giving the same answers.

So I ran a comparison between the old and new versions, and to my surprise there is a difference:

Transformers.js:

  <|prompt|>What's the difference between red and green apples?</s><|answer|>

Jinja:

  <|prompt|>What's the difference between red and green apples?<|answer|> 

The only difference is the missing </s> (the eos_token) after the user prompt in the Jinja output. I then tried the latest Q4_K_M .gguf supplied by h2o.ai (for Jinja) and also set their repo's latest config files (for Transformers.js). The result was the same.

I then compared the template that's embedded in the .gguf with the one in the config files. They were the same:

From the .gguf:

  {% for message in messages %}{% if message['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% if ((message['role'] == 'user') != (loop.index0 % 2 == 0)) or ((message['role'] == 'assistant') != (loop.index0 % 2 == 1)) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '<|prompt|>' + message['content'].strip() + eos_token }}{% elif message['role'] == 'assistant' %}{{ '<|answer|>' + message['content'].strip() + eos_token }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|answer|>' }}{% endif %}

From the config files:

  {% for message in messages %}{% if message['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% if ((message['role'] == 'user') != (loop.index0 % 2 == 0)) or ((message['role'] == 'assistant') != (loop.index0 % 2 == 1)) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '<|prompt|>' + message['content'].strip() + eos_token }}{% elif message['role'] == 'assistant' %}{{ '<|answer|>' + message['content'].strip() + eos_token }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|answer|>' }}{% endif %}
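
As a cross-check, rendering that template with the @huggingface/jinja package (the same templating engine Transformers.js uses) and plain-string special tokens reproduces the Transformers.js output. A minimal sketch; the "<s>" bos_token value is an assumption, and "</s>" is taken from the output above:

    import { Template } from "@huggingface/jinja";

    // The chat template from the .gguf / config files, verbatim:
    const chatTemplate = `{% for message in messages %}{% if message['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% if ((message['role'] == 'user') != (loop.index0 % 2 == 0)) or ((message['role'] == 'assistant') != (loop.index0 % 2 == 1)) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '<|prompt|>' + message['content'].strip() + eos_token }}{% elif message['role'] == 'assistant' %}{{ '<|answer|>' + message['content'].strip() + eos_token }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|answer|>' }}{% endif %}`;

    const template = new Template(chatTemplate);
    const rendered = template.render({
        messages: [{ role: "user", content: "What's the difference between red and green apples?" }],
        bos_token: "<s>",   // assumed; not actually referenced by this template
        eos_token: "</s>",  // as seen in the Transformers.js output
        add_generation_prompt: true,
    });

    console.log(rendered);
    // -> <|prompt|>What's the difference between red and green apples?</s><|answer|>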

I then dove into the Jinja template code, and realised I had noticed something odd earlier - [object Map] appearing in the rendered template - and had added some code to filter that out:

  <|prompt|>Is the government of China a repressive regime?[object Map]<|answer|>
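
That [object Map] is what plain JavaScript string coercion produces for a non-string value. A minimal sketch of the mechanism (the Map here is just a stand-in for whatever non-string value reached the template):

    // String concatenation coerces non-string values via String().
    const eosToken = new Map(); // stand-in for a non-string template variable
    console.log("<|prompt|>question" + eosToken + "<|answer|>");
    // -> <|prompt|>question[object Map]<|answer|>

    // A Uint8Array coerces differently, to comma-separated byte values:
    console.log("" + new Uint8Array([60, 47, 115, 62]));
    // -> 60,47,115,62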

Is the example code (from utils.js) I was using correct?

    // Template is @huggingface/jinja's Template class, as imported in Wllama's utils.js
    const defaultChatTemplate = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}";

    const template = new Template(
        window.llama_cpp_app.getChatTemplate() ?? defaultChatTemplate,
    );

    let rendered = template.render({
        messages,
        bos_token: await window.llama_cpp_app.detokenize([window.llama_cpp_app.getBOS()]),
        eos_token: await window.llama_cpp_app.detokenize([window.llama_cpp_app.getEOS()]),
        add_generation_prompt: true,
    });

    console.log("jinja: rendered: ", rendered);
@flatsiedatsie (Contributor, Author)

Here's a console log with the variables split out first:

[Screenshot, 2024-10-08: console log with the variables split out]

@flatsiedatsie (Contributor, Author) commented Oct 8, 2024

I've solved it by decoding the detokenize result with a new TextDecoder():

    const pre_bos_token = window.llama_cpp_app.getBOS();
    const pre_eos_token = window.llama_cpp_app.getEOS();
    console.log("jinja: pre_bos_token: ", pre_bos_token);
    console.log("jinja: pre_eos_token: ", pre_eos_token);

    // detokenize() returns a Uint8Array, so decode it into a string
    // before handing it to the template.
    let bos_token = await window.llama_cpp_app.detokenize([pre_bos_token]);
    let eos_token = await window.llama_cpp_app.detokenize([pre_eos_token]);
    bos_token = new TextDecoder().decode(bos_token);
    eos_token = new TextDecoder().decode(eos_token);
    console.log("jinja: bos_token: ", bos_token);
    console.log("jinja: eos_token: ", eos_token);

    let rendered = template.render({
        messages,
        bos_token: bos_token,
        eos_token: eos_token,
        add_generation_prompt: true,
    });
    console.log("jinja: rendered: ", rendered);

(this issue can now be closed)

@felladrin (Contributor) commented Oct 8, 2024

That was a great investigation!

@ngxson (Owner) commented Oct 8, 2024

    bos_token: await window.llama_cpp_app.detokenize([window.llama_cpp_app.getBOS()]),

This is one of the reasons why switching to TypeScript will save you some headaches.


The detokenize function always returns a Uint8Array, not a string. If you try to pass a Uint8Array into a function that expects a string, TypeScript will tell you not to do that.

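A sketch of how the typed boundary catches this, with assumed, simplified signatures (the real ones live in Wllama's typings):

    // Assumed, simplified signatures for illustration only.
    declare function detokenize(tokens: number[]): Promise<Uint8Array>;
    declare function render(ctx: { eos_token: string }): string;

    async function main(): Promise<void> {
        // Compile-time error: Type 'Uint8Array' is not assignable to type 'string'.
        // render({ eos_token: await detokenize([2]) });

        // OK once the bytes are decoded:
        render({ eos_token: new TextDecoder().decode(await detokenize([2])) });
    }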

@flatsiedatsie (Contributor, Author)

You do realize that code is from Wllama, right? :-D

    return template.render({

@ngxson (Owner) commented Oct 10, 2024

Hmm, right, the object passed into template.render is not typed.

In TypeScript, we can also force the check using satisfies; for example, this will cause the error I mentioned above:

    bos_token: await window.llama_cpp_app.detokenize([window.llama_cpp_app.getBOS()]) satisfies string,
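
A self-contained sketch of the same check (the byte values are just an assumed encoding of "</s>"):

    const bytes = new Uint8Array([60, 47, 115, 62]); // "</s>" as UTF-8 bytes

    // Compile-time error: Type 'Uint8Array' does not satisfy the expected type 'string'.
    // const bad = bytes satisfies string;

    // Passes, because decode() returns a string:
    const eos_token = new TextDecoder().decode(bytes) satisfies string;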
