Fix Llama-3_1-Nemotron-51B not working with 4K or more tokens #11008
This PR fixes the following bug: #11002
After inspecting the parameters of the Llama-3.1-70B and 51B GGUFs while loading them with llama-cli, I noticed exactly one difference: rope_theta (500000.0 vs 10000.0). According to the 51B's config.json, this value should be 500000.0, which means the current convert_hf_to_gguf.py doesn't read rope_theta for DeciLMCausalModel. I fixed that and made this PR.
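A minimal sketch of the change, assuming the usual convert_hf_to_gguf.py pattern where a model class's set_gguf_parameters() forwards hyperparameters from self.hparams to the GGUF writer (add_rope_freq_base() is the writer's existing method; the exact placement inside the converter's Deci model class is my assumption, not the literal diff):

```python
# Sketch only, not the literal diff: forward rope_theta from config.json to
# the GGUF metadata inside the converter's model class for DeciLMCausalModel.
def set_gguf_parameters(self):
    super().set_gguf_parameters()
    # config.json of Llama-3_1-Nemotron-51B sets rope_theta to 500000.0;
    # if the converter never reads it, the GGUF ends up with the 10000.0
    # default, which breaks generation beyond ~4K tokens.
    rope_theta = self.hparams.get("rope_theta")
    if rope_theta is not None:
        self.gguf_writer.add_rope_freq_base(rope_theta)
```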
I generated a GGUF with the correct rope_theta of 500000.0. It works with llama.cpp b4380 or above without recompilation, since I only changed convert_hf_to_gguf.py and didn't touch the C code.
https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF/blob/main/Llama-3_1-Nemotron-51B-Instruct.imatrix.Q4_K_M.gguf
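To double-check the metadata without loading the model, the gguf Python package that ships with llama.cpp can read the field directly. This is a sketch: the key prefix depends on the architecture string the converter wrote, so it tries both "deci" and "llama".

```python
# Sketch: verify rope_theta in the generated GGUF using llama.cpp's gguf-py.
from gguf import GGUFReader

reader = GGUFReader("Llama-3_1-Nemotron-51B-Instruct.imatrix.Q4_K_M.gguf")
for key in ("deci.rope.freq_base", "llama.rope.freq_base"):
    field = reader.get_field(key)
    if field is not None:
        # Scalar fields keep their value in the part indexed by data[0].
        print(key, "=", float(field.parts[field.data[0]][0]))  # expect 500000.0
```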
As an aside, inspecting the tokenizer_config.json of Llama-3.1-70B, I found that it also has both eos_token and eot_token set to '<|eot_id|>', so this is probably not a typo for the 51B either. I therefore also removed the four lines in set_vocab related to this (see the sketch below).
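For illustration only, the removed override was of this general shape; this is a hypothetical reconstruction, not the literal four lines from the diff:

```python
# Hypothetical reconstruction of the kind of set_vocab override removed.
# tokenizer_config.json already maps both eos_token and eot_token to
# '<|eot_id|>', so forcing eot by hand is unnecessary.
special_vocab = gguf.SpecialVocab(self.dir_model)
special_vocab._set_special_token("eot", 128009)  # hypothetical: token id of '<|eot_id|>'
special_vocab.add_to_gguf(self.gguf_writer)
```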
Removing them gets rid of the following warning without causing any problems:
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect