Fix Llama-3_1-Nemotron-51B not working with 4K or more tokens #11008
This PR fixes the following bug: #11002
After inspecting the parameters of the Llama-3.1-70B and 51B GGUFs while loading them with llama-cli, I noticed exactly one difference: rope_theta (500000.0 vs 10000.0). According to the 51B's config.json, this value should be 500000.0, which means the current convert_hf_to_gguf.py doesn't read rope_theta for DeciLMCausalModel. I fixed that and made this PR.
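A minimal sketch of the change, assuming the usual convert_hf_to_gguf.py pattern where a model class's set_gguf_parameters() forwards hyperparameters from self.hparams to the GGUF writer (add_rope_freq_base() is the writer's existing method; the exact placement inside the converter's Deci model class is my assumption, not the literal diff):

```python
# Sketch only, not the literal diff: forward rope_theta from config.json to
# the GGUF metadata inside the converter's model class for DeciLMCausalModel.
def set_gguf_parameters(self):
    super().set_gguf_parameters()
    # config.json of Llama-3_1-Nemotron-51B sets rope_theta to 500000.0;
    # if the converter never reads it, the GGUF ends up with the 10000.0
    # default, which breaks generation beyond ~4K tokens.
    rope_theta = self.hparams.get("rope_theta")
    if rope_theta is not None:
        self.gguf_writer.add_rope_freq_base(rope_theta)
```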
I generated a GGUF with the correct rope_theta of 500000.0. It works with llama.cpp b4380 or above without recompilation, since I only changed convert_hf_to_gguf.py and didn't touch the C code.
https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF/blob/main/Llama-3_1-Nemotron-51B-Instruct.imatrix.Q4_K_M.gguf
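To double-check the metadata without loading the model, the gguf Python package that ships with llama.cpp can read the field directly. This is a sketch: the key prefix depends on the architecture string the converter wrote, so it tries both "deci" and "llama".

```python
# Sketch: verify rope_theta in the generated GGUF using llama.cpp's gguf-py.
from gguf import GGUFReader

reader = GGUFReader("Llama-3_1-Nemotron-51B-Instruct.imatrix.Q4_K_M.gguf")
for key in ("deci.rope.freq_base", "llama.rope.freq_base"):
    field = reader.get_field(key)
    if field is not None:
        # Scalar fields keep their value in the part indexed by data[0].
        print(key, "=", float(field.parts[field.data[0]][0]))  # expect 500000.0
```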
As an aside, inspecting the tokenizer_config.json of Llama-3.1-70B, I found that it also has both eos_token and eot_token set to '<|eot_id|>', so this is probably not a typo for the 51B either. I therefore also removed the four lines in set_vocab related to this (see the sketch below).
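For illustration only, the removed override was of this general shape; this is a hypothetical reconstruction, not the literal four lines from the diff:

```python
# Hypothetical reconstruction of the kind of set_vocab override removed.
# tokenizer_config.json already maps both eos_token and eot_token to
# '<|eot_id|>', so forcing eot by hand is unnecessary.
special_vocab = gguf.SpecialVocab(self.dir_model)
special_vocab._set_special_token("eot", 128009)  # hypothetical: token id of '<|eot_id|>'
special_vocab.add_to_gguf(self.gguf_writer)
```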
Removing them gets rid of the following warning without causing any problems:
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect