Where can I find a compatible draft model for Llama-3_1-Nemotron-51B-Instruct? #10998

mashdragon · 2024-12-27T19:59:16Z


I'm struggling to find a compatible draft model for this quantized model family.

Original model: https://huggingface.co/nvidia/Llama-3_1-Nemotron-51B-Instruct

Draft models I've tried:

Mismatch error from llama.cpp:

common_speculative_are_compatible: draft model special tokens must match target model to use speculation
common_speculative_are_compatible: tgt: bos = 128000 (1), eos = 128001 (0)
common_speculative_are_compatible: dft: bos = 128000 (1), eos = 128009 (0)
srv    load_model: the draft model 'Llama-3.2-3B-Instruct-uncensored-Q5_K_M.gguf' is not compatible with the target model 'Llama-3_1-Nemotron-51B-Instruct-Q5_K_M.gguf'
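
To make the log above easier to read, here is a minimal Python sketch of the kind of comparison the message describes. It is not llama.cpp's actual implementation: the `SpecialTokens` class and `mismatches` helper are hypothetical names, and reading the parenthesized values in the log as "add BOS" / "add EOS" flags is an assumption. The token IDs are copied from the log lines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecialTokens:
    """Hypothetical container for the values printed in the log."""
    bos: int
    add_bos: bool  # assumed meaning of the "(1)" after bos
    eos: int
    add_eos: bool  # assumed meaning of the "(0)" after eos


def mismatches(tgt: SpecialTokens, dft: SpecialTokens) -> list[str]:
    """Return the names of the fields that differ between target and draft."""
    fields = ("bos", "add_bos", "eos", "add_eos")
    return [name for name in fields if getattr(tgt, name) != getattr(dft, name)]


# Values taken from the "tgt:" and "dft:" log lines.
tgt = SpecialTokens(bos=128000, add_bos=True, eos=128001, add_eos=False)
dft = SpecialTokens(bos=128000, add_bos=True, eos=128009, add_eos=False)

print(mismatches(tgt, dft))  # ['eos']
```

Under these assumptions the only difference is the EOS token ID (128001 in the target, 128009 in the draft), which is why the pair is rejected.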

A pointer to a compatible smaller model, or some instructions on how to distill the Nemotron one to create my own compatible model, would be much appreciated.

Answered by mashdragon


mashdragon · 2024-12-30T00:17:36Z


This was a GGUF config issue that appears to be fixed with #11008
