-
Good idea! Btw, shouldn't we implement a LoRA extractor in |
-
The author of the Qwen model confirmed that infill capability is only available with Qwen-coder (non-Instruct): https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/discussions/2#6731a45e0e39be0605a0df20
This limits the model to the `/infill` endpoint only, so it cannot be used with `/chat/completions`.
However, we know that the Instruct version is indeed fine-tuned from the non-Instruct one; see the technical report: https://arxiv.org/pdf/2409.12186
To make the model usable with both chat and infill, one solution is to extract the difference between the two models into a LoRA adapter. This can be done via something like `mergekit-extract-lora`; then we can set the LoRA scale at runtime (i.e. 0.0 for infill and 1.0 for chat).