
support for W4A8KV8/16 and other models #19

Open
KKwanhee opened this issue Sep 30, 2024 · 1 comment

Comments

KKwanhee commented Sep 30, 2024

Thank you for making the code publicly available to evaluate various quantization configurations.

I have a few questions:

First, using this code, is it possible to evaluate W4A8KV8 or W4A8KV16 instead of W4A8KV4 in lmquant/projects/llm/scripts/qoq.sh?

Second, can I run accuracy evaluations for other models such as Qwen2.5-32B or Gemma2? If so, are there default alpha and beta values for QoQ that do not cause significant accuracy loss?

Thanks!

Contributor

synxlin commented Nov 8, 2024

Hi,

For your first question, you can evaluate W4A8KV8 or W4A8KV16 by directly setting the dtype parameter of the opts field in the configuration (see here); a hedged sketch of such a configuration follows below.
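
A minimal sketch of what such a configuration might look like. Only the `opts` field and its `dtype` parameter are confirmed by the reply above; the surrounding layout, the `wgts`/`ipts` field names, and the `sint4`/`sint8` dtype spellings are assumptions and should be checked against the configs shipped in the repo:

```yaml
# Hedged sketch, not the repo's actual config file.
quant:
  wgts:
    dtype: sint4   # assumed: 4-bit signed-integer weights (W4)
  ipts:
    dtype: sint8   # assumed: 8-bit input activations (A8)
  opts:
    dtype: sint8   # KV8; for KV16, leave the KV cache unquantized
                   # (e.g. set dtype to null) -- assumption
```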

For your second question, we currently do not have tuned smoothing alpha and beta values for Qwen2.5 or Gemma2. You may try the default settings here and here.
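
For illustration only, a hypothetical override of those smoothing hyperparameters might look like the following. The `smooth` section layout, the `proj`/`attn` split, and the 0.5 values are all assumptions (0.5 being a common starting migration strength in SmoothQuant-style methods), not confirmed defaults of this repo:

```yaml
# Hypothetical sketch: every field name and value here is an assumption.
smooth:
  proj:
    alpha: 0.5   # assumed smoothing strength for linear projections
    beta: 0.5
  attn:
    alpha: 0.5   # assumed smoothing strength for attention (query/key)
    beta: 0.5
```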
