
support for W4A8KV8/16 and other models #19

Open
KKwanhee opened this issue Sep 30, 2024 · 1 comment

Comments

KKwanhee commented Sep 30, 2024

Thank you for making the code publicly available to evaluate various quantization configurations.

I have a few questions:

First, using this code, is it possible to evaluate W4A8KV8 or W4A8KV16 instead of W4A8KV4 in lmquant/projects/llm/scripts/qoq.sh?

Second, can I run accuracy evaluations for other models such as Qwen2.5-32B or Gemma2? If so, are there default alpha and beta values for QoQ that do not cause significant accuracy loss?

Thanks!

Contributor

synxlin commented Nov 8, 2024

Hi,

For your first question, you can evaluate W4A8KV8 or W4A8KV16 by directly setting the dtype parameter of the opts field in the configuration (see here); a hedged sketch of such a configuration follows below.
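
A minimal sketch of what such a configuration might look like. Only the `opts` field and its `dtype` parameter are confirmed by the reply above; the surrounding layout, the `wgts`/`ipts` field names, and the `sint4`/`sint8` dtype spellings are assumptions and should be checked against the configs shipped in the repo:

```yaml
# Hedged sketch, not the repo's actual config file.
quant:
  wgts:
    dtype: sint4   # assumed: 4-bit signed-integer weights (W4)
  ipts:
    dtype: sint8   # assumed: 8-bit input activations (A8)
  opts:
    dtype: sint8   # KV8; for KV16, leave the KV cache unquantized
                   # (e.g. set dtype to null) -- assumption
```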

For your second question, we currently do not have tuned smoothing alpha and beta values for Qwen2.5 or Gemma2. You may try the default settings here and here.
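
For illustration only, a hypothetical override of those smoothing hyperparameters might look like the following. The `smooth` section layout, the `proj`/`attn` split, and the 0.5 values are all assumptions (0.5 being a common starting migration strength in SmoothQuant-style methods), not confirmed defaults of this repo:

```yaml
# Hypothetical sketch: every field name and value here is an assumption.
smooth:
  proj:
    alpha: 0.5   # assumed smoothing strength for linear projections
    beta: 0.5
  attn:
    alpha: 0.5   # assumed smoothing strength for attention (query/key)
    beta: 0.5
```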
