
lmquant for QoQ quantization and fake-quantized model dumping #7

Open
SimpleTheoryOfTypes opened this issue May 15, 2024 · 1 comment


SimpleTheoryOfTypes commented May 15, 2024

To generate QServe-formatted model checkpoints, the README directs me to use lmquant for QoQ quantization and to dump the fake-quantized models. However, the lmquant GitHub page does not include instructions for this step. I would sincerely appreciate any guidance on how to dump fake-quantized models. Thanks!

synxlin (Contributor) commented May 15, 2024

To dump fake-quantized models, please refer to the QoQ README page here. Simply add --save-model to your command, and it will generate two .pt files in the experiment directory: one with the fake-quantized weights, and the other with the scales and zero points.
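The two files are related in the usual way for uniform quantization: each fake-quantized weight is an integer code mapped back to floating point with its group's scale and zero point, i.e. (q - zero) * scale. The sketch below is a generic, pure-Python illustration of asymmetric per-group fake quantization under that assumption. It is not lmquant's implementation or QoQ's exact algorithm, and the function names (`compute_scale_zero`, `fake_quantize`, `fake_quantize_per_group`) are hypothetical.

```python
def compute_scale_zero(group, n_bits=4):
    """Asymmetric scale and integer zero point for one group (generic scheme, assumed)."""
    qmax = (1 << n_bits) - 1
    w_min, w_max = min(group), max(group)
    # Fall back to scale 1.0 for a constant group to avoid dividing by zero.
    scale = (w_max - w_min) / qmax if w_max > w_min else 1.0
    zero = round(-w_min / scale)
    return scale, zero


def fake_quantize(group, scale, zero, n_bits=4):
    """Quantize to integer codes in [0, qmax], then dequantize back to floats."""
    qmax = (1 << n_bits) - 1
    codes = [min(max(round(w / scale) + zero, 0), qmax) for w in group]
    return [(q - zero) * scale for q in codes]


def fake_quantize_per_group(weights, group_size, n_bits=4):
    """Return fake-quantized weights plus the per-group scales and zero points,
    mirroring the two kinds of output described above (hypothetical helper)."""
    fake, scales, zeros = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale, zero = compute_scale_zero(group, n_bits)
        fake.extend(fake_quantize(group, scale, zero, n_bits))
        scales.append(scale)
        zeros.append(zero)
    return fake, scales, zeros


weights = [0.0, 1.0, 2.0, 3.0, -1.0, 0.0, 1.0, 2.0]
fake, scales, zeros = fake_quantize_per_group(weights, group_size=4, n_bits=2)
print(fake)    # [0.0, 1.0, 2.0, 3.0, -1.0, 0.0, 1.0, 2.0]
print(scales)  # [1.0, 1.0]
print(zeros)   # [0, 1]
```

In a real checkpoint these would be tensors and the grouping would follow the model's weight layout, so treat this only as a way to reason about what the fake-quantized weights and the scale/zero-point file contain.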
