lmquant for QoQ quantization and fake-quantized model dumping #7

SimpleTheoryOfTypes · 2024-05-15T17:54:37Z

To generate QServe-formatted model checkpoints, the README directs me to use lmquant for QoQ quantization and to dump the fake-quantized models. However, the instructions for this process are not included on the GitHub page for lmquant. I would sincerely appreciate any guidance on how to dump fake-quantized models. Thanks!

synxlin · 2024-05-15T21:02:11Z

To dump fake-quantized models, please refer to the QoQ README page here. Simply add --save-model to your command, and it will generate two .pt files in the experiment directory: one with the fake-quantized weights, and the other with the scales and zero points.
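For readers unsure how the two dumped files relate, here is a minimal sketch of generic fake quantization. It is not lmquant's implementation, and QoQ's actual scheme is more involved. The function name `fake_quantize`, the per-tensor asymmetric scheme, and the 4-bit default are all assumptions made for illustration. The idea is that weights are rounded onto an integer grid and mapped straight back to floats, so the "fake-quantized" weights stay floating point, while the scale and zero point record the grid that was used.

```python
def fake_quantize(weights, num_bits=4):
    """Illustrative per-tensor asymmetric fake quantization (hypothetical helper).

    Returns the dequantized float weights plus the scale and zero point
    that define the integer grid they were snapped to.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    # Guard against an all-zero tensor, where the range collapses to 0.
    scale = (w_max - w_min) / (qmax - qmin) or 1.0
    zero_point = max(qmin, min(qmax, round(-w_min / scale)))

    fake = []
    for w in weights:
        q = round(w / scale) + zero_point      # quantize to an integer code
        q = max(qmin, min(qmax, q))            # clamp to the representable range
        fake.append((q - zero_point) * scale)  # dequantize back to float
    return fake, scale, zero_point


weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
fake, scale, zero_point = fake_quantize(weights)
print(fake, scale, zero_point)
```

In this sketch, `fake` corresponds conceptually to the file with fake-quantized weights, and `scale` and `zero_point` to the file with scales and zero points.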
