To generate QServe-formatted model checkpoints, the README directs me to use lmquant for QoQ quantization and to dump the fake-quantized models. However, the lmquant GitHub page does not include instructions for this step. I would sincerely appreciate any guidance on how to dump fake-quantized models. Thanks!
To dump fake-quantized models, please refer to the QoQ README page. Simply add --save-model to your command, and it will generate two .pt files in the experiment directory: one with the fake-quantized weights, and the other with the scales and zero points.
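Once the run finishes, a quick way to confirm that both files were written is to list the .pt files in the experiment directory and, optionally, load them for inspection. The following is only a minimal sketch, not part of lmquant: the helper names `find_checkpoint_files` and `load_checkpoints` are hypothetical, the experiment directory path is whatever your run used, and the actual file names are not assumed here.

```python
from pathlib import Path


def find_checkpoint_files(experiment_dir):
    """Return every .pt file under experiment_dir (searched recursively), sorted by path."""
    return sorted(Path(experiment_dir).rglob("*.pt"))


def load_checkpoints(experiment_dir):
    """Load each .pt file onto the CPU and return a {file name: loaded object} dict.

    Hypothetical helper: assumes PyTorch is installed and that the files were
    written with torch.save. The file names are discovered, not hard-coded.
    """
    import torch  # imported lazily so that listing files works without PyTorch

    return {
        path.name: torch.load(path, map_location="cpu")
        for path in find_checkpoint_files(experiment_dir)
    }
```

For example, `find_checkpoint_files("path/to/experiment_dir")` should list two .pt files after a run with --save-model, assuming the directory holds no other .pt files; which one holds the weights and which holds the scales and zero points has to be read off the file names your run produced.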