BUG: Mixed-precision configuration not working with STATIC quantization #163
Comments
There's no … P.S.: This function hasn't been updated for a long time. If you confirm there's a bug, please feel free to contact me anytime.
Hi @Harahan, the bug, unfortunately, persists. The "mechanism" is the same: function `get_act_qparams()` computes the activation quantization parameters with the default bit precision. Note that the correct per-layer quantization configurations are loaded (each layer's `aquantizer` is configured correctly) when executing `a_qdq()`. To sum it up: I think that the core issue causing the [suspected] bug is that the calibration stage & function `a_qdq()` ignore the per-layer `mix_bits` settings. Can you please look into it?
P.S. …
It depends on whether we encounter such a need, or whether it will be used in our research. So, not sure.
Did you succeed in reproducing the bug?
I'm sorry, but we do not have enough time to do this. If you are sure there's a bug, post the log/evidence and reopen the issue.
LLMC_RTN_W8A8_MixedA16_Bug.txt
I'm pretty sure this is a bug. Please find attached 2 logs of LLMC with an RTN configuration.
I have reopened the issue. Since we currently don't have a requirement for static quantization, the bug may not be fixed for a long time. You'd best try other settings.
Hi, I wanted to start using this library for a couple of things, but just to confirm: this bug affects situations where a mixed-precision configuration is combined with static quantization, correct? Can it be confirmed that it does not apply when I want roughly the same bit-width for all components of the model, or different bit-widths only between activations and weights?
To the best of my understanding, the bug only affects the per-layer `mix_bits` overrides combined with static activation quantization; using the same bit-width for all layers, or one bit-width for weights and another for activations, should not be affected, since those cases already match the default quantizer used during calibration.
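For concreteness, here is a purely hypothetical sketch of the two situations being distinguished above; the keys and layer names are invented and do not follow llmc's real configuration schema.

```python
# Hypothetical illustration only -- keys and layer names are invented,
# not llmc's actual configuration schema.

# Case 1: one bit-width for all weights and one for all activations.
# Per the discussion above, this case should be unaffected by the bug.
uniform_cfg = {
    "weight": {"bit": 8, "symmetric": True},
    "act": {"bit": 8, "symmetric": True, "static": True},
}

# Case 2: the same global setting plus per-layer overrides (the "mix_bits" idea),
# e.g. keeping a few sensitive layers at 16-bit activations. With static
# activation quantization, these overrides are what the report says get ignored.
mixed_cfg = {
    **uniform_cfg,
    "mix_bits": {
        "model.layers.0.self_attn.q_proj": {"act": {"bit": 16}},
        "model.layers.31.mlp.down_proj": {"act": {"bit": 16}},
    },
}
```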
Dear LLMC team,

I've been trying to run mixed-precision PTQ quantization using RTN. I suspect there's a bug, as the non-default settings in `mix_bits` are ignored.

My understanding of the code:
- In `get_act_qparams()` of `rtn.py`, the values of `qmax`/`qmin`/`scales`/`zeros` are determined using the default quantizer bit precision and registered in the `buf_act_<xxx>` buffers, for all modules/layers.
- In `a_qdq()` of `rtn.py`, although the `aquantizer` object of each layer is configured correctly, it blindly loads the registered quantization parameters `qmin`/`qmax`/`scales`/`zeros` from the buffers and uses them, instead of the values it should actually use for that layer's bit-width.

What do you think?
Thanks in advance!
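To make the suspected failure mode easier to discuss, below is a minimal, self-contained PyTorch sketch of the pattern described above. It is not LLMC's actual code: the class, signatures, and quantizer are simplified stand-ins (only the names `get_act_qparams`, `a_qdq`, `aquantizer`, and the `buf_act_*` buffers are taken from the discussion). It illustrates that if static calibration registers `qmin`/`qmax`/`scales`/`zeros` computed with the default bit-width, a later `a_qdq()` that reads those buffers will ignore a per-layer 16-bit override.

```python
# Minimal sketch of the suspected bug pattern -- NOT LLMC's actual implementation.
import torch
import torch.nn as nn


class ToyQuantizer:
    """Toy symmetric min/max fake-quantizer; only the bit-width matters here."""

    def __init__(self, bit: int):
        self.bit = bit
        self.qmin = float(-(2 ** (bit - 1)))
        self.qmax = float(2 ** (bit - 1) - 1)

    def get_qparams(self, x: torch.Tensor):
        scale = x.abs().max() / self.qmax
        zero = torch.tensor(0.0)
        return scale, zero


class QuantLinear(nn.Module):
    def __init__(self, in_f: int, out_f: int, act_bit: int, default_act_bit: int):
        super().__init__()
        self.fc = nn.Linear(in_f, out_f)
        # Per-layer activation quantizer, e.g. 16 bit for a layer listed in mix_bits.
        self.aquantizer = ToyQuantizer(act_bit)
        # Quantizer with the default bit precision, used during calibration.
        self.default_aquantizer = ToyQuantizer(default_act_bit)

    def get_act_qparams(self, calib_x: torch.Tensor) -> None:
        # Suspected bug pattern: the qparams are computed with the *default*
        # quantizer and registered as buffers, regardless of the per-layer bit-width.
        q = self.default_aquantizer
        scale, zero = q.get_qparams(calib_x)
        self.register_buffer("buf_act_scale", scale)
        self.register_buffer("buf_act_zero", zero)
        self.register_buffer("buf_act_qmin", torch.tensor(q.qmin))
        self.register_buffer("buf_act_qmax", torch.tensor(q.qmax))

    def a_qdq(self, x: torch.Tensor) -> torch.Tensor:
        # Although self.aquantizer carries the correct per-layer bit-width, the
        # fake-quant step below only reads the buffered parameters, so the
        # mix_bits-style override has no effect on the static path.
        q = torch.clamp(
            torch.round(x / self.buf_act_scale) + self.buf_act_zero,
            self.buf_act_qmin,
            self.buf_act_qmax,
        )
        return (q - self.buf_act_zero) * self.buf_act_scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.a_qdq(x))


if __name__ == "__main__":
    torch.manual_seed(0)
    # This layer is "configured" for 16-bit activations, but calibration used 8 bit.
    layer = QuantLinear(8, 8, act_bit=16, default_act_bit=8)
    layer.get_act_qparams(torch.randn(32, 8))
    print("per-layer bit:", layer.aquantizer.bit)             # 16
    print("qmax actually used:", layer.buf_act_qmax.item())   # 127.0 (8-bit range)
```

In this sketch, a fix would be either to compute the buffered qparams with each layer's own `aquantizer` during calibration, or to have `a_qdq()` derive the range from the per-layer quantizer instead of the buffers; which of those matches LLMC's intended design is for the maintainers to decide.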