add modules_in_block_to_quantize arg in GPTQConfig #27956
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LGTM but.... can you change the name? 🤗
Thanks for adding and linking to the corresponding PR in optimum!
Overall looks OK - agree with @ArthurZucker that docstring needs some clarity
modules_in_block_to_quantize arg in GPTQConfig
Co-authored-by: Arthur <[email protected]>
Thanks for iterating! The docstring still needs at least one more iteration to make sure it's clear
modules_in_block_to_quantize (`List[List[str]]`, *optional*):
    List of list of module names to quantize in the specified block. This argument is useful to exclude certain linear modules from being quantized.
    The block to quantize can be specified by setting `block_name_to_quantize`. We will quantize each list sequentially. If not set, we will quantize all linear layers.
    Example: `inside_layer_modules=[["self_attention.query_key_value"], ["mlp.dense_h_to_4h"]]`
What does each list in the list represent here? E.g. is the first element in the list - `modules_in_block_to_quantize[0]` - the list of layers to quantize for the first block? How does that match with `block_name_to_quantize`, which seems to only take a single string, i.e. I would expect a single block.
Yes, I will detail a little bit more why we are using a list of lists. See comment above.
Thanks - much clearer!
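To make the grouping discussed above concrete, here is a minimal sketch of the argument. It assumes a `transformers` version that includes this PR, and the module names (`self_attn.q_proj`, `mlp.down_proj`, ...) are hypothetical Llama-style names used only for illustration; they are not taken from the PR.

```python
from transformers import GPTQConfig

# The block to operate on is chosen with `block_name_to_quantize`; inside each
# such block, every inner list below is one group of modules. Groups are
# quantized sequentially in the order given, and any linear module that is not
# listed is left unquantized.
config = GPTQConfig(
    bits=4,
    dataset="c4",
    modules_in_block_to_quantize=[
        ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
        ["self_attn.o_proj"],
        ["mlp.gate_proj", "mlp.up_proj"],
        ["mlp.down_proj"],
    ],
)
```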
Co-authored-by: amyeroberts <[email protected]>
* add inside_layer_modules arg
* fix
* change to modules_to_quantize_inside_block
* fix
* rename again
* Apply suggestions from code review (Co-authored-by: Arthur <[email protected]>)
* better docstring
* fix again with less explanation
* Update src/transformers/utils/quantization_config.py (Co-authored-by: amyeroberts <[email protected]>)
* style

Co-authored-by: Arthur <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
What does this PR do?
This PR adds the `modules_in_block_to_quantize` quantization arg for GPTQ. This is necessary for converting only specific linear layers inside a block to quantized layers. With this PR, we should be able to run the GPTQ Mixtral model. See the related PR in optimum.
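For a fuller picture, a sketch of how this could be combined with `from_pretrained` to quantize a checkpoint while skipping some modules (for Mixtral, e.g. the MoE routing gate). The model id and module names are assumptions chosen for illustration, and actually running this additionally requires `optimum` and `auto-gptq` to be installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Only the listed modules inside each decoder block get quantized; anything
# omitted (e.g. a MoE routing gate) keeps its original precision.
gptq_config = GPTQConfig(
    bits=4,
    dataset="c4",
    tokenizer=tokenizer,
    modules_in_block_to_quantize=[
        ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
        ["self_attn.o_proj"],
        # expert linears would be listed here as further groups;
        # "block_sparse_moe.gate" is intentionally left out
    ],
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)
```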