
add modules_in_block_to_quantize arg in GPTQconfig #27956

Merged

Conversation

@SunMarc (Member) commented on Dec 11, 2023

What does this PR do?

This PR adds the `modules_in_block_to_quantize` quantization arg for GPTQ. It is needed to convert only specific layers to quantized layers. With this PR, we should be able to run the GPTQ Mixtral model. See the related PR in optimum.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_name = "TheBloke/Mixtral-8x7B-v0.1-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map={"": 0})
print(model)

inputs = tokenizer.encode("Hello, how are you today ?", return_tensors="pt").to(0)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```
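
As a rough end-to-end sketch of how the new argument could be passed when quantizing a model yourself: the model id, calibration dataset, and module names below are illustrative assumptions, not values taken from this PR.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

# Hypothetical sketch: model id, dataset, and module names are assumptions for
# illustration only, not taken from this PR.
model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

gptq_config = GPTQConfig(
    bits=4,
    dataset="c4",
    tokenizer=tokenizer,
    # Each inner list is a group of linear modules (named relative to a block)
    # that is quantized together; the groups are processed sequentially, and
    # any linear layer not listed here is left unquantized.
    modules_in_block_to_quantize=[
        ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
        ["self_attn.o_proj"],
    ],
)

quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)
```

The same field is also read when loading an already-quantized checkpoint, which is what the Mixtral example above relies on.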

@SunMarc requested a review from amyeroberts on December 11, 2023 at 20:58

@ArthurZucker (Collaborator) left a comment

LGTM but.... can you change the name? 🤗

src/transformers/utils/quantization_config.py (review comment, resolved)
@amyeroberts (Collaborator) left a comment

Thanks for adding and linking to the corresponding PR in optimum!

Overall looks OK - agree with @ArthurZucker that the docstring needs some clarity.

src/transformers/utils/quantization_config.py (review comment, resolved)
@SunMarc changed the title from "add inside_layer_modules arg" to "add modules_in_block_to_quantize arg in GPTQconfig" on Dec 12, 2023
@amyeroberts (Collaborator) left a comment

Thanks for iterating! The docstring still needs at least one more iteration to make sure it's clear

Comment on lines 383 to 386
modules_in_block_to_quantize (`List[List[str]]`, *optional*):
List list of module names to quantize in the block specified. This argument is useful to exclude certain linear modules from being quantized.
The block to quantize can be specified by setting `block_name_to_quantize`. We will quantize each list sequentially. If not set, we will quantize all linear layers.
Example: `inside_layer_modules=[["self_attention.query_key_value"], ["mlp.dense_h_to_4h"]]`
A Collaborator left a comment:

What does each list in the list represent here? E.g. is the first element in the list - modules_in_block_to_quantize[0] - the list of layers to quantize for the first block? How does that match with block_name_to_quantize which seems to only take a single string i.e. I would expect a single block.

@SunMarc (Member, Author) replied:

Yes, I will detail a little bit more why we are using a list of lists. See the comment above.
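
To illustrate the list-of-lists semantics being asked about here, a hypothetical sketch follows; the block and module names are made up for illustration and are not taken from the PR or from any specific model.

```python
from transformers import GPTQConfig

# Hypothetical sketch: every inner list lives inside the single repeated block
# named by `block_name_to_quantize`, and the inner lists are quantized one
# after another. Any linear module not listed is left unquantized.
config = GPTQConfig(
    bits=4,
    block_name_to_quantize="model.layers",  # illustrative block name
    modules_in_block_to_quantize=[
        ["self_attention.query_key_value"],  # first group to be quantized
        ["mlp.dense_h_to_4h"],               # second group, quantized afterwards
    ],
)
```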

@amyeroberts (Collaborator) left a comment

Thanks - much clearer!

src/transformers/utils/quantization_config.py (review comment, resolved)
@SunMarc merged commit 17506d1 into huggingface:main on Dec 13, 2023
21 checks passed
iantbutler01 pushed a commit to BismuthCloud/transformers that referenced this pull request Dec 16, 2023
* add inside_layer_modules arg

* fix

* change to modules_to_quantize_inside_block

* fix

* remane again

* Apply suggestions from code review

Co-authored-by: Arthur <[email protected]>

* better docsting

* fix again with less explanation

* Update src/transformers/utils/quantization_config.py

Co-authored-by: amyeroberts <[email protected]>

* style

---------

Co-authored-by: Arthur <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
staghado pushed a commit to staghado/transformers that referenced this pull request Jan 15, 2024