
add modules_in_block_to_quantize arg in GPTQconfig #27956

Merged

Conversation

@SunMarc (Member) commented on Dec 11, 2023

What does this PR do?

This PR adds the `modules_in_block_to_quantize` quantization arg for GPTQ. It is needed to convert only specific layers to quantized layers. With this PR, we should be able to run the GPTQ Mixtral model. See the related PR in optimum.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_name = "TheBloke/Mixtral-8x7B-v0.1-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map={"": 0})
print(model)

inputs = tokenizer.encode("Hello, how are you today ?", return_tensors="pt").to(0)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```
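
As a rough end-to-end sketch of how the new argument could be passed when quantizing a model yourself: the model id, calibration dataset, and module names below are illustrative assumptions, not values taken from this PR.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

# Hypothetical sketch: model id, dataset, and module names are assumptions for
# illustration only, not taken from this PR.
model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

gptq_config = GPTQConfig(
    bits=4,
    dataset="c4",
    tokenizer=tokenizer,
    # Each inner list is a group of linear modules (named relative to a block)
    # that is quantized together; the groups are processed sequentially, and
    # any linear layer not listed here is left unquantized.
    modules_in_block_to_quantize=[
        ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
        ["self_attn.o_proj"],
    ],
)

quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)
```

The same field is also read when loading an already-quantized checkpoint, which is what the Mixtral example above relies on.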

@SunMarc requested a review from amyeroberts on December 11, 2023 at 20:58

@ArthurZucker (Collaborator) left a comment

LGTM but.... can you change the name? 🤗

src/transformers/utils/quantization_config.py (review comment, resolved)
@amyeroberts (Collaborator) left a comment

Thanks for adding and linking to the corresponding PR in optimum!

Overall looks OK - agree with @ArthurZucker that the docstring needs some clarity.

src/transformers/utils/quantization_config.py (review comment, resolved)
@SunMarc changed the title from "add inside_layer_modules arg" to "add modules_in_block_to_quantize arg in GPTQconfig" on Dec 12, 2023
@amyeroberts (Collaborator) left a comment

Thanks for iterating! The docstring still needs at least one more iteration to make sure it's clear

Comment on lines 383 to 386
modules_in_block_to_quantize (`List[List[str]]`, *optional*):
List list of module names to quantize in the block specified. This argument is useful to exclude certain linear modules from being quantized.
The block to quantize can be specified by setting `block_name_to_quantize`. We will quantize each list sequentially. If not set, we will quantize all linear layers.
Example: `inside_layer_modules=[["self_attention.query_key_value"], ["mlp.dense_h_to_4h"]]`
A Collaborator left a comment:

What does each list in the list represent here? E.g. is the first element in the list - modules_in_block_to_quantize[0] - the list of layers to quantize for the first block? How does that match with block_name_to_quantize which seems to only take a single string i.e. I would expect a single block.

@SunMarc (Member, Author) replied:

Yes, I will detail a little bit more why we are using a list of lists. See the comment above.
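
To illustrate the list-of-lists semantics being asked about here, a hypothetical sketch follows; the block and module names are made up for illustration and are not taken from the PR or from any specific model.

```python
from transformers import GPTQConfig

# Hypothetical sketch: every inner list lives inside the single repeated block
# named by `block_name_to_quantize`, and the inner lists are quantized one
# after another. Any linear module not listed is left unquantized.
config = GPTQConfig(
    bits=4,
    block_name_to_quantize="model.layers",  # illustrative block name
    modules_in_block_to_quantize=[
        ["self_attention.query_key_value"],  # first group to be quantized
        ["mlp.dense_h_to_4h"],               # second group, quantized afterwards
    ],
)
```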

@amyeroberts (Collaborator) left a comment

Thanks - much clearer!

src/transformers/utils/quantization_config.py (review comment, resolved)
@SunMarc merged commit 17506d1 into huggingface:main on Dec 13, 2023
21 checks passed
iantbutler01 pushed a commit to BismuthCloud/transformers that referenced this pull request Dec 16, 2023
* add inside_layer_modules arg

* fix

* change to modules_to_quantize_inside_block

* fix

* remane again

* Apply suggestions from code review

Co-authored-by: Arthur <[email protected]>

* better docsting

* fix again with less explanation

* Update src/transformers/utils/quantization_config.py

Co-authored-by: amyeroberts <[email protected]>

* style

---------

Co-authored-by: Arthur <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
staghado pushed a commit to staghado/transformers that referenced this pull request Jan 15, 2024