Add hybrid quantization for StableDiffusion pipelines #584
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks a lot for your work @l-bat
```python
compressed_model = _weight_only_quantization(model, quantization_config)

quantized_model = nncf.quantize(
    compressed_model,
    dataset,
    model_type=nncf.ModelType.TRANSFORMER,
    ignored_scope=nncf.IgnoredScope(**ptq_ignored_scope),
    advanced_parameters=nncf.AdvancedQuantizationParameters(AdvancedSmoothQuantParameters(matmul=-1)),
    subset_size=quantization_config.subset_size,
)
```
Why do we need to do this? Could you explain a bit by adding comments and explaining what is meant by hybrid quantization?
Added a docstring to `_hybrid_quantization`.
On the one hand, post-training quantization of the UNet model leads to an accuracy drop. On the other hand, weight compression doesn't improve performance when applied to Stable Diffusion models, because the size of the activations is comparable to that of the weights. That is why the proposal is to apply quantization in hybrid mode, which means that we quantize: (1) the weights of MatMul and Embedding layers and (2) the activations of other layers.
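For illustration, here is a minimal sketch of that two-step flow using public NNCF APIs. It is not the exact implementation in this PR; in particular, excluding weighted ops by *type* in the ignored scope is a simplification, whereas the real code collects the concrete node names of the ops whose weights were compressed.

```python
import nncf


def hybrid_quantize(ov_model, calibration_samples, subset_size=300):
    # (1) weight-only compression of the weighted layers (MatMul, Embedding, ...)
    compressed_model = nncf.compress_weights(ov_model)

    # (2) quantize the activations of the remaining layers; the weighted ops are
    #     placed in the ignored scope so their already-compressed weights stay
    #     untouched. Using op types here is a simplification for illustration.
    return nncf.quantize(
        compressed_model,
        nncf.Dataset(calibration_samples),
        model_type=nncf.ModelType.TRANSFORMER,
        ignored_scope=nncf.IgnoredScope(types=["MatMul", "Embedding"]),
        subset_size=subset_size,
    )
```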
```python
raise NotImplementedError(f"Quantization in hybrid mode is not supported for {cls.__name__}")

num_inference_steps = 4 if isinstance(cls, OVLatentConsistencyModelPipeline) else 50
quantization_config.dataset = dataset
```
Here you modify the value in the `quantization_config` object that the user passed to the API. This is not good, IMO. I would recommend passing `dataset` explicitly to `_hybrid_quantization()`.
Created a deepcopy of `quantization_config` to pass it to `_hybrid_quantization`.
I don't think it is safe enough to create a deep copy. Imagine someone passes a huge dataset inside the config. We will copy it as far as I understand.
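A minimal sketch of the alternative under discussion, with a purely hypothetical signature (not the PR's final code): the calibration data is passed as its own argument, so the user's config is neither mutated nor deep-copied along with a potentially huge dataset.

```python
# Hypothetical sketch only: illustrates passing the dataset explicitly instead of
# storing it inside (and later copying) the user's quantization_config.
def _hybrid_quantization(model, quantization_config, calibration_dataset):
    # quantization_config is treated as read-only here; its `dataset` field is never
    # read or written, and nothing large gets deep-copied.
    # A caller could simply do:
    #   _hybrid_quantization(unet, quantization_config, quantization_config.dataset)
    return model
```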
This looks very interesting, thanks @l-bat! Can you add this to the documentation? https://github.com/huggingface/optimum-intel/blob/main/docs/source/optimization_ov.mdx It would also be good to add a note with a link to this example https://github.com/huggingface/optimum-intel/tree/main/examples/openvino/stable-diffusion (for people who find that example when looking into quantizing stable diffusion, to let them know about this option).
docs/source/optimization_ov.mdx
Outdated
````
@@ -69,6 +69,23 @@ from optimum.intel import OVModelForCausalLM
model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
```

## Hybrid quantization

Traditional optimization methods like post-training 8-bit quantization do not work for Stable Diffusion models because accuracy drops significantly. On the other hand, weight compression does not improve performance when applied to Stable Diffusion models, as the size of activations is comparable to weights.
````
Suggested change:
```diff
- Traditional optimization methods like post-training 8-bit quantization do not work for Stable Diffusion models because accuracy drops significantly. On the other hand, weight compression does not improve performance when applied to Stable Diffusion models, as the size of activations is comparable to weights.
+ Traditional optimization methods like post-training 8-bit quantization do not work well for Stable Diffusion models and can lead to poor generation results. On the other hand, weight compression does not improve performance significantly when applied to Stable Diffusion models, as the size of activations is comparable to weights.
```
docs/source/optimization_ov.mdx
Outdated
```
Traditional optimization methods like post-training 8-bit quantization do not work for Stable Diffusion models because accuracy drops significantly. On the other hand, weight compression does not improve performance when applied to Stable Diffusion models, as the size of activations is comparable to weights.
The UNet model takes up most of the overall execution time of the pipeline. Thus, optimizing just one model brings substantial benefits in terms of inference speed while keeping acceptable accuracy without fine-tuning. Quantizing the rest of the diffusion pipeline does not significantly improve inference performance but could potentially lead to substantial degradation of accuracy.
Therefore, the proposal is to apply quantization in hybrid mode for the UNet model and weight-only quantization for other pipeline components. The hybrid mode involves the quantization of weights in MatMul and Embedding layers, and activations of other layers, facilitating accuracy preservation post-optimization while reducing the model size.
```
Suggested change:
```diff
- Therefore, the proposal is to apply quantization in hybrid mode for the UNet model and weight-only quantization for other pipeline components. The hybrid mode involves the quantization of weights in MatMul and Embedding layers, and activations of other layers, facilitating accuracy preservation post-optimization while reducing the model size.
+ Therefore, the proposal is to apply quantization in *hybrid mode* for the UNet model and weight-only quantization for the rest of the pipeline components. The hybrid mode involves the quantization of weights in MatMul and Embedding layers, and activations of other layers, facilitating accuracy preservation post-optimization while reducing the model size.
```
docs/source/optimization_ov.mdx
Outdated
```
Traditional optimization methods like post-training 8-bit quantization do not work for Stable Diffusion models because accuracy drops significantly. On the other hand, weight compression does not improve performance when applied to Stable Diffusion models, as the size of activations is comparable to weights.
The UNet model takes up most of the overall execution time of the pipeline. Thus, optimizing just one model brings substantial benefits in terms of inference speed while keeping acceptable accuracy without fine-tuning. Quantizing the rest of the diffusion pipeline does not significantly improve inference performance but could potentially lead to substantial degradation of accuracy.
Therefore, the proposal is to apply quantization in hybrid mode for the UNet model and weight-only quantization for other pipeline components. The hybrid mode involves the quantization of weights in MatMul and Embedding layers, and activations of other layers, facilitating accuracy preservation post-optimization while reducing the model size.
For optimizing the Stable Diffusion pipeline, utilize the `quantization_config` to define optimization parameters. To enable hybrid quantization, specify the quantization dataset in the `quantization_config`; otherwise, weight-only quantization in specified precisions will be applied to UNet.
```
Suggested change:
```diff
- For optimizing the Stable Diffusion pipeline, utilize the `quantization_config` to define optimization parameters. To enable hybrid quantization, specify the quantization dataset in the `quantization_config`; otherwise, weight-only quantization in specified precisions will be applied to UNet.
+ The `quantization_config` is utilized to define optimization parameters for optimizing the Stable Diffusion pipeline. To enable hybrid quantization, specify the quantization dataset in the `quantization_config`. Otherwise, weight-only quantization to a specified data type (8 or 4 bits) is applied to the UNet model.
```
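For illustration, usage along the lines of this doc section might look like the following sketch (the checkpoint id is only a placeholder; as described above, providing a dataset enables hybrid quantization of the UNet, while omitting it falls back to weight-only compression):

```python
from optimum.intel import OVStableDiffusionPipeline, OVWeightQuantizationConfig

model_id = "runwayml/stable-diffusion-v1-5"  # placeholder checkpoint
pipeline = OVStableDiffusionPipeline.from_pretrained(
    model_id,
    export=True,
    # A dataset in the config switches the UNet from weight-only compression to hybrid quantization
    quantization_config=OVWeightQuantizationConfig(bits=8, dataset="conceptual_captions"),
)
image = pipeline("a sailboat on a calm sea at sunset").images[0]
```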
```
ratio (`float`, *optional*, defaults to 1.0):
dataset (`str or List[str]`, *optional*):
    The dataset used for data-aware compression or quantization with NNCF. You can provide your own dataset
    in a list of string or just use the the one from the list ['wikitext2','c4','c4-new','ptb','ptb-new'] for LLLMs
```
Suggested change:
```diff
- in a list of string or just use the the one from the list ['wikitext2','c4','c4-new','ptb','ptb-new'] for LLLMs
+ in a list of strings or just use the one from the list ['wikitext2','c4','c4-new','ptb','ptb-new'] for LLLMs
```
```
dataset (`str or List[str]`, *optional*):
    The dataset used for data-aware compression or quantization with NNCF. You can provide your own dataset
    in a list of string or just use the the one from the list ['wikitext2','c4','c4-new','ptb','ptb-new'] for LLLMs
    or ['conceptual_captions','laion/220k-GPT4Vision-captions-from-LIVIS','laion/filtered-wit'] for SD models.
```
Suggested change:
```diff
- or ['conceptual_captions','laion/220k-GPT4Vision-captions-from-LIVIS','laion/filtered-wit'] for SD models.
+ or ['conceptual_captions','laion/220k-GPT4Vision-captions-from-LIVIS','laion/filtered-wit'] for diffusion models.
```
```diff
 raise ValueError(
     f"""You have entered a string value for dataset. You can only choose between
-    ['wikitext2','c4','c4-new','ptb','ptb-new'], but we found {self.dataset}"""
+    {llm_datasets} for LLLMs or {stable_diffusion_datasets} for SD models, but we found {self.dataset}"""
```
Suggested change:
```diff
- {llm_datasets} for LLLMs or {stable_diffusion_datasets} for SD models, but we found {self.dataset}"""
+ {llm_datasets} for LLLMs or {stable_diffusion_datasets} for diffusion models, but we found {self.dataset}"""
```
```python
# load the UNet model uncompressed to apply hybrid quantization further
unet = cls.load_model(unet_path)
# Apply weights compression to other `components` without dataset
quantization_config.dataset = None
```
This is an error-prone approach, IMO, because you are changing the values of the input argument. Please think about how we can make it safer.
The QAT example is outdated and will be revised in the next PR. After that, I will add a note with a link to this example.
```python
    q_config_params = quantization_config.__dict__
    wc_params = {param: value for param, value in q_config_params.items() if param != "dataset"}
    wc_quantization_config = OVWeightQuantizationConfig.from_dict(wc_params)
else:
    wc_quantization_config = quantization_config
unet = cls.load_model(unet_path, wc_quantization_config)
```
Suggested change:
```diff
-    q_config_params = quantization_config.__dict__
-    wc_params = {param: value for param, value in q_config_params.items() if param != "dataset"}
-    wc_quantization_config = OVWeightQuantizationConfig.from_dict(wc_params)
-else:
-    wc_quantization_config = quantization_config
-unet = cls.load_model(unet_path, wc_quantization_config)
+    weight_quantization_params = {param: value for param, value in quantization_config.__dict__.items() if param != "dataset"}
+    weight_quantization_config = OVWeightQuantizationConfig.from_dict(weight_quantization_params)
+else:
+    weight_quantization_config = quantization_config
+unet = cls.load_model(unet_path, weight_quantization_config)
```
@echarlaix, the PR is ready and we will update the SD example from QAT to the new approach in the follow-up PR.
Looks great, thanks @l-bat!
* Add hybrid quantization for StableDiffusion pipelines
* apply black
* fix tests
* fix ruff
* fix lcm bug
* apply review comments
* rework dataset processing
* Add doc
* remove SDXL test
* Apply comments
* reformat
What does this PR do?
This PR adds hybrid quantization for `OVStableDiffusionPipeline`, `OVStableDiffusionXLPipeline` and `OVLatentConsistencyModelPipeline`, where part of the model is fully quantized and the weights of the other parts are only compressed. Supported calibration datasets: `['conceptual_captions', 'laion/220k-GPT4Vision-captions-from-LIVIS', 'laion/filtered-wit']`.
Before submitting