Add support for op_block_list #1036

Merged: 5 commits merged into huggingface:main on Nov 25, 2024

Conversation

@pdufour (Contributor) commented Nov 18, 2024

Background
Adds a new argument to the quantize script called op_block_list. If op_block_list is provided, the listed op types are not quantized. This is useful because some ops are incompatible with quantization.
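
A minimal sketch of how such a flag could be parsed (hypothetical wiring, not the PR's exact code; only the flag name op_block_list and --mode come from this PR):

```python
# Hypothetical argparse wiring for the new flag; not the PR's exact code.
import argparse

parser = argparse.ArgumentParser(description="Quantize ONNX models")
parser.add_argument("--mode", default="fp16")
# nargs="+" accepts a space-separated list of op types, e.g.
#   --op_block_list Conv DynamicQuantizeLinear DequantizeLinear Resize
# default=None means "fall back to the converter's default block list".
parser.add_argument("--op_block_list", nargs="+", default=None)

args = parser.parse_args()
print(args.mode, args.op_block_list)
```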

Test Plan

Regression Test

  • This test just verifies that there are no behavioural regressions when op_block_list is not provided
  • git clone https://huggingface.co/onnx-models/sentence-t5-base-onnx
  • This model has a /model/model.0/auto_model/encoder/block.0/layer.0/SelfAttention/Range node, so we check that it is still excluded from fp16 conversion because Range is part of the default block list (https://github.com/microsoft/onnxconverter-common/blob/master/onnxconverter_common/float16.py#L108); see the inspection sketch after this list
  • Test with main branch of transformers.js
  • git checkout . && rm -rf ./*_*.onnx || true && PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder . --output_folder . --mode fp16
  • Run stat: stat -f "%z" model_fp16.onnx
220762121
  • Now test with this PR branch
  • git checkout . && rm -rf ./*_*.onnx || true && PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder . --output_folder . --mode fp16
  • Run stat: stat -f "%z" model_fp16.onnx
220762121
  • The file size is the same, so the test passes
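
One way to confirm the default block list is still in effect (beyond comparing file sizes) is to inspect the exported model directly. A minimal sketch, assuming the onnx and onnxconverter-common packages and the model_fp16.onnx produced above:

```python
# Inspection sketch (not part of the PR's test plan): list the op types in the
# exported fp16 model that the default block list should have kept in float32.
import onnx
from onnxconverter_common import float16

model = onnx.load("model_fp16.onnx")
ops_in_graph = {node.op_type for node in model.graph.node}

# Expected to include "Range", per the node mentioned above.
print(sorted(ops_in_graph & set(float16.DEFAULT_OP_BLOCK_LIST)))
```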

Qwen2-VL Test

  • Here we verify that op_block_list actually takes effect
  • Check out this PR branch of transformers.js
  • Clone this repo https://huggingface.co/pdufour/Qwen2-VL-2B-Instruct-ONNX-Q4-F16
  • Delete the already-quantized models so they are not quantized twice
  • rm -rf onnx/*_*_*.onnx
  • Quantize the A model without the op_block_list
  • PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder ./onnx --output_folder ./onnx-dest --mode q4f16
  • Run the infer script: python3 infer.py Qwen/Qwen2-VL-2B-Instruct ./onnx (the failing load step is sketched after this list)
    • Expected result: No error.
    • Actual Result: onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from ./onnx/QwenVL_A_q4f16.onnx failed:Type Error: Type parameter (T) of Optype (Sub) bound to different types (tensor(float) and tensor(float16) in node (/Sub).
  • Now quantize with the op_block_list
  • rm -rf onnx/*_*_*.onnx
  • PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder ./onnx --output_folder ./onnx --mode q4f16 --op_block_list Conv DynamicQuantizeLinear DequantizeLinear Resize
  • Run the infer script again
    • python3 infer.py Qwen/Qwen2-VL-2B-Instruct ./onnx
    • Output: "The image shows a vintage teal-colored..."
    • This is the correct result, so the test passes
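
For reference, the failure happens at session-creation time; a minimal sketch of just that load step, assuming onnxruntime is installed (infer.py itself is not reproduced here):

```python
# Sketch of the load step only (not infer.py): before re-quantizing with
# --op_block_list this raises the Type Error quoted above; afterwards the
# session is created successfully.
import onnxruntime as ort

session = ort.InferenceSession("./onnx/QwenVL_A_q4f16.onnx")
print([inp.name for inp in session.get_inputs()])
```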

@xenova (Collaborator) commented Nov 18, 2024

Very useful! Thanks! Just for my testing, is the model you're testing available on the Hugging Face hub?

@pdufour (Contributor, Author) commented Nov 19, 2024

@xenova I have the exported model available here https://huggingface.co/pdufour/Qwen2-VL-2B-Instruct-ONNX-Q4-F16 but I haven't uploaded the source files. It might be easier to try on a smaller example. I've updated the description of the PR if you want to try that one.

@pdufour (Contributor, Author) commented Nov 19, 2024

One curious behaviour is that if you do provide an op_block_list, the defaults no longer apply (https://github.com/microsoft/onnxconverter-common/blob/master/onnxconverter_common/float16.py#L141). I'm not sure whether you want that or not; I could also always include the defaults if that's preferred, but then it would be impossible to clear them.
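
For context, this is the behaviour of the linked onnxconverter-common converter; a minimal sketch, with model.onnx as a placeholder path:

```python
# Sketch of the linked onnxconverter-common behaviour; model.onnx is a placeholder.
import onnx
from onnxconverter_common import float16

# op_block_list=None -> the converter falls back to DEFAULT_OP_BLOCK_LIST,
# so ops like Range stay in float32.
fp16_default = float16.convert_float_to_float16(
    onnx.load("model.onnx"), op_block_list=None
)

# An explicit list replaces the defaults entirely, so the default ops would
# become eligible for fp16 conversion again.
fp16_custom = float16.convert_float_to_float16(
    onnx.load("model.onnx"), op_block_list=["Conv", "Resize"]
)
```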

@xenova (Collaborator) commented Nov 20, 2024

> One curious behaviour is that if you do provide an op_block_list, the defaults no longer apply… I could also always include the defaults if that's preferred, but then it would be impossible to clear them.

Good point! We should then default to None instead of an empty array.

@pdufour (Contributor, Author) commented Nov 22, 2024

@xenova Updated PR to use None and added some more comprehensive tests in the description.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

scripts/quantize.py (review thread, outdated and resolved)
Comment on lines +193 to +196
blocked_ops = set(float16.DEFAULT_OP_BLOCK_LIST)
if op_block_list is not None:
blocked_ops.update(op_block_list)

@xenova (Collaborator) commented Nov 25, 2024

One minor limitation of this updated approach is that you can't choose to quantize an op that is in the default block list. Most of those ops are on the list because they don't have fp16 variants, so I don't think this is an issue.

TL;DR: You can only add to the block list, not remove defaults from it.
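
A tiny illustration of that limitation, using the merge shown in the diff above:

```python
# Illustrative only: the merge above can extend the default block list but
# never shrink it.
from onnxconverter_common import float16

blocked_ops = set(float16.DEFAULT_OP_BLOCK_LIST)
blocked_ops.update(["Conv", "Resize"])  # user-supplied additions

# Every default op stays blocked regardless of what the user passes.
assert set(float16.DEFAULT_OP_BLOCK_LIST) <= blocked_ops
```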

@xenova (Collaborator) left a review comment

Thanks!

@xenova xenova merged commit 5272b12 into huggingface:main Nov 25, 2024
4 checks passed
@pdufour pdufour deleted the add-block-list branch November 28, 2024 11:54