Add support for op_block_list #1036

Merged: 5 commits merged into huggingface:main on Nov 25, 2024

Conversation

@pdufour (Contributor) commented Nov 18, 2024

Background
Adds a new argument to the quantize script called op_block_list. If op_block_list is provided, the listed op types are not quantized. This is useful because some ops are incompatible with quantization.
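
A minimal sketch of how such a flag could be parsed (hypothetical wiring, not the PR's exact code; only the flag name op_block_list and --mode come from this PR):

```python
# Hypothetical argparse wiring for the new flag; not the PR's exact code.
import argparse

parser = argparse.ArgumentParser(description="Quantize ONNX models")
parser.add_argument("--mode", default="fp16")
# nargs="+" accepts a space-separated list of op types, e.g.
#   --op_block_list Conv DynamicQuantizeLinear DequantizeLinear Resize
# default=None means "fall back to the converter's default block list".
parser.add_argument("--op_block_list", nargs="+", default=None)

args = parser.parse_args()
print(args.mode, args.op_block_list)
```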

Test Plan

Regression Test

  • This test just verifies that there are no behavioural regressions when op_block_list is not provided
  • git clone https://huggingface.co/onnx-models/sentence-t5-base-onnx
  • This model has a /model/model.0/auto_model/encoder/block.0/layer.0/SelfAttention/Range node, so we check that it is still excluded from fp16 conversion because Range is part of the default block list (https://github.com/microsoft/onnxconverter-common/blob/master/onnxconverter_common/float16.py#L108); see the inspection sketch after this list
  • Test with main branch of transformers.js
  • git checkout . && rm -rf ./*_*.onnx || true && PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder . --output_folder . --mode fp16
  • Run stat: stat -f "%z" model_fp16.onnx
220762121
  • Now test with this PR branch
  • git checkout . && rm -rf ./*_*.onnx || true && PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder . --output_folder . --mode fp16
  • Run stat: stat -f "%z" model_fp16.onnx
220762121
  • The file size is the same, so the test passes
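
One way to confirm the default block list is still in effect (beyond comparing file sizes) is to inspect the exported model directly. A minimal sketch, assuming the onnx and onnxconverter-common packages and the model_fp16.onnx produced above:

```python
# Inspection sketch (not part of the PR's test plan): list the op types in the
# exported fp16 model that the default block list should have kept in float32.
import onnx
from onnxconverter_common import float16

model = onnx.load("model_fp16.onnx")
ops_in_graph = {node.op_type for node in model.graph.node}

# Expected to include "Range", per the node mentioned above.
print(sorted(ops_in_graph & set(float16.DEFAULT_OP_BLOCK_LIST)))
```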

Qwen2-VL Test

  • Here we verify that op_block_list actually takes effect
  • Check out this PR branch of transformers.js
  • Clone this repo https://huggingface.co/pdufour/Qwen2-VL-2B-Instruct-ONNX-Q4-F16
  • Delete the already-quantized models so they are not quantized twice
  • rm -rf onnx/*_*_*.onnx
  • Quantize the A model without the op_block_list
  • PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder ./onnx --output_folder ./onnx-dest --mode q4f16
  • Run the infer script: python3 infer.py Qwen/Qwen2-VL-2B-Instruct ./onnx (the failing load step is sketched after this list)
    • Expected result: No error.
    • Actual Result: onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from ./onnx/QwenVL_A_q4f16.onnx failed:Type Error: Type parameter (T) of Optype (Sub) bound to different types (tensor(float) and tensor(float16) in node (/Sub).
  • Now quantize with the op_block_list
  • rm -rf onnx/*_*_*.onnx
  • PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder ./onnx --output_folder ./onnx --mode q4f16 --op_block_list Conv DynamicQuantizeLinear DequantizeLinear Resize
  • Run the infer script again
    • python3 infer.py Qwen/Qwen2-VL-2B-Instruct ./onnx
    • Output: "The image shows a vintage teal-colored..."
    • This is the correct result, so the test passes
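
For reference, the failure happens at session-creation time; a minimal sketch of just that load step, assuming onnxruntime is installed (infer.py itself is not reproduced here):

```python
# Sketch of the load step only (not infer.py): before re-quantizing with
# --op_block_list this raises the Type Error quoted above; afterwards the
# session is created successfully.
import onnxruntime as ort

session = ort.InferenceSession("./onnx/QwenVL_A_q4f16.onnx")
print([inp.name for inp in session.get_inputs()])
```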

@xenova (Collaborator) commented Nov 18, 2024

Very useful! Thanks! Just for my testing, is the model you're testing available on the Hugging Face hub?

@pdufour (Contributor, Author) commented Nov 19, 2024

@xenova I have the exported model available here https://huggingface.co/pdufour/Qwen2-VL-2B-Instruct-ONNX-Q4-F16 but I haven't uploaded the source files. It might be easier to try on a smaller example. I've updated the description of the PR if you want to try that one.

@pdufour (Contributor, Author) commented Nov 19, 2024

One curious behaviour is that if you do provide an op_block_list, the defaults no longer apply (https://github.com/microsoft/onnxconverter-common/blob/master/onnxconverter_common/float16.py#L141). I'm not sure whether you want that or not; I could also always include the defaults if that's preferred, but then it would be impossible to clear them.
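
For context, this is the behaviour of the linked onnxconverter-common converter; a minimal sketch, with model.onnx as a placeholder path:

```python
# Sketch of the linked onnxconverter-common behaviour; model.onnx is a placeholder.
import onnx
from onnxconverter_common import float16

# op_block_list=None -> the converter falls back to DEFAULT_OP_BLOCK_LIST,
# so ops like Range stay in float32.
fp16_default = float16.convert_float_to_float16(
    onnx.load("model.onnx"), op_block_list=None
)

# An explicit list replaces the defaults entirely, so the default ops would
# become eligible for fp16 conversion again.
fp16_custom = float16.convert_float_to_float16(
    onnx.load("model.onnx"), op_block_list=["Conv", "Resize"]
)
```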

@xenova (Collaborator) commented Nov 20, 2024

> One curious behaviour is that if you do provide an op_block_list, the defaults no longer apply… I could also always include the defaults if that's preferred, but then it would be impossible to clear them.

Good point! We should then default to None instead of an empty array.

@pdufour (Contributor, Author) commented Nov 22, 2024

@xenova Updated PR to use None and added some more comprehensive tests in the description.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

scripts/quantize.py (review thread, outdated and resolved)
Comment on lines +193 to +196
blocked_ops = set(float16.DEFAULT_OP_BLOCK_LIST)
if op_block_list is not None:
blocked_ops.update(op_block_list)

@xenova (Collaborator) commented Nov 25, 2024

One minor limitation of this updated approach is that you can't choose to quantize an op that is in the default block list. Most of those ops are on the list because they don't have fp16 variants, so I don't think this is an issue.

TL;DR: You can only add to the block list, not remove defaults from it.
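
A tiny illustration of that limitation, using the merge shown in the diff above:

```python
# Illustrative only: the merge above can extend the default block list but
# never shrink it.
from onnxconverter_common import float16

blocked_ops = set(float16.DEFAULT_OP_BLOCK_LIST)
blocked_ops.update(["Conv", "Resize"])  # user-supplied additions

# Every default op stays blocked regardless of what the user passes.
assert set(float16.DEFAULT_OP_BLOCK_LIST) <= blocked_ops
```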

@xenova (Collaborator) left a review comment

Thanks!

@xenova xenova merged commit 5272b12 into huggingface:main Nov 25, 2024
4 checks passed
@pdufour pdufour deleted the add-block-list branch November 28, 2024 11:54