Add support for op_block_list #1036
Conversation
Very useful! Thanks! Just for my testing, is the model you're testing available on the Hugging Face hub?
@xenova I have the exported model available here https://huggingface.co/pdufour/Qwen2-VL-2B-Instruct-ONNX-Q4-F16 but I haven't uploaded the source files. It might be easier to try on a smaller example. I've updated the description of the PR if you want to try that one.
One curious behaviour is that if you do provide an op_block_list, it no longer includes the defaults (https://github.com/microsoft/onnxconverter-common/blob/master/onnxconverter_common/float16.py#L141). I'm not sure whether that's what you want; I could also include the defaults if that's preferred, but then it becomes impossible to clear them.
Good point! We should then default to None.
@xenova Updated PR to use None and added some more comprehensive tests in the description. |
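For context, the upstream behaviour discussed above is roughly the following (a hedged illustration of the onnxconverter-common code linked above, not the exact source; the convert calls are shown commented out since they require a loaded model):

from onnxconverter_common import float16

# The default block list (which includes e.g. the Range op used in the regression
# test below) is only applied when op_block_list is left as None:
# model_fp16 = float16.convert_float_to_float16(model)
#
# An explicit list replaces the defaults entirely instead of extending them:
# model_fp16 = float16.convert_float_to_float16(model, op_block_list=["Conv"])
print(float16.DEFAULT_OP_BLOCK_LIST)  # the defaults that get dropped when a list is passed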
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
blocked_ops = set(float16.DEFAULT_OP_BLOCK_LIST)
if op_block_list is not None:
    blocked_ops.update(op_block_list)
One minor limitation of this updated approach is that you can't choose to quantize a node which is in the default block list. Most of those ops are chosen since there aren't fp16 variants of those ops, so I don't think this is an issue.
TLDR: Can only add to block list.
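To make the consequence concrete, here is a minimal sketch of how the merged list might be passed down (the wrapper function and the exact keyword arguments used in the call are illustrative assumptions, not necessarily the actual code in the quantize script):

from onnxconverter_common import float16

def convert_to_fp16(model, op_block_list=None):
    # Always start from the library defaults, then extend with any user-supplied ops.
    blocked_ops = set(float16.DEFAULT_OP_BLOCK_LIST)
    if op_block_list is not None:
        blocked_ops.update(op_block_list)
    # Because the defaults are always included, an op on DEFAULT_OP_BLOCK_LIST can
    # never be re-enabled for quantization: callers can only add to the block list.
    return float16.convert_float_to_float16(model, op_block_list=list(blocked_ops))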
Thanks!
Background
This PR adds a new argument to the quantize script called "op_block_list". If op_block_list is provided, the listed ops are not quantized. This is useful because some models contain ops that are incompatible with quantization.
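For illustration, a minimal sketch of how such a flag could be exposed with argparse (the parser setup shown here is an assumption for readability, not the script's actual argument handling):

import argparse

parser = argparse.ArgumentParser(description="Quantize ONNX models (sketch)")
parser.add_argument("--mode", default="fp16", help="quantization mode, e.g. fp16 or q4f16")
parser.add_argument(
    "--op_block_list",
    nargs="+",      # e.g. --op_block_list Conv DynamicQuantizeLinear DequantizeLinear Resize
    default=None,   # None -> fall back to the default block list
    help="op types that should never be converted/quantized",
)
args = parser.parse_args()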
Test Plan
Regression Test
Test that the default op_block_list behaviour is unchanged when no op_block_list is passed.

git clone https://huggingface.co/onnx-models/sentence-t5-base-onnx

This model has a /model/model.0/auto_model/encoder/block.0/layer.0/SelfAttention/Range node, so we are checking that it is still excluded because it is part of the default exclude types (https://github.com/microsoft/onnxconverter-common/blob/master/onnxconverter_common/float16.py#L108).

On the main branch of transformers.js:
git checkout . && rm -rf ./*_*.onnx || true && PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder . --output_folder . --mode fp16
stat -f "%z" model_fp16.onnx

On this branch:
git checkout . && rm -rf ./*_*.onnx || true && PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder . --output_folder . --mode fp16
stat -f "%z" model_fp16.onnx
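If a deeper check than comparing file sizes is wanted, one could also confirm the Range node survived conversion with a quick onnx inspection (this snippet is an assumed extra check, not part of the original test plan):

import onnx

# Sanity check: list the Range nodes in the converted model. A blocked op is kept
# in fp32 (the converter inserts Casts around it rather than converting it), so the
# Range node should still be present unchanged.
model = onnx.load("model_fp16.onnx")
print([n.name for n in model.graph.node if n.op_type == "Range"])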
Qwen2-VL Test
Test that op_block_list works.

Without op_block_list (fails):
rm -rf onnx/*_*_*.onnx
PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder ./onnx --output_folder ./onnx-dest --mode q4f16
python3 infer.py Qwen/Qwen2-VL-2B-Instruct ./onnx
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from ./onnx/QwenVL_A_q4f16.onnx failed:Type Error: Type parameter (T) of Optype (Sub) bound to different types (tensor(float) and tensor(float16) in node (/Sub).

With op_block_list (succeeds):
rm -rf onnx/*_*_*.onnx
PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder ./onnx --output_folder ./onnx --mode q4f16 --op_block_list Conv DynamicQuantizeLinear DequantizeLinear Resize
python3 infer.py Qwen/Qwen2-VL-2B-Instruct ./onnx
The image shows a vintage teal-colored...