[WIP] Use int_scaled_matmul with int8_dynamic_activation_int8_weight(act_mapping_type=MappingType.ASYMMETRIC) #1402
base: main
Changes from 6 commits: 352211b, 082a565, a434dca, 19ce3b8, cd51793, 9bc2f3e, 9cc7ebd
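For context, the configuration named in the title would be applied roughly as below. This is a hedged sketch assuming the torchao `quantize_` API as of this PR; the model and shapes are made up.

```python
# Sketch: applying int8 dynamic-activation / int8-weight quantization with
# asymmetric activation mapping, the path this PR routes through
# int_scaled_matmul. Assumes torchao's quantize_ API at the time of the PR.
import torch
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight
from torchao.quantization.quant_primitives import MappingType

model = torch.nn.Sequential(torch.nn.Linear(128, 256)).eval()
quantize_(
    model,
    int8_dynamic_activation_int8_weight(act_mapping_type=MappingType.ASYMMETRIC),
)
```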
torchao/quantization/quant_primitives.py:

```diff
@@ -888,14 +888,19 @@ def _choose_qparams_affine(
                 "preserve_zero == False is not supported for symmetric quantization"
             )
         if (
-            zero_point_domain is not None
+            zero_point_domain != ZeroPointDomain.NONE.name
+            and zero_point_domain != None
             and zero_point_domain != ZeroPointDomain.INT.name
         ):
             raise ValueError(
-                "zero_point_domain != ZeroPointDomain.INT is not supported for symmetric quantization"
+                "Except a None value for zero_point_domain, Only ZeroPointDomain.NONE and ZeroPointDomain.INT"
+                " are supported for symmetric quantization."
             )
+        if zero_point_domain == ZeroPointDomain.NONE.name:
+            zero_point = None
+        else:
+            zero_point = torch.full_like(scale, int((quant_max + quant_min + 1) / 2))
         scale = torch.clamp(scale, min=eps)
-        zero_point = torch.full_like(scale, int((quant_max + quant_min + 1) / 2))
     else:
         assert mapping_type == MappingType.ASYMMETRIC.name
         scale = (max_val_pos - min_val_neg) / float(quant_max - quant_min)
```
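For reference, the new behavior of the symmetric branch can be read in isolation as below; the function name and inputs are hypothetical, since the real logic lives inside `_choose_qparams_affine`. For int8 (`quant_min=-128`, `quant_max=127`) the integer zero point works out to `int((127 - 128 + 1) / 2) == 0`.

```python
# Standalone sketch (hypothetical helper, not the actual torchao function)
# of the patched symmetric branch: ZeroPointDomain.NONE now yields no zero
# point at all, while ZeroPointDomain.INT yields the mid-range integer.
import torch

def symmetric_qparams(scale, quant_min, quant_max, eps, zero_point_domain):
    if zero_point_domain == "NONE":  # ZeroPointDomain.NONE.name
        zero_point = None
    else:  # ZeroPointDomain.INT.name (or the legacy None argument)
        zero_point = torch.full_like(scale, int((quant_max + quant_min + 1) / 2))
    scale = torch.clamp(scale, min=eps)
    return scale, zero_point

# int8 example: the mid-range zero point is int((127 + (-128) + 1) / 2) == 0.
scale, zp = symmetric_qparams(torch.tensor([0.1]), -128, 127, 1e-5, "INT")
assert zp.item() == 0
```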
Inline review thread on the new `if zero_point_domain == ZeroPointDomain.NONE.name:` branch:

Reviewer: Thanks for the fix; it looks like this path was not tested before. Can you add a test for the new code path? Also, this op is becoming too complicated; we want to split it.

Author: This case is being tested in a UT I added in […]. Please advise if you're referring to splitting […].

Reviewer: Yeah, I meant splitting choose_qparams_affine/quantize_affine/dequantize, not into smaller methods, but into different variations, and reducing the complexity of the most common path (removing these if/else checks). This includes removing the preserve_zero and zero_point_domain args and just having different variations of choose_qparams_affine/quantize_affine/dequantize. This should be done separately, though, since it will be a large change.
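For illustration, the split the reviewer describes might look roughly like the sketch below: one variant per (mapping type, zero-point domain) combination, with no preserve_zero or zero_point_domain arguments. The function names are hypothetical and the math is simplified from the symmetric path in the hunk above; the actual refactor is deferred as a separate, larger change.

```python
# Hypothetical variants replacing if/else dispatch inside one op.
import torch

def choose_qparams_affine_symmetric_none(x, quant_min, quant_max, eps):
    # Symmetric + ZeroPointDomain.NONE: only a scale, no zero point.
    max_val_pos = x.abs().amax()
    scale = max_val_pos / (float(quant_max - quant_min) / 2)
    return torch.clamp(scale, min=eps)

def choose_qparams_affine_symmetric_int(x, quant_min, quant_max, eps):
    # Symmetric + ZeroPointDomain.INT: scale plus a mid-range integer zero point.
    scale = choose_qparams_affine_symmetric_none(x, quant_min, quant_max, eps)
    zero_point = torch.full_like(scale, int((quant_max + quant_min + 1) / 2))
    return scale, zero_point
```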
Separate review comment on the zero_point_domain handling:

Reviewer: I feel we can probably remove support for None, since it's the same as ZeroPointDomain.NONE.name.
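If a bare None were kept only as an alias, one way to realize this suggestion (a sketch, not the committed change) is to normalize it once at the function boundary so every later branch compares against enum names only:

```python
from torchao.quantization.quant_primitives import ZeroPointDomain

# Hypothetical helper: collapse the legacy None argument into
# ZeroPointDomain.NONE.name so downstream checks handle a single spelling.
def normalize_zero_point_domain(zero_point_domain):
    if zero_point_domain is None:
        return ZeroPointDomain.NONE.name
    return zero_point_domain
```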
Author: Thanks again for reviewing! Some other places in the codebase also use both ZeroPointDomain.NONE.name and None as separate cases (see ao/torchao/quantization/quant_primitives.py, Line 543 in 31234db). I unified these two cases as one in the latest commit, but I'm not sure whether the __tensor_flatten__ and __tensor_unflatten__ methods of some classes elsewhere in the codebase also need changes to ensure they can deal with a None zero point when TorchDynamo is used. I'll run the CUDA-only UTs at my end tomorrow morning to verify.

EDIT: Haven't gotten access to an Nvidia GPU until now.
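The flatten/unflatten concern comes from the tensor-subclass protocol: `__tensor_flatten__` must list only tensor attributes that actually exist, and `__tensor_unflatten__` must restore the None. A minimal sketch, assuming a hypothetical subclass (not torchao's actual AffineQuantizedTensor implementation):

```python
import torch

class QuantizedExample(torch.Tensor):
    @staticmethod
    def __new__(cls, int_data, scale, zero_point=None, requires_grad=False):
        return torch.Tensor._make_wrapper_subclass(
            cls, int_data.shape, dtype=scale.dtype, requires_grad=requires_grad
        )

    def __init__(self, int_data, scale, zero_point=None, requires_grad=False):
        self.int_data = int_data
        self.scale = scale
        self.zero_point = zero_point  # may be None (ZeroPointDomain.NONE)

    def __tensor_flatten__(self):
        # Advertise only the tensor attributes that exist; a None zero_point
        # must not appear in the list, or unflattening will try to look it up.
        tensor_attrs = ["int_data", "scale"]
        if self.zero_point is not None:
            tensor_attrs.append("zero_point")
        return tensor_attrs, []

    @classmethod
    def __tensor_unflatten__(cls, tensor_data_dict, extra_metadata, outer_size, outer_stride):
        return cls(
            tensor_data_dict["int_data"],
            tensor_data_dict["scale"],
            tensor_data_dict.get("zero_point", None),  # restore None if absent
        )
```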