Quantization produces large scale coefficient, which prevents the model from being loaded #62196
Comments
I was able to reproduce w/ the same gist as above. @abattery, can you please take a look? Thanks.
If you would accept and review a PR, @abattery, I can look into this again. It would be nice to identify the source of this bug and also to submit a PR for TF.
Hi @FabianSchuetze, we will review any PR which may come in. For this issue, I'm not sure where the root issue is yet, but I can give you some guesses or things to look at. At a high level you should probably start with this README: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/lite/README.md and then probably review the passes: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/lite/tf_tfl_passes.cc. We use MLIR to convert arbitrary compute graphs. Hope that at least gets you started ...
Ok, I've started working on this and your comments were very helpful indeed, @pkgoogle - thanks.
Hi @pkgoogle, thanks for your help. I've been able to reproduce the quantization problem with the
If I remove these hex numbers, the DEBUG file is manageable. It contains a few suspicious values with seemingly erroneous quantization parameters, such as:
but I'm not sure whether this is relevant or how to debug it. Otherwise, I also jumped into the
Hi @FabianSchuetze, I would try to use lldb/gdb with that binary (built with debug info) and see if you can break down the conversion step by step (and figure out before/after which pass the weird scaling takes root). It might also be worth reducing your model to as few ops as possible while still showing the issue. Have you understood the Operation, Region, and Block abstractions of MLIR? It's probably worth understanding that abstraction first: https://www.youtube.com/watch?v=Y4SvqTtOIDk Then you can start with the operation which represents your model, dump it (Op->dump(); see https://mlir.llvm.org/doxygen/classmlir_1_1Operation.html) within lldb/gdb, and re-dump it after every conversion when possible. That should help you drill down a bit. Probably not useful immediately, but you may also want to review some other useful MLIR binaries: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/lite/BUILD (search for tf_cc_binary's in those files). I'm not an expert in those, but they might be able to help in ways I can't see right now. Hope that helps.
Thanks for your comment, @pkgoogle. I've made a bit of progress and think I'm on the right track.
Hi @pkgoogle. Just a brief update: the segfault can be prevented by ensuring the scale coefficient is smaller than one. The scale coefficient can be restricted in FakeQuantSupport.cc or, in case the legacy float scale is enabled, in DownCastScale. I will now try to identify an appropriate location for interrupting the quantization process. During quantization, several checks are invoked to ensure that the quantization parameters are within an admissible range; these checks might be an opportunity to interrupt the transformation. Thanks also for your support regarding MLIR. It has been a pleasure to work with so far.
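(For readers following along: below is a rough, self-contained Python illustration of how a fake-quant (min, max) range turns into an int8 scale/zero-point, and of the kind of clamp discussed above. It is a simplified stand-in, not the actual FakeQuantSupport.cc code, and the max_scale guard is a made-up threshold. As a sanity check, float32_max / 255 comes out to roughly 1.3344e+36, essentially the scale reported in the issue body, which hints that the offending range spans the full float32 range.)

```python
# Illustrative sketch only -- a simplified stand-in for the scale/zero-point
# computation that FakeQuantSupport-style code performs, not the actual
# TensorFlow implementation. max_scale is a made-up guard threshold.
def fake_quant_params(rmin, rmax, qmin=-128, qmax=127, max_scale=None):
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)   # zero must be representable
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale)) if scale else 0
    zero_point = max(qmin, min(qmax, zero_point))
    if max_scale is not None and scale > max_scale:
        raise ValueError(f"implausible quantization scale: {scale}")
    return scale, zero_point

print(fake_quant_params(-1.0, 1.0))       # a sane range -> scale ~ 0.0078
# A range covering (almost) all of float32 yields a scale on the order of
# 1e+36 -- float32_max / 255 is essentially the value seen in the model.
print(fake_quant_params(0.0, 3.4028235e38))
try:
    fake_quant_params(0.0, 3.4028235e38, max_scale=1.0)   # the clamp idea
except ValueError as err:
    print(err)
```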
Hi @FabianSchuetze, thanks for your help! That's really good progress. No problem, feel free to ask any more questions -- I don't know everything, but I will try my best to help. MLIR has a huge activation energy, so to speak, but it's a pleasure once you get used to it, so it's great to see you were able to make progress.
Hi @FabianSchuetze, thanks for raising this issue. Are you aware of the migration to LiteRT? This transition is aimed at enhancing our project's capabilities and providing improved support and focus for our users. As we believe this issue is still relevant to LiteRT, we are moving your issue there. Please follow progress here: google-ai-edge/ai-edge-torch#390 Let us know if you have any questions. Thanks.
1. System information
Colab, as of 2023-10-23
2. Code
Please see the attached colab notebook here
https://colab.research.google.com/drive/1yUD0nDu8oeeDtQBa7xCbQWx_w8PxS4UC?usp=sharing
to reproduce the issue. It loads a pretrained ResNet-18 from PyTorch, converts it to ONNX, converts it to TensorFlow, and then exports it to TFLite. (The process is a bit convoluted, but I need a pretrained ResNet-18 and didn't find one in the TensorFlow orbit, so I used torchvision; hope that's ok.)
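For orientation, a rough sketch of the pipeline the notebook goes through is below. It is not a copy of the notebook: the opset version, input shape, representative dataset, and converter flags are assumptions on my part, and the linked Colab remains the authoritative reproduction.

```python
import numpy as np
import onnx
import tensorflow as tf
import torch
import torchvision
from onnx_tf.backend import prepare

# 1. Pretrained ResNet-18 from torchvision, exported to ONNX.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet18.onnx", opset_version=13)

# 2. ONNX -> TensorFlow SavedModel via onnx-tf.
tf_rep = prepare(onnx.load("resnet18.onnx"))
tf_rep.export_graph("resnet18_tf")

# 3. SavedModel -> fully int8-quantized TFLite flatbuffer.
def representative_dataset():
    for _ in range(100):
        # Random calibration data just for the sketch; the notebook may use
        # real (normalized) images instead.
        yield [np.random.rand(1, 3, 224, 224).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("resnet18_tf")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```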
If you download the generated model (model_int8.tflite), open it in netron.app, and click on the first MaxPool2D op, you can see that the quantization scale is 1.3344405750530544e+36 (see the attached image). This scale parameter itself is of course implausible (impossible), but loading the model also produces an error here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/internal/quantization_util.cc#L117
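For context on why loading fails: at interpreter-preparation time, TFLite folds the float scales of an op into a real-valued rescale multiplier and decomposes it into a 32-bit fixed-point multiplier plus a power-of-two shift (that is roughly what quantization_util.cc does; the exact bound checked at the linked line may differ). A simplified Python re-enactment shows why a scale on the order of 1e+36 cannot be represented:

```python
import math

def quantize_multiplier(real_multiplier):
    """Simplified sketch of TFLite's QuantizeMultiplier: represent
    real_multiplier as q_fixed * 2**(shift - 31) with q_fixed an int32."""
    if real_multiplier == 0.0:
        return 0, 0
    q, shift = math.frexp(real_multiplier)  # real_multiplier = q * 2**shift, 0.5 <= q < 1
    q_fixed = round(q * (1 << 31))
    if q_fixed == (1 << 31):                # rounding can push q_fixed up to 2**31
        q_fixed //= 2
        shift += 1
    return q_fixed, shift

print(quantize_multiplier(0.0078))                  # small negative shift: fine
print(quantize_multiplier(1.3344405750530544e+36))  # shift of about +121
# A shift that large is far outside what the runtime's shift-based rescaling
# can absorb, which is presumably what trips the check at the linked line.
```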
Does anybody know why the quantization parameter is that high, and what can be done to fix it? Furthermore, can I make the quantization fail explicitly when it generates such high values?
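On the last question: until the converter itself rejects such values (e.g. via the FakeQuantSupport/DownCastScale changes discussed in the comments above), a stop-gap is to fail fast right after conversion by scanning every tensor's quantization parameters. A minimal sketch, assuming tf.lite.Interpreter can enumerate tensor details on this model without running AllocateTensors(), and with an arbitrary 1e6 threshold:

```python
import tensorflow as tf

def assert_sane_scales(tflite_path, max_scale=1e6):
    """Raise if any tensor in the converted model carries an implausibly
    large quantization scale (the threshold is an arbitrary choice)."""
    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    bad = []
    for detail in interpreter.get_tensor_details():
        for scale in detail["quantization_parameters"]["scales"]:
            if abs(scale) > max_scale:
                bad.append((detail["name"], float(scale)))
    if bad:
        raise ValueError(f"implausible quantization scales: {bad}")

assert_sane_scales("model_int8.tflite")
```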