
Quantization produces large scale coefficient, which prevents the model from being loaded #62196

Closed
FabianSchuetze opened this issue Oct 23, 2023 · 14 comments
Assignees
Labels
comp:lite, ModelOptimizationToolkit, stat:awaiting tensorflower, TF 2.13, TFLiteConverter, type:bug

Comments

@FabianSchuetze

1. System information

Colab, as of 2023-10-23

2. Code

Please see the attached Colab notebook here
https://colab.research.google.com/drive/1yUD0nDu8oeeDtQBa7xCbQWx_w8PxS4UC?usp=sharing
to reproduce the issue. It loads a pre-trained resnet18 from PyTorch, converts it to ONNX, converts that to TensorFlow, and then exports it to TFLite. (The process is a bit convoluted, but I need a pretrained resnet18 and didn't find one in the TensorFlow orbit, so I used torchvision; I hope that's OK.)
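For reference, here is a rough sketch of that pipeline; the onnx-tf conversion route, the opset, and the 224x224 input shape below are assumptions, and the notebook may differ in details:

```python
import torch
import torchvision
import onnx
import tensorflow as tf
from onnx_tf.backend import prepare  # onnx-tf package (assumed conversion route)

# 1. Pretrained resnet18 from torchvision, exported to ONNX.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
dummy = torch.randn(1, 3, 224, 224)  # assumed input shape
torch.onnx.export(model, dummy, "resnet18.onnx", opset_version=13)

# 2. ONNX -> TensorFlow SavedModel.
prepare(onnx.load("resnet18.onnx")).export_graph("resnet18_tf")

# 3. SavedModel -> fully int8-quantized TFLite.
def representative_dataset():
    for _ in range(100):
        yield [torch.randn(1, 3, 224, 224).numpy()]

converter = tf.lite.TFLiteConverter.from_saved_model("resnet18_tf")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```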

If you download the generated model (model_int8.tflite), open it in netron.app, and click on the first MaxPool2D op, you can see that the quantization scale is 1.3344405750530544e+36. See the attached image.

[Image: Netron view of the first MaxPool2D op showing quantization scale 1.3344405750530544e+36]

This scale parameter itself is of course implausible (impossible), but loading the model also produces an error here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/internal/quantization_util.cc#L117
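To put that number in perspective: with affine int8 quantization, real_value ≈ scale * (q - zero_point), so this scale maps the int8 range onto values around the float32 maximum. In fact, 1.3344405750530544e+36 is float32-max / 255, which suggests the min/max statistics for that tensor were effectively the full float32 range:

```python
import numpy as np

scale, zero_point = 1.3344405750530544e+36, 127
print(scale * (-128 - zero_point))       # ~ -3.4e+38, about -float32 max
print(np.finfo(np.float32).max / 255.0)  # ~ 1.334e+36, i.e. this exact scale
```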

Does anybody know why the quantization parameter is that high, and what can be done to fix it? Furthermore, can I make the quantization fail explicitly when it generates such high values?
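A post-conversion sanity check over the tensor metadata would at least catch it (a sketch; the 1e6 threshold is arbitrary):

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
for detail in interpreter.get_tensor_details():
    scales = detail["quantization_parameters"]["scales"]
    if any(s > 1e6 for s in scales):  # arbitrary "implausible" threshold
        raise ValueError(f"Suspicious scale in {detail['name']}: {scales}")
```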

@FabianSchuetze FabianSchuetze added the TFLiteConverter label Oct 23, 2023
@tilakrayal tilakrayal added the TF 2.13, type:bug, and comp:lite labels Oct 26, 2023
@tilakrayal tilakrayal assigned pjpratik and unassigned tilakrayal Oct 26, 2023
@pjpratik
Contributor

pjpratik commented Nov 1, 2023

I was able to reproduce this issue. Please find this gist. It seems to be an issue while converting pytorch->onnx->tf->tflite.

@pkgoogle Could you please check this?

Thanks.

@pjpratik pjpratik assigned pkgoogle and unassigned pjpratik Nov 1, 2023
@FabianSchuetze
Author

Thanks for verifying, @pjpratik, and thanks for checking, @pkgoogle.

@pkgoogle pkgoogle added the ModelOptimizationToolkit label Nov 2, 2023
@pkgoogle

pkgoogle commented Nov 3, 2023

I was able to reproduce with the same gist as above. @abattery, can you please take a look? Thanks.

@pkgoogle pkgoogle added the stat:awaiting tensorflower label Nov 3, 2023
@FabianSchuetze
Author

If you would accept and review a PR, @abattery, I can also look into this again. It would be nice to identify the source of this bug and to submit a PR for TF.

@pkgoogle

pkgoogle commented Nov 6, 2023

Hi @FabianSchuetze, we will review any PR which may come in. For this issue, I'm not sure where the root cause is yet, but I can give you some guesses and things to look at. At a high level, you should probably start with this README: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/lite/README.md and then review the passes: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/lite/tf_tfl_passes.cc. We use MLIR to convert arbitrary compute graphs. Hope that at least gets you started ...
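On the Python side, the TFLite quantization debugger might also help you see which layers end up with bad statistics before you dive into the MLIR passes; roughly something like this, assuming the converter and representative dataset from your notebook:

```python
import tensorflow as tf

# converter / representative_dataset as set up for the int8 conversion.
debugger = tf.lite.experimental.QuantizationDebugger(
    converter=converter, debug_dataset=representative_dataset)
debugger.run()
with open("layer_stats.csv", "w") as f:
    debugger.layer_statistics_dump(f)  # per-layer quantization error metrics
```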

@FabianSchuetze
Author

Ok, I've started working on this and your comments were very helpful indeed, @pkgoogle - thanks.

@FabianSchuetze
Author

Hi @pkgoogle

Thanks for your help. I've been able to reproduce the quantization problem with the tfl_quantizer binary. The binary loads an annotated model and returns the quantized output (bazel-bin/tensorflow/compiler/mlir/lite/quantization/lite/tfl_quantizer --debug ANNOTATED.TFLITE > QUANTIZED.TFLITE 2>DEBUG). The DEBUG file is extremely large (33 GB) because it regularly contains long hex constants, such as:

%cst_36 = arith.constant dense<"0xB3C0153CF426D73B24C8233CE29033BB4F10F33C42F6A53C....

If I remove these hex numbers, the DEBUG file is manageable. It contains a few suspicious values with seemingly erroneous quantization parameters, such as:

tfl.padv2 : (f32,i32,f32,) -> (!quant.uniform<i8:f32, 1.3344405750530544E+36:127>,)

but I'm not sure whether this is relevant or how to debug it. I also looked into the quantization_driver.cc functions but could not identify anything relevant. Do you have any suggestions for narrowing the search or producing good debug output? (NB: I have compiled tensorflow/compiler/mlir/lite/quantization/lite:tfl_quantizer in debug mode with the LLVM supplied by TensorFlow.)
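(For reference, removing the hex numbers is just a plain text filter, roughly:)

```python
# Drop the huge dense<"0x..."> constant lines from the MLIR debug dump.
with open("DEBUG") as src, open("DEBUG.small", "w") as dst:
    for line in src:
        if 'dense<"0x' not in line:
            dst.write(line)
```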

@pkgoogle

Hi @FabianSchuetze, I would try to use lldb/gdb with that binary (built with debug info) and see if you can break the conversion down step by step (and figure out before/after which pass that weird scaling takes root). It might also be worth reducing your model to as few ops as possible while still showing the issue.

Have you understood the Operation, Region, Block abstractions of MLIR? It's probably worth it to understand that abstraction first: https://www.youtube.com/watch?v=Y4SvqTtOIDk

Then you can start with the operation which represents your model and dump it within lldb/gdb (Op->dump(); see https://mlir.llvm.org/doxygen/classmlir_1_1Operation.html), and re-dump it after every conversion when possible, etc. That should help you drill down a bit.

Probably not useful immediately but you may also want to review some other useful MLIR binaries:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/lite/BUILD
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/BUILD

(search for tf_cc_binary targets in those files)

I'm not an expert in those but they might be able to help in ways I can't see right now. Hope that helps.

@FabianSchuetze
Author

Thanks for your comment, @pkgoogle. I made a bit of progress and think I'm on the right track.

@FabianSchuetze
Author

Hi @pkgoogle . Just a brief update:

The segfault can be prevented by ensuring the scale coefficient is smaller than one. The scale coefficient can be restricted in FakeQuantSupport.cc or, in case the legacy float scale is enabled, in DownCastScale.

I will now try to identify an appropriate location for interrupting the quantization process. During quantization, several checks are called to ensure that the quantization parameters are within an admissible range. These checks might be an opportunity to interrupt the transformation.
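To illustrate the kind of check I mean, here is a Python sketch of the logic only, not the actual C++ in FakeQuantSupport.cc:

```python
import numpy as np

def validate_scale(scale, num_bits=8):
    """Reject scales that can only arise from degenerate min/max statistics."""
    levels = 2 ** num_bits - 1  # 255 quantization steps for int8
    if not np.isfinite(scale) or scale >= np.finfo(np.float32).max / levels:
        raise ValueError(f"Implausible quantization scale: {scale}")

validate_scale(1.3344405750530544e+36)  # raises: exactly float32-max / 255
```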

Thanks also for your support regarding MLIR. It was a pleasure to work with it so far.

@pkgoogle

Hi @FabianSchuetze, thanks for your help! That's really good progress. No problem, feel free to ask any more questions -- I don't know everything, but I will try my best to help. MLIR has a huge activation energy, so to speak, but it's a pleasure once you get used to it, so it's great to see you were able to make progress.

@FabianSchuetze
Author

FabianSchuetze commented Dec 9, 2023

@pkgoogle: Thanks again for your help. I created a PR with a proposed solution. Maybe we can continue the discussion there.

@gaikwadrahul8
Contributor

Hi, @FabianSchuetze

Thanks for raising this issue. Are you aware of the migration to LiteRT? This transition is aimed at enhancing our project's capabilities and providing improved support and focus for our users. As we believe this issue is still relevant to LiteRT, we are moving your issue there. Please follow progress here: google-ai-edge/ai-edge-torch#390

Let us know if you have any questions. Thanks.

@pkgoogle pkgoogle closed this as not planned Nov 27, 2024