Quantization produces large scale coefficient, which prevents the model from being loaded #62196
Comments
I was able to reproduce w/ the same gist as above. @abattery, can you please take a look? Thanks.
If you would accept and review a PR, @abattery, I can look into this again. It would be nice to identify the source of this bug and also to submit a PR for TF.
Hi @FabianSchuetze, we will review any PR which may come in. For this issue, I'm not sure where the root issue is yet, but I can give you some guesses or things to look at. At a high level you should probably start with this README: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/lite/README.md and then probably review the passes: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/lite/tf_tfl_passes.cc. We use MLIR to convert arbitrary compute graphs. Hope that at least gets you started ...
Ok, I've started working on this and your comments were very helpful indeed, @pkgoogle - thanks.
Hi @pkgoogle, thanks for your help. I've been able to reproduce the quantization problem with the
If I remove these hex numbers, the DEBUG file is manageable. It contains a few suspicious values with seemingly erroneous quantization parameters, such as:
but I'm not sure whether this is relevant or how to debug it. Otherwise, I also jumped into the
Hi @FabianSchuetze, I would try to use lldb/gdb with that binary (built with debug info) and see if you can break down the conversion step by step (and figure out before/after which pass the weird scaling takes root). It might also be worth reducing your model to as few ops as possible while still showing the issue. Have you understood the Operation, Region, and Block abstractions of MLIR? It's probably worth understanding that abstraction first: https://www.youtube.com/watch?v=Y4SvqTtOIDk Then you can start with the operation which represents your model, dump it (Op->dump(); see https://mlir.llvm.org/doxygen/classmlir_1_1Operation.html) within lldb/gdb, and re-dump it after every conversion when possible. That should help you drill down a bit. Probably not useful immediately, but you may also want to review some other useful MLIR binaries: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/lite/BUILD (search for tf_cc_binary's in those files). I'm not an expert in those, but they might be able to help in ways I can't see right now. Hope that helps.
Thanks for your comment, @pkgoogle. I've made a bit of progress and think I'm on the right track.
Hi @pkgoogle. Just a brief update: the segfault can be prevented by ensuring the scale coefficient is smaller than one. The scale coefficient can be restricted in FakeQuantSupport.cc or, in case the legacy float scale is enabled, in DownCastScale. I will now try to identify an appropriate location for interrupting the quantization process. During quantization, several checks are invoked to ensure that the quantization parameters are within an admissible range; these checks might be an opportunity to interrupt the transformation. Thanks also for your support regarding MLIR. It has been a pleasure to work with so far.
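(For readers following along: below is a rough, self-contained Python illustration of how a fake-quant (min, max) range turns into an int8 scale/zero-point, and of the kind of clamp discussed above. It is a simplified stand-in, not the actual FakeQuantSupport.cc code, and the max_scale guard is a made-up threshold. As a sanity check, float32_max / 255 comes out to roughly 1.3344e+36, essentially the scale reported in the issue body, which hints that the offending range spans the full float32 range.)

```python
# Illustrative sketch only -- a simplified stand-in for the scale/zero-point
# computation that FakeQuantSupport-style code performs, not the actual
# TensorFlow implementation. max_scale is a made-up guard threshold.
def fake_quant_params(rmin, rmax, qmin=-128, qmax=127, max_scale=None):
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)   # zero must be representable
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale)) if scale else 0
    zero_point = max(qmin, min(qmax, zero_point))
    if max_scale is not None and scale > max_scale:
        raise ValueError(f"implausible quantization scale: {scale}")
    return scale, zero_point

print(fake_quant_params(-1.0, 1.0))       # a sane range -> scale ~ 0.0078
# A range covering (almost) all of float32 yields a scale on the order of
# 1e+36 -- float32_max / 255 is essentially the value seen in the model.
print(fake_quant_params(0.0, 3.4028235e38))
try:
    fake_quant_params(0.0, 3.4028235e38, max_scale=1.0)   # the clamp idea
except ValueError as err:
    print(err)
```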
Hi @FabianSchuetze, thanks for your help! That's really good progress. No problem, feel free to ask any more questions -- I don't know everything, but I will try my best to help. MLIR has a huge activation energy, so to speak, but it's a pleasure once you get used to it, so it's great to see you were able to make progress.
Hi @FabianSchuetze, thanks for raising this issue. Are you aware of the migration to LiteRT? This transition is aimed at enhancing our project's capabilities and providing improved support and focus for our users. As we believe this issue is still relevant to LiteRT, we are moving your issue there. Please follow progress here: google-ai-edge/ai-edge-torch#390 Let us know if you have any questions. Thanks.
1. System information
Colab, as of 2023-10-23
2. Code
Please see the attached colab notebook here
https://colab.research.google.com/drive/1yUD0nDu8oeeDtQBa7xCbQWx_w8PxS4UC?usp=sharing
to reproduce the issue. It loads a pretrained ResNet-18 from PyTorch, converts it to ONNX, converts it to TensorFlow, and then exports it to TFLite. (The process is a bit convoluted, but I need a pretrained ResNet-18 and didn't find one in the TensorFlow orbit, so I used torchvision; hope that's ok.)
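For orientation, a rough sketch of the pipeline the notebook goes through is below. It is not a copy of the notebook: the opset version, input shape, representative dataset, and converter flags are assumptions on my part, and the linked Colab remains the authoritative reproduction.

```python
import numpy as np
import onnx
import tensorflow as tf
import torch
import torchvision
from onnx_tf.backend import prepare

# 1. Pretrained ResNet-18 from torchvision, exported to ONNX.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet18.onnx", opset_version=13)

# 2. ONNX -> TensorFlow SavedModel via onnx-tf.
tf_rep = prepare(onnx.load("resnet18.onnx"))
tf_rep.export_graph("resnet18_tf")

# 3. SavedModel -> fully int8-quantized TFLite flatbuffer.
def representative_dataset():
    for _ in range(100):
        # Random calibration data just for the sketch; the notebook may use
        # real (normalized) images instead.
        yield [np.random.rand(1, 3, 224, 224).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("resnet18_tf")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```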
If you download the generated model (model_int8.tflite), open it in netron.app, and click on the first MaxPool2D op, you can see that the quantization scale is 1.3344405750530544e+36 (see the attached image). This scale parameter itself is of course implausible (impossible), but loading the model also produces an error here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/internal/quantization_util.cc#L117
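For context on why loading fails: at interpreter-preparation time, TFLite folds the float scales of an op into a real-valued rescale multiplier and decomposes it into a 32-bit fixed-point multiplier plus a power-of-two shift (that is roughly what quantization_util.cc does; the exact bound checked at the linked line may differ). A simplified Python re-enactment shows why a scale on the order of 1e+36 cannot be represented:

```python
import math

def quantize_multiplier(real_multiplier):
    """Simplified sketch of TFLite's QuantizeMultiplier: represent
    real_multiplier as q_fixed * 2**(shift - 31) with q_fixed an int32."""
    if real_multiplier == 0.0:
        return 0, 0
    q, shift = math.frexp(real_multiplier)  # real_multiplier = q * 2**shift, 0.5 <= q < 1
    q_fixed = round(q * (1 << 31))
    if q_fixed == (1 << 31):                # rounding can push q_fixed up to 2**31
        q_fixed //= 2
        shift += 1
    return q_fixed, shift

print(quantize_multiplier(0.0078))                  # small negative shift: fine
print(quantize_multiplier(1.3344405750530544e+36))  # shift of about +121
# A shift that large is far outside what the runtime's shift-based rescaling
# can absorb, which is presumably what trips the check at the linked line.
```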
Does anybody know why the quantization parameter is that high, and what can be done to fix it? Furthermore, can I make the quantization fail explicitly when it generates such high values?
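On the last question: until the converter itself rejects such values (e.g. via the FakeQuantSupport/DownCastScale changes discussed in the comments above), a stop-gap is to fail fast right after conversion by scanning every tensor's quantization parameters. A minimal sketch, assuming tf.lite.Interpreter can enumerate tensor details on this model without running AllocateTensors(), and with an arbitrary 1e6 threshold:

```python
import tensorflow as tf

def assert_sane_scales(tflite_path, max_scale=1e6):
    """Raise if any tensor in the converted model carries an implausibly
    large quantization scale (the threshold is an arbitrary choice)."""
    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    bad = []
    for detail in interpreter.get_tensor_details():
        for scale in detail["quantization_parameters"]["scales"]:
            if abs(scale) > max_scale:
                bad.append((detail["name"], float(scale)))
    if bad:
        raise ValueError(f"implausible quantization scales: {bad}")

assert_sane_scales("model_int8.tflite")
```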