Question:
I have an encoder-decoder model that was quantized with TensorRT's post-training quantization packages and saved in the HuggingFace Transformers saved-model format. The model is a TrOCR model, implemented with the HuggingFace VisionEncoderDecoder class. In Transformers, the encoder and decoder live in a single file, but when the model is exported to ONNX format, the encoder and decoder become two different ONNX files.
I am trying to run this model through ModelSpeedupTensorRT, following the tutorial here: https://nni.readthedocs.io/en/stable/tutorials/quantization_speedup.html. When I called engine.compress_with_calibrator(calib) with a calibrator I built from a dataloader, the internal conversion to ONNX format exhausted my CPU RAM for some reason. To work around this, I converted the model myself using the HuggingFace Optimum interface for ONNX Runtime.
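For reference, this is roughly the export call I used; a minimal sketch assuming optimum[onnxruntime] is installed (the checkpoint name and output directory are just examples, and older Optimum versions use from_transformers=True instead of export=True):

from optimum.onnxruntime import ORTModelForVision2Seq

# Export the TrOCR VisionEncoderDecoder model to ONNX via Optimum.
# This writes separate encoder_model.onnx and decoder_model.onnx files
# into the output directory -- exactly the two-file layout described above.
ort_model = ORTModelForVision2Seq.from_pretrained(
    "microsoft/trocr-base-handwritten",  # example checkpoint
    export=True,
)
ort_model.save_pretrained("trocr_onnx")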
While editing the source code to accommodate this, I found the implementation of the build_engine_with_calib() method that compress_with_calibrator() calls:
def build_engine_with_calib(onnx_model_file, calib, input_shape):
    """
    Parameters
    ----------
    """
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(common.explicit_batch())
    trt_config = builder.create_builder_config()
    parser = trt.OnnxParser(network, TRT_LOGGER)

    builder.max_batch_size = input_shape[0]
    trt_config.max_workspace_size = common.GiB(8)
    trt_config.set_flag(trt.BuilderFlag.INT8)
    trt_config.set_flag(trt.BuilderFlag.FP16)
    trt_config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
    trt_config.int8_calibrator = calib

    with open(onnx_model_file, 'rb') as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                TRT_LOGGER.log(TRT_LOGGER.ERROR, parser.get_error(error))
            raise ValueError('Failed to parse the ONNX file.')

    TRT_LOGGER.log(TRT_LOGGER.INFO, f'input number: {network.num_inputs}')
    TRT_LOGGER.log(TRT_LOGGER.INFO, f'output number: {network.num_outputs}')

    profile = builder.create_optimization_profile()
    input_name = network.get_input(0).name
    profile.set_shape(input_name, min=input_shape, opt=input_shape, max=input_shape)
    trt_config.add_optimization_profile(profile)

    config_network_to_int8(network)  # not sure whether it is necessary because trt.BuilderFlag.INT8 is set.
    engine = builder.build_engine(network, trt_config)
    return engine
I noticed here that the ONNX model is read from a single file, not from a directory. Given that, will my VisionEncoderDecoder model, which is saved as two separate files, not work with ModelSpeedupTensorRT? Is there any way to make it work?
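The only workaround I can think of is to call the builder once per exported file, something like the hypothetical sketch below (the file names are what Optimum writes by default, but the calibrators and input shapes are made up, and I'm not sure the decoder would even build, since build_engine_with_calib() only sets an optimization profile for network.get_input(0) and the decoder has several inputs):

# Hypothetical sketch: build one TensorRT engine per exported ONNX file.
encoder_engine = build_engine_with_calib(
    'trocr_onnx/encoder_model.onnx', encoder_calib, (1, 3, 384, 384))
decoder_engine = build_engine_with_calib(
    'trocr_onnx/decoder_model.onnx', decoder_calib, (1, 16))

Would something along these lines work, or does ModelSpeedupTensorRT need changes to support multi-file models?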