
Does NNI ModelSpeedupTensorRT support Encoder-Decoder models? #5801

Open
donjuanpond opened this issue Aug 2, 2024 · 1 comment

Comments


donjuanpond commented Aug 2, 2024

Question:
I have an encoder-decoder model, quantized using TensorRT's packages for post-training quantization. It is saved in the HuggingFace Transformers format. The model is a TrOCR model, implemented with the HuggingFace VisionEncoderDecoder class. In Transformers, the encoder and decoder live in a single checkpoint, but when exporting to ONNX the encoder and decoder become two separate .onnx files.

I am trying to run this model through ModelSpeedupTensorRT, following the tutorial here: https://nni.readthedocs.io/en/stable/tutorials/quantization_speedup.html. When I tried to call engine.compress_with_calibrator(calib) with a calibrator built from a dataloader, the internal conversion to ONNX format exhausted my CPU RAM. To work around this, I converted the model myself using the HuggingFace Optimum interface for ONNX Runtime.
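
For reference, this is roughly how I did the export with Optimum (just a minimal sketch; the checkpoint name "microsoft/trocr-base-handwritten", the output directory, and the exact output file names below are illustrative of my setup):

from optimum.onnxruntime import ORTModelForVision2Seq

# export=True converts the PyTorch VisionEncoderDecoder model to ONNX on the fly
model = ORTModelForVision2Seq.from_pretrained(
    "microsoft/trocr-base-handwritten", export=True
)

# Saving writes the encoder and decoder as separate .onnx files
# (e.g. encoder_model.onnx and decoder_model.onnx) in the output directory.
model.save_pretrained("trocr_onnx")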

When editing the source code to accommodate this, I found the implementation of the build_engine_with_calib() method called by compress_with_calibrator():

def build_engine_with_calib(onnx_model_file, calib, input_shape):
    """
    Parameters
    ----------
    onnx_model_file : str
        Path to a single ONNX model file to parse into the TensorRT network.
    calib
        INT8 calibrator used for post-training calibration.
    input_shape : tuple
        Fixed input shape; input_shape[0] is used as the max batch size.
    """
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(common.explicit_batch())
    trt_config = builder.create_builder_config()
    parser = trt.OnnxParser(network, TRT_LOGGER)

    builder.max_batch_size = input_shape[0]
    trt_config.max_workspace_size = common.GiB(8)
    trt_config.set_flag(trt.BuilderFlag.INT8)
    trt_config.set_flag(trt.BuilderFlag.FP16)
    trt_config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
    trt_config.int8_calibrator = calib

    with open(onnx_model_file, 'rb') as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                TRT_LOGGER.log(TRT_LOGGER.ERROR, parser.get_error(error))
            raise ValueError('Failed to parse the ONNX file.')

    TRT_LOGGER.log(TRT_LOGGER.INFO, f'input number: {network.num_inputs}')
    TRT_LOGGER.log(TRT_LOGGER.INFO, f'output number: {network.num_outputs}')

    profile = builder.create_optimization_profile()
    input_name = network.get_input(0).name
    profile.set_shape(input_name, min=input_shape, opt=input_shape, max=input_shape)
    trt_config.add_optimization_profile(profile)

    config_network_to_int8(network) # not sure whether it is necessary because trt.BuilderFlag.INT8 is set.

    engine = builder.build_engine(network, trt_config)
    return engine

I noticed that the ONNX model is read from a single file here, not from a directory. Does this mean my VisionEncoderDecoder model, which is saved as two separate ONNX files, will not work with ModelSpeedupTensorRT? Is there any way to make it work?
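
If it helps clarify what I have in mind, one option I was imagining is building a separate engine for each ONNX file, roughly like this (just a sketch; the file names, shapes, and per-model calibrators are assumptions on my part, and I don't know whether NNI supports running the two engines together afterwards):

# Hypothetical: build one TensorRT engine per exported ONNX file
encoder_engine = build_engine_with_calib(
    'trocr_onnx/encoder_model.onnx', encoder_calib, encoder_input_shape)
decoder_engine = build_engine_with_calib(
    'trocr_onnx/decoder_model.onnx', decoder_calib, decoder_input_shape)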
