
TensorRT Optimization #293

Open
nityanandmathur opened this issue Nov 6, 2024 · 4 comments

Comments

@nityanandmathur

Is there any TensorRT-optimised inference available?

78Alpha commented Nov 10, 2024

No

UmerrAhsan commented Dec 12, 2024

StyleTTS2 consists of several models and components that work together to generate audio. To optimize it using TensorRT, you first need to convert each model separately from PyTorch to ONNX.
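
Roughly, the export looks like this. Note that the input shapes, argument order, and tensor names below are placeholders, not the actual StyleTTS2 decoder signature; match them to the tensors the module receives in a normal forward pass:

```python
import torch

def export_decoder(decoder: torch.nn.Module, path: str = "decoder.onnx") -> None:
    """Hypothetical sketch: export one StyleTTS2 component (here, the decoder) to ONNX.
    All dummy shapes below are assumptions."""
    decoder.eval()
    asr   = torch.randn(1, 512, 100)  # aligned acoustic features (assumed shape)
    f0    = torch.randn(1, 200)       # predicted F0 contour (assumed shape)
    n     = torch.randn(1, 200)       # predicted energy (assumed shape)
    style = torch.randn(1, 128)       # style vector (assumed shape)
    torch.onnx.export(
        decoder,
        (asr, f0, n, style),
        path,
        input_names=["asr", "f0", "n", "style"],
        output_names=["audio"],
        dynamic_axes={"asr": {2: "frames"}, "f0": {1: "frames"}, "n": {1: "frames"}},
        opset_version=17,
    )
```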

Once converted, you can either:

  • Run the ONNX models with ONNX Runtime using the TensorRT execution provider, or
  • Convert the ONNX models directly into TensorRT engines and perform inference through TensorRT's Python or C++ API.

Both options are sketched below.
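
For the first option, here is a minimal sketch of running the exported decoder with ONNX Runtime's TensorRT execution provider (the input names and shapes follow the placeholder export above, not StyleTTS2's real code):

```python
import numpy as np
import onnxruntime as ort

# Run the exported component with the TensorRT execution provider, falling
# back to CUDA/CPU for any ops TensorRT cannot handle.
session = ort.InferenceSession(
    "decoder.onnx",
    providers=[
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)

outputs = session.run(
    None,
    {
        "asr": np.random.randn(1, 512, 100).astype(np.float32),
        "f0": np.random.randn(1, 200).astype(np.float32),
        "n": np.random.randn(1, 200).astype(np.float32),
        "style": np.random.randn(1, 128).astype(np.float32),
    },
)
```

For the second option, TensorRT's `trtexec` tool can build an engine offline, e.g. `trtexec --onnx=decoder.onnx --saveEngine=decoder.plan --fp16`, which you then load and run through TensorRT's Python or C++ API.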

From my experience with ablation studies, the decoder is the most resource-intensive component in StyleTTS2. If you aim for partial optimization, converting just the decoder from PyTorch to ONNX and running it in TensorRT already provides significant speed improvements. Alternatively, converting all models to ONNX and running them in ONNX Runtime with the TensorRT execution provider also yields noticeable gains. I have tested this approach, so it is feasible.
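
For the partial-optimization route, the idea is just to swap the decoder call in the inference loop for the ONNX Runtime session built above; everything upstream (text encoding, duration/F0 prediction, style computation) stays in PyTorch. A hypothetical sketch, with feed names matching the placeholder export rather than StyleTTS2's actual code:

```python
import numpy as np

# Hypothetical drop-in replacement for the PyTorch decoder call in the
# StyleTTS2 inference loop, using the ONNX Runtime session from above.
def run_decoder_trt(session, asr, f0_pred, n_pred, style):
    """All four inputs are torch tensors produced by the PyTorch pipeline."""
    feeds = {
        "asr": asr.cpu().numpy().astype(np.float32),
        "f0": f0_pred.cpu().numpy().astype(np.float32),
        "n": n_pred.cpu().numpy().astype(np.float32),
        "style": style.cpu().numpy().astype(np.float32),
    }
    (audio,) = session.run(None, feeds)  # single "audio" output assumed
    return audio
```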

@nityanandmathur (Author)

Hi @UmerrAhsan. Could you please share the overall latencies of your ONNX models?

UmerrAhsan commented Dec 12, 2024

Hi @nityanandmathur. I ran the decoder model and the predictor/text-encoder model in TensorRT, which decreased my latency by over 50%. I also cached the style vectors from the diffusion sampler and the style encoder ahead of time. With that, a single short sentence runs in under 100 ms.
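
The caching idea is straightforward: the diffusion sampler and style encoder depend only on the reference voice, not the input text, so you can compute those vectors once and reuse them for every sentence. A rough sketch (`compute_style_fn` is a stand-in for however you derive the style vector, not an actual StyleTTS2 function):

```python
import torch

# Hypothetical sketch: cache the expensive diffusion + style-encoder step
# per reference voice and reuse the result across sentences.
_style_cache: dict = {}

def get_style(voice_id, compute_style_fn):
    """compute_style_fn is a stand-in for the diffusion/style-encoder step."""
    if voice_id not in _style_cache:
        with torch.no_grad():
            _style_cache[voice_id] = compute_style_fn(voice_id)
    return _style_cache[voice_id]
```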
