TensorRT Optimization #293
Is there any TensorRT optimised inference available?
StyleTTS2 consists of several models and components that work together to generate audio. To optimize it with TensorRT, you first need to export each model separately from PyTorch to ONNX. Once converted, you can either optimize only the most expensive component or convert the whole pipeline.

From my experience with ablation studies, the decoder is the most resource-intensive component in StyleTTS2. If you aim for partial optimization, converting just the decoder from PyTorch to ONNX and running it in TensorRT already gives a significant speedup. Alternatively, converting all the models to ONNX and running them in ONNX Runtime with the TensorRT execution provider also yields noticeable performance gains. I have tested this approach and it is feasible; a rough sketch is below.
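For reference, a minimal sketch of what the per-component export and TensorRT execution-provider setup could look like. The decoder input names, shapes, and the `model.decoder` handle are illustrative assumptions, not the repo's exact API:

```python
import torch
import onnxruntime as ort

# Sketch: export the StyleTTS2 decoder to ONNX, then run it through
# ONNX Runtime with the TensorRT execution provider.
# `model` is assumed to be the loaded StyleTTS2 model bundle; the input
# names and shapes below are placeholders for illustration.
model.decoder.eval()

asr = torch.randn(1, 512, 100)   # aligned text features (assumed shape)
f0  = torch.randn(1, 200)        # predicted F0 curve
n   = torch.randn(1, 200)        # predicted energy
s   = torch.randn(1, 128)        # style vector

torch.onnx.export(
    model.decoder,
    (asr, f0, n, s),
    "decoder.onnx",
    input_names=["asr", "f0", "energy", "style"],
    output_names=["audio"],
    dynamic_axes={
        "asr": {2: "frames"},
        "f0": {1: "frames"},
        "energy": {1: "frames"},
        "audio": {1: "samples"},
    },
    opset_version=17,
)

# TensorRT EP first; ONNX Runtime falls back to CUDA/CPU for any
# node TensorRT does not support.
session = ort.InferenceSession(
    "decoder.onnx",
    providers=[
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)

audio = session.run(
    None,
    {
        "asr": asr.numpy(),
        "f0": f0.numpy(),
        "energy": n.numpy(),
        "style": s.numpy(),
    },
)[0]
```

The same export-then-session pattern can be repeated for the other components if you want the full pipeline on the TensorRT execution provider.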
Hi @UmerrAhsan. Could you please share the overall latencies of your ONNX model?
Hi @nityanandmathur. I ran the decoder model and the predictor's text encoder model in TensorRT, which reduced my latency by over 50%. I also cached the style vectors from the diffusion sampler and the style encoder beforehand. With that, a single short sentence runs in under 100 ms.
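As a side note, the style caching described above can be as simple as computing the reference style once and reusing it for every sentence. A hypothetical sketch, where the `compute_style` helper and the inference call are assumptions based on the repo's demo flow rather than exact code:

```python
import torch

# Hypothetical sketch: cache the style vector so the diffusion sampler and
# style encoder run once per reference clip instead of once per sentence.
_style_cache = {}

def get_style(ref_path, compute_style_fn):
    """Return a cached style vector for a reference clip, computing it on first use."""
    if ref_path not in _style_cache:
        with torch.no_grad():
            _style_cache[ref_path] = compute_style_fn(ref_path)
    return _style_cache[ref_path]

# Usage (names assumed): every later sentence with the same reference skips
# the diffusion / style-encoder pass, so only the TensorRT-optimized decoder
# and predictor text encoder run per sentence.
# s_ref = get_style("reference.wav", compute_style)
# wav   = inference(text, s_ref)
```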