Triton Inference Server is officially supported on JetPack starting from JetPack 4.6. On Jetson, it supports trained AI models from multiple frameworks, including NVIDIA TensorRT, TensorFlow, and ONNX Runtime.
Although the HTTP/REST and GRPC inference protocols are supported on JetPack, direct C API integration is recommended for edge use cases.
Triton Inference Server support on JetPack includes:
- Running models on GPU and NVDLA
- Support for multiple frameworks: TensorRT, TensorFlow and ONNX Runtime
- Concurrent model execution
- Dynamic batching
- Model pipelines
- Extensible backends
- HTTP/REST and GRPC inference protocols
- C API
You can download the .tar files for Jetson from the Triton Inference Server release page, in the "Jetson JetPack Support" section. Each .tar file contains the Triton executables and shared libraries, as well as the C++ and Python client libraries and examples.
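As a rough sketch of unpacking a downloaded archive, the helper below extracts it into a target directory and lists the top-level entries so you can inspect the layout. The function name `extract_triton` is hypothetical, and the archive path is whatever file you actually downloaded; nothing here assumes a particular file name or directory layout inside the archive.

```shell
# Hypothetical helper: unpack a downloaded Triton archive for Jetson.
# $1 = path to the downloaded archive, $2 = directory to unpack into
extract_triton() {
  if [ ! -f "$1" ]; then
    echo "archive not found: $1" >&2
    return 1
  fi
  # Create the destination if needed, then extract into it
  mkdir -p "$2" || return 1
  tar -xf "$1" -C "$2" || return 1
  # List the top-level entries so the extracted layout can be inspected
  ls "$2"
}
```

For example, calling `extract_triton` with your archive and `/opt/tritonserver` would match the install directory used in the perf_analyzer command in this document; writing to that location may require elevated permissions.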
Note that perf_analyzer is supported on Jetson, while model_analyzer is currently not available for Jetson. To run perf_analyzer with the C API, include the option --service-kind=triton_c_api:
perf_analyzer -m graphdef_int32_int32_int32 --service-kind=triton_c_api --triton-server-directory=/opt/tritonserver --model-repository=/workspace/qa/L0_perf_analyzer_capi/models
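The command above hard-codes three values: the model name, the Triton install directory, and the model repository. A minimal sketch of a wrapper that takes these as arguments and checks them before launching might look like the following. The function name `run_capi_perf` and the pre-flight checks are illustrative assumptions; only the perf_analyzer options already shown above are used.

```shell
# Hypothetical wrapper around the C API perf_analyzer invocation shown above.
# $1 = model name, $2 = Triton install directory, $3 = model repository path
run_capi_perf() {
  # Fail early if either directory is missing
  for dir in "$2" "$3"; do
    [ -d "$dir" ] || { echo "missing directory: $dir" >&2; return 1; }
  done
  # Fail early if perf_analyzer is not on PATH
  command -v perf_analyzer >/dev/null 2>&1 || {
    echo "perf_analyzer not found on PATH" >&2
    return 1
  }
  # Same options as the documented command, with the three values substituted
  perf_analyzer -m "$1" --service-kind=triton_c_api \
    --triton-server-directory="$2" --model-repository="$3"
}
```

Calling `run_capi_perf graphdef_int32_int32_int32 /opt/tritonserver /workspace/qa/L0_perf_analyzer_capi/models` would then be equivalent to the documented command, provided both directories exist.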