This section describes how to build the Triton server from source. For information on building the Triton client libraries and examples see Client Libraries and Examples. For information on building the Triton SDK container see Build SDK Image. For information on testing your Triton build see Testing Triton.
You can create a customized Triton Docker image that contains a subset of the released backends without building from source. For example, you may want a Triton image that contains only the TensorRT and Python backends. For this type of customization you don't need to build Triton from source and instead can use the compose utility.
The Triton source is distributed across multiple GitHub repositories that together can be built and installed to create a complete Triton installation. Triton server is built using CMake and (optionally) Docker. To simplify the build process, Triton provides a build.py script. The build.py script will generate the CMake and Docker build steps required to build Triton, and will optionally invoke those steps or leave the invocation to you, as described below.
The build.py script currently supports building Triton for the following platforms. See Building on Unsupported Platforms if you are attempting to build Triton on a platform that is not listed here.
If you are developing or debugging Triton, see Development and Incremental Builds for information on how to perform incremental build.
For Ubuntu-22.04, build.py supports both a Docker build and a non-Docker build.
-
Build using Docker and the TensorFlow and PyTorch Docker images from NVIDIA GPU Cloud (NGC).
The easiest way to build Triton is to use Docker. The result of the build will be a Docker image called tritonserver that will contain the tritonserver executable in /opt/tritonserver/bin and the required shared libraries in /opt/tritonserver/lib. The backends and repository-agents built for Triton will be in /opt/tritonserver/backends and /opt/tritonserver/repoagents, respectively.
The first step for the build is to clone the triton-inference-server/server repo branch for the release you are interested in building (or the main branch to build from the development branch). Then run build.py as described below. The build.py script performs these steps when building with Docker.
-
In the build subdirectory of the server repo, generate the docker_build script, the cmake_build script and the Dockerfiles needed to build Triton. If you use the --dryrun flag, build.py will stop here so that you can examine these files.
-
Run the docker_build script to perform the Docker-based build. The docker_build script performs the following steps.
-
Build the tritonserver_buildbase Docker image that collects all the build dependencies needed to build Triton. The tritonserver_buildbase image is based on a minimal/base image. When building with GPU support (--enable-gpu), the min image is the <xx.yy>-py3-min image pulled from NGC that contains the CUDA, cuDNN, TensorRT and other dependencies that are required to build Triton. When building without GPU support, the min image is the standard ubuntu:22.04 image.
-
Run the cmake_build script within the tritonserver_buildbase image to actually build Triton. The cmake_build script performs the following steps.
-
Invoke CMake in the server repo to build Triton's core shared library and tritonserver executable.
-
Clone each requested backend and build it using CMake. For example, the ONNX Runtime backend is built using triton-inference-server/onnxruntime_backend/CMakeLists.txt. Some of the backends may use Docker as part of their build (for example ONNX Runtime and OpenVINO). If you don't want to use Docker in those cases you must consult the build process for those backends.
-
Clone each repository agent and build it using the CMake file from the corresponding repo. For example, the Checksum repository agent is built using triton-inference-server/checksum_repository_agent/CMakeLists.txt.
-
-
Copy the built artifacts out of the container and into the build subdirectory on the host system.
-
Create the final tritonserver Docker image that contains the libraries, executables and other artifacts from the build.
-
Create a tritonserver_cibase Docker image that contains the QA artifacts needed for testing, as described in Testing Triton.
-
By default, build.py does not enable any of Triton's optional features but you can enable all features, backends, and repository agents with the --enable-all flag. The -v flag turns on verbose output.
$ ./build.py -v --enable-all
If you want to enable only certain Triton features, backends and repository agents, do not specify --enable-all. Instead you must specify the individual flags as documented by --help.
As described above, the build is performed in the server repo, but source from several other repos is fetched during the build process. Typically you do not need to specify anything about these other repos, but if you want to control which branch is used in these other repos you can as shown in the following example.
$ ./build.py ... --repo-tag=common:<container tag> --repo-tag=core:<container tag> --repo-tag=backend:<container tag> --repo-tag=thirdparty:<container tag> ... --backend=tensorrt:<container tag> ... --repoagent=checksum:<container tag> ...
If you are building on a release branch then <container tag>
will
default to the branch name. For example, if you are building on the
r24.05 branch, <container tag>
will default to r24.05. If you are
building on any other branch (including the main branch) then
<container tag>
will default to "main". Therefore, you typically do
not need to provide <container tag>
at all (nor the preceding
colon). You can use a different <container tag>
for a component to
instead use the corresponding branch/tag in the build. For example, if
you have a branch called "mybranch" in the
onnxruntime_backend
repo that you want to use in the build, you would specify
--backend=onnxruntime:mybranch.
If you want to build without GPU support you must specify individual
feature flags and not include the --enable-gpu
and
--enable-gpu-metrics
flags. Only the following backends are
available for a non-GPU / CPU-only build: identity
, repeat
, ensemble
,
square
, tensorflow2
, pytorch
, onnxruntime
, openvino
,
python
and fil
.
To include the TensorFlow2 backend in your CPU-only build, you must
provide this additional flag to build.py:
--extra-backend-cmake-arg=tensorflow2:TRITON_TENSORFLOW_INSTALL_EXTRA_DEPS=ON
.
CPU-only builds of the TensorFlow and PyTorch backends require some CUDA stubs
and runtime dependencies that are not present in the CPU-only base container.
These are retrieved from a GPU base container, which can be changed with the
--image=gpu-base,nvcr.io/nvidia/tritonserver:<xx.yy>-py3-min
flag.
To build Triton without using Docker you must install the build dependencies that are handled automatically when building with Docker.
The first step for the build is to clone the triton-inference-server/server repo branch for the release you are interested in building (or the main branch to build from the development branch).
To determine what dependencies are required by the build, run build.py with the --dryrun flag, and then looking in the build subdirectory at Dockerfile.buildbase.
$ ./build.py -v --enable-all
From Dockerfile.buildbase you can see what dependencies you need to install on your host system. Note that when building with --enable-gpu (or --enable-all), Dockerfile.buildbase depends on the <xx.yy>-py3-min image pulled from NGC. Unfortunately, a Dockerfile is not currently available for the <xx.yy>-py3-min image. Instead, you must manually install CUDA and cuDNN and TensorRT dependencies as described below.
Once you have installed these dependencies on your build system you can then use build.py with the --no-container-build flag to build Triton.
$ ./build.py -v --no-container-build --build-dir=`pwd`/build --enable-all
See Building with Docker for more details on how the cmake_build script is used to perform the build.
For Triton to support NVIDIA GPUs you must install CUDA, cuBLAS and cuDNN. These libraries must be installed on the system include and library paths so that they are available for the build. The version of the libraries used for a given release can be found in the Framework Containers Support Matrix.
For a given version of Triton you can attempt to build with non-supported versions of the libraries but you may have build or execution issues since non-supported versions are not tested.
The TensorRT headers and libraries must be installed on system include and library paths so that they are available for the build. The version of TensorRT used in a given release can be found in the Framework Containers Support Matrix.
For a given version of Triton you can attempt to build with non-supported versions of TensorRT but you may have build or execution issues since non-supported versions are not tested.
Under Construction
For Windows 10, build.py supports both a Docker build and a non-Docker build in a similar way as described for Ubuntu. The primary difference is that the minimal/base image used as the base of Dockerfile.buildbase image can be built from the provided Dockerfile.win10.min file as described in Windows 10 "Min" Image. When running build.py use the --image flag to specify the tag that you assigned to this image. For example, --image=base,win10-py3-min.
Depending on your version of Windows 10 and your version of Docker you may need to perform these additional steps before any of the following step.
- Set your Docker to work with "Windows containers". Right click on the whale icon in the lower-right status area and select "Switch to Windows containers".
The "min" container describes the base dependencies needed to perform the Windows build. The Windows min container is Dockerfile.win10.min.
Before building the min container you must download the appropriate cuDNN and TensorRT versions and place them in the same directory as Dockerfile.win10.min.
-
For cuDNN the CUDNN_VERSION and CUDNN_ZIP arguments defined in Dockerfile.win10.min indicate the version of cuDNN that your should download from https://developer.nvidia.com/rdp/cudnn-download.
-
For TensorRT the TENSORRT_VERSION and TENSORRT_ZIP arguments defined in Dockerfile.win10.min indicate the version of TensorRT that your should download from https://developer.nvidia.com/nvidia-tensorrt-download.
After downloading the zip files for cuDNN and TensorRT, you build the min container using the following command.
$ docker build -t win10-py3-min -f Dockerfile.win10.min .
Triton is built using the build.py script. The build system must have Docker, Python3 (plus pip installed docker module) and git installed so that it can execute build.py and perform a docker build. By default, build.py does not enable any of Triton's optional features and so you must enable them explicitly. The following build.py invocation builds all features and backends available on windows.
python build.py --cmake-dir=<path/to/repo>/build --build-dir=/tmp/citritonbuild --no-container-pull --image=base,win10-py3-min --enable-logging --enable-stats --enable-tracing --enable-gpu --endpoint=grpc --endpoint=http --repo-tag=common:<container tag> --repo-tag=core:<container tag> --repo-tag=backend:<container tag> --repo-tag=thirdparty:<container tag> --backend=ensemble --backend=tensorrt:<container tag> --backend=onnxruntime:<container tag> --backend=openvino:<container tag>
If you are building on main branch then '' will default to "main". If you are building on a release branch then '' will default to the branch name. For example, if you are building on the r24.05 branch, '' will default to r24.05. Therefore, you typically do not need to provide '' at all (nor the preceding colon). You can use a different '' for a component to instead use the corresponding branch/tag in the build. For example, if you have a branch called "mybranch" in the onnxruntime_backend repo that you want to use in the build, you would specify --backend=onnxruntime:mybranch.
When build.py completes, a Docker image called tritonserver will contain the built Triton Server executable, libraries and other artifacts. Windows containers do not support GPU access so you likely want to extract the necessary files from the tritonserver image and run them directly on your host system. All the Triton artifacts can be found in /opt/tritonserver directory of the tritonserver image. Your host system will need to install the CUDA, cuDNN, TensorRT and other dependencies that were used for the build.
Building for an unsupported OS and/or hardware platform is possible. All of the build scripting, Dockerfiles and CMake invocations are included in the public repos or are generated by build.py as described in Building with Docker. From these files you can find the required dependencies and CMake invocations. However, due to differences in compilers, libraries, package management, etc. you may have to make changes in the build scripts, Dockerfiles, CMake files and the source code.
To see the generated build scripts and Dockerfiles referred to below, use:
$ ./build.py -v --enable-all --dryrun
You should familiarize yourself with the build process for supported platforms by reading the above documentation and then follow the process for the supported platform that most closely matches the platform you are interested in (for example, if you are trying to build for RHEL/x86-64 then follow the Building for Ubuntu 22.04 process. You will likely need to make changes in the following areas and then manually run docker_build and cmake_build or the equivalent commands to perform a build.
-
The generated Dockerfiles install dependencies for the build using platform-specific packaging tools, for example, apt-get for Ubuntu. You will need to change build.py to use the packaging tool appropriate for your platform.
-
The package and libraries names for your platform may differ from those used by the generated Dockerfiles. You will need to find the corresponding packages on libraries on your platform.
-
Your platform may use a different compiler or compiler version than the support platforms. As a result you may encounter build errors that need to be fixed by editing the source code or changing the compilation flags.
-
Triton depends on a large number of open-source packages that it builds from source. If one of these packages does not support your platform then you may need to disable the Triton feature that depends on that package. For example, Triton supports the S3 filesystem by building the aws-sdk-cpp package. If aws-sdk-cpp doesn't build for your platform then you can remove the need for that package by not specifying --filesystem=s3 when you run build.py. In general, you should start by running build.py with the minimal required feature set.
-
The TensorFlow backend extracts pre-built shared libraries from the TensorFlow NGC container as part of the build. This container is only available for Ubuntu-22.04 / x86-64, so if you require the TensorFlow backend for your platform you will need download the TensorFlow container and modify its build to produce shared libraries for your platform. You must use the TensorFlow source and build scripts from within the NGC container because they contain Triton-specific patches that are required for the Triton TensorFlow backend.
-
By default, the PyTorch backend build extracts pre-built shared libraries from The PyTorch NGC container. But the build can also use PyTorch shared libraries that you build separately for your platform. See the pytorch_backend build process for details.
If you are building without Docker use the CMake invocation steps in cmake_build to invoke CMake to set-up a build environment where you can invoke make/msbuild.exe to incremental build the Triton core, a backend, or a repository agent.
If you are building with Docker, the generated tritonserver_buildbase image contains all the dependencies needed to perform a full or incremental build. Within tritonserver_buildbase, /workspace/build/cmake_build contains the CMake invocations that are used to build the Triton core, the backends, and the repository agents.
To perform an incremental build within the tritonserver_buildbase
container, map your source into the container and then run the
appropriate CMake and make
(or msbuild.exe
) steps from cmake_build
within the container.
Assuming you have a clone of the server repo on your host system where you are making changes and you want to perform incremental builds to test those changes. Your source code is in /home/me/server. Run the tritonserver_buildbase container and map your server source directory into the container at /server.
$ docker run -it --rm -v/home/me/server:/server tritonserver_buildbase bash
Look at /workspace/build/cmake_build within the container for the section of commands that build "Triton core library". You can follow those command exactly, or you can modify them to change the build directory or the CMake options. You must change the CMake command to use /server instead of /workspace as the location for the CMakeLists.txt file and source:
$ cmake <options> /server
Then you can change directory into the build directory and run make
(or msbuild.exe
) as shown in cmake_build. As you make changes to the
source on your host system, you can perform incremental builds by
re-running make
(or msbuild.exe
).
Performing a full or incremental build of a backend or repository agent is similar to building the Triton core. As an example we will use the TensorRT backend. Assuming you have a clone of the TensorRT backend repo on your host system where you are making changes and you want to perform incremental builds to test those changes. Your source code is in /home/me/tritonserver_backend. Run the tritonserver_buildbase container and map your TensorRT backend source directory into the container at /tensorrt_backend. Note that some backends will use Docker as part of their build, and so the host's Docker registry must be made available within the tritonserver_buildbase by mounting docker.sock (on Windows use -v\.\pipe\docker_engine:\.\pipe\docker_engine).
$ docker run -it --rm -v/var/run/docker.sock:/var/run/docker.sock -v/home/me/tensorrt_backend:/tensorrt_backend tritonserver_buildbase bash
Look at /workspace/build/cmake_build within the container for the section of commands that build "TensorRT backend". You can follow those command exactly, or you can modify them to change the build directory or the CMake options. You must change the CMake command to use /tensorrt_backend instead of /workspace as the location for the CMakeLists.txt file and source:
$ cmake <options> /tensorrt_backend
Then you can change directory into the build directory and run make
(or msbuild.exe
) as shown in cmake_build. As you make changes to the
source on your host system, you can perform incremental builds by
re-running make
(or msbuild.exe
).
To build with Debug symbols, use the --build-type=Debug argument while launching build.py. If building directly with CMake use -DCMAKE_BUILD_TYPE=Debug. You can then launch the built server with gdb and see the debug symbols/information in the gdb trace.