TorchServe v0.11.0 Release Notes
This is the release of TorchServe v0.11.0.
Highlights Include
- GenAI inference optimizations showcasing torch.compile with the OpenVINO backend for Stable Diffusion
- Intel IPEX for Llama
- Experimental support for Apple MPS and linux-aarch64
- Security bug fixes
GenAI
- Upgraded Llama2 examples to Llama3
- Examples for LoRA and Mistral #3077 @lxning
- IPEX LLM serving example with Intel AMX #3068 @bbhattar
- Integration of Intel OpenVINO with TorchServe using torch.compile, with an example showcasing the openvino torch.compile backend with Stable Diffusion (a sketch follows this list) #3116 @suryasidd
- Enabled retrieval of guaranteed sequential order of input sequences with low latency for stateful inference via HTTP, extending this previously gRPC-only feature #3142 @lxning
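As a rough sketch of what the openvino torch.compile backend usage looks like (the checkpoint name and the choice to compile only the UNet are illustrative assumptions, not the exact example code; see #3116 for the real example):

```python
import torch
import openvino.torch  # registers the "openvino" backend for torch.compile (OpenVINO >= 2023.1)
from diffusers import DiffusionPipeline

# Illustrative checkpoint; the TorchServe example may use a different model.
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")

# Compile the UNet (the bulk of the compute) with the OpenVINO backend.
pipe.unet = torch.compile(pipe.unet, backend="openvino")

image = pipe("a photo of an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```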
Linux aarch64 Support:
TorchServe adds support for linux-aarch64, with an example running on AWS Graviton. This gives users a new CPU platform option for serving models.
Apple Silicon Support:
- TorchServe now includes support for the MPS backend on Apple silicon (see the sketch after this list) #3048 @udaij12 @agunapal
- Added TorchServe quickstart chatbot example #3003 @agunapal
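A minimal sketch of MPS device selection (the model here is a placeholder, not the chatbot example):

```python
import torch

# Use the MPS backend when running on Apple silicon; fall back to CPU otherwise.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)  # placeholder model for illustration
x = torch.randn(1, 4, device=device)
print(model(x))
```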
XGBoost Support:
With the XGBoost Classifier example, we show how to deploy any pickled model with TorchServe.
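A minimal sketch of such a handler, assuming the pickled model is packaged as model.pkl inside the model archive (class and file names are illustrative; see the repository example for the actual code):

```python
import os
import pickle

from ts.torch_handler.base_handler import BaseHandler


class PickledModelHandler(BaseHandler):
    """Illustrative TorchServe handler serving any pickled model with a predict() API."""

    def initialize(self, context):
        model_dir = context.system_properties.get("model_dir")
        # "model.pkl" is an assumed artifact name bundled into the .mar file.
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            self.model = pickle.load(f)
        self.initialized = True

    def inference(self, data, *args, **kwargs):
        # Delegate to the unpickled model; XGBoost's sklearn API exposes predict().
        return self.model.predict(data)
```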
Security
The ability to bypass allowed_urls using relative paths has been fixed by adding a preemptive check for relative paths before the model archive is copied to the model store directory. In addition, the default gRPC inference and management addresses are now bound to localhost (127.0.0.1) to reduce the scope of default access to the gRPC endpoints; a configuration sketch follows the list below.
- Fixed allowed_urls filter bypass #3082 @udaij12 @msaroufim
- Fixed gRPC address assignment to localhost by default #3083 @namannandan
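If the gRPC endpoints do need to be reachable from other hosts, the defaults can be overridden in config.properties. The key names below match the TorchServe gRPC configuration as of this release, but verify them against your version's docs:

```properties
# config.properties: rebind gRPC endpoints (only do this on trusted networks)
grpc_inference_address=0.0.0.0
grpc_management_address=0.0.0.0
grpc_inference_port=7070
grpc_management_port=7071
```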
C++ Backend
Documentation
- Updated SECURITY.md #3038, #3041, #3043, #3046, #3084 @msaroufim @diogoteles08 @udaij12 @lxning @namannandan
- Updated PT2 examples readme #3029 @chauhang
- Updated Resnet18 torch.compile readme #3130 @SimonTong22
- Updated doc-automation.yml #3105 @svekars
Improvements and Bug Fixing
- Supported PyTorch 2.3 #3109 @agunapal
- Applied JSON serialization to customized metadata in the management API #3059 @harshita-meena
- Accepted empty version in gRPC management API #3095 @harshita-meena
- Added test template #3140 @mreso
- Logged entire stdout and stderr for terminated backend worker process #3036 @namannandan
- Increased test timeout for test_handler_traceback_logging #3113 @namannandan
- Supported gRPC max connection age configuration #3121 @namannandan
- Updated deprecated TorchVision and PyTorch APIs #3074 @kit1980 @agunapal
- Supported installation from source for a specific branch with Docker #3055 @agunapal
- Added a workaround for the KServe nightly failure #3079 @agunapal
- Disabled Mac arm64 tests #3057 @agunapal
- Fixed CI and regression workflows for Mac arm64 #3128 @namannandan
- Included missing model configuration values in describe model API response #3122 @namannandan
Platform Support
Ubuntu 20.04, macOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK 17.
GPU Support Matrix
| TorchServe version | PyTorch version | Python | Stable CUDA | Experimental CUDA |
|---|---|---|---|---|
| 0.11.0 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, cuDNN 8.7.0.84 | CUDA 12.1, cuDNN 8.9.2.26 |
| 0.10.0 | 2.2.1 | >=3.8, <=3.11 | CUDA 11.8, cuDNN 8.7.0.84 | CUDA 12.1, cuDNN 8.9.2.26 |
| 0.9.0 | 2.1 | >=3.8, <=3.11 | CUDA 11.8, cuDNN 8.7.0.84 | CUDA 12.1, cuDNN 8.9.2.26 |
| 0.8.0 | 2.0 | >=3.8, <=3.11 | CUDA 11.7, cuDNN 8.5.0.96 | CUDA 11.8, cuDNN 8.7.0.84 |
| 0.7.0 | 1.13 | >=3.7, <=3.10 | CUDA 11.6, cuDNN 8.3.2.44 | CUDA 11.7, cuDNN 8.5.0.96 |
Inferentia2 Support Matrix
| TorchServe version | PyTorch version | Python | Neuron SDK |
|---|---|---|---|
| 0.11.0 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
| 0.10.0 | 1.13 | >=3.8, <=3.11 | 2.16+ |
| 0.9.0 | 1.13 | >=3.8, <=3.11 | 2.13.2+ |