TorchServe v0.11.0 Release Notes
This is the release of TorchServe v0.11.0.
Highlights Include
- GenAI inference optimizations showcasing torch.compile with the OpenVINO backend for Stable Diffusion
- Intel IPEX for Llama
- Experimental support for Apple MPS and linux-aarch64
- Security bug fixes
GenAI
- Upgraded Llama2 examples to Llama3
- Examples for LoRA and Mistral #3077 @lxning
- IPEX LLM serving example with Intel AMX #3068 @bbhattar
- Integration of Intel OpenVINO with TorchServe using torch.compile, with an example showcasing the openvino torch.compile backend with Stable Diffusion (a sketch follows this list) #3116 @suryasidd
- Enabled retrieval of guaranteed sequential order of input sequences with low latency for stateful inference via HTTP, extending this previously gRPC-only feature #3142 @lxning
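As a rough sketch of what the openvino torch.compile backend usage looks like (the checkpoint name and the choice to compile only the UNet are illustrative assumptions, not the exact example code; see #3116 for the real example):

```python
import torch
import openvino.torch  # registers the "openvino" backend for torch.compile (OpenVINO >= 2023.1)
from diffusers import DiffusionPipeline

# Illustrative checkpoint; the TorchServe example may use a different model.
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")

# Compile the UNet (the bulk of the compute) with the OpenVINO backend.
pipe.unet = torch.compile(pipe.unet, backend="openvino")

image = pipe("a photo of an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```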
Linux aarch64 Support:
TorchServe adds support for linux-aarch64, with an example running on AWS Graviton. This gives users a new CPU platform option for serving models.
Apple Silicon Support:
- TorchServe now includes support for the MPS backend on Apple silicon (see the sketch after this list) #3048 @udaij12 @agunapal
- Added TorchServe quickstart chatbot example #3003 @agunapal
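A minimal sketch of MPS device selection (the model here is a placeholder, not the chatbot example):

```python
import torch

# Use the MPS backend when running on Apple silicon; fall back to CPU otherwise.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)  # placeholder model for illustration
x = torch.randn(1, 4, device=device)
print(model(x))
```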
XGBoost Support:
With the XGBoost Classifier example, we show how to deploy any pickled model with TorchServe.
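A minimal sketch of such a handler, assuming the pickled model is packaged as model.pkl inside the model archive (class and file names are illustrative; see the repository example for the actual code):

```python
import os
import pickle

from ts.torch_handler.base_handler import BaseHandler


class PickledModelHandler(BaseHandler):
    """Illustrative TorchServe handler serving any pickled model with a predict() API."""

    def initialize(self, context):
        model_dir = context.system_properties.get("model_dir")
        # "model.pkl" is an assumed artifact name bundled into the .mar file.
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            self.model = pickle.load(f)
        self.initialized = True

    def inference(self, data, *args, **kwargs):
        # Delegate to the unpickled model; XGBoost's sklearn API exposes predict().
        return self.model.predict(data)
```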
Security
The ability to bypass allowed_urls using relative paths has been fixed by adding a preemptive check for relative paths before the model archive is copied to the model store directory. In addition, the default gRPC inference and management addresses are now bound to localhost (127.0.0.1) to reduce the scope of default access to the gRPC endpoints; a configuration sketch follows the list below.
- Fixed allowed_urls filter bypass #3082 @udaij12 @msaroufim
- Fixed gRPC address assignment to localhost by default #3083 @namannandan
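If the gRPC endpoints do need to be reachable from other hosts, the defaults can be overridden in config.properties. The key names below match the TorchServe gRPC configuration as of this release, but verify them against your version's docs:

```properties
# config.properties: rebind gRPC endpoints (only do this on trusted networks)
grpc_inference_address=0.0.0.0
grpc_management_address=0.0.0.0
grpc_inference_port=7070
grpc_management_port=7071
```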
C++ Backend
Documentation
- Updated SECURITY.md #3038, #3041, #3043, #3046, #3084 @msaroufim @diogoteles08 @udaij12 @lxning @namannandan
- Updated PT2 examples readme #3029 @chauhang
- Updated Resnet18 torch.compile readme #3130 @SimonTong22
- Updated doc-automation.yml #3105 @svekars
Improvements and Bug Fixing
- Supported PyTorch 2.3 #3109 @agunapal
- Applied JSON serialization to customized metadata in the management API #3059 @harshita-meena
- Accepted empty version in gRPC management API #3095 @harshita-meena
- Added test template #3140 @mreso
- Logged entire stdout and stderr for terminated backend worker process #3036 @namannandan
- Increased test timeout for test_handler_traceback_logging #3113 @namannandan
- Supported gRPC max connection age configuration #3121 @namannandan
- Updated deprecated TorchVision and PyTorch APIs #3074 @kit1980 @agunapal
- Supported installation from source for a specific branch with Docker #3055 @agunapal
- Added a workaround for the KServe nightly failure #3079 @agunapal
- Disabled Mac arm64 tests #3057 @agunapal
- Fixed CI and regression workflows for Mac arm64 #3128 @namannandan
- Included missing model configuration values in describe model API response #3122 @namannandan
Platform Support
Ubuntu 20.04, macOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK 17.
GPU Support Matrix
| TorchServe version | PyTorch version | Python | Stable CUDA | Experimental CUDA |
|---|---|---|---|---|
| 0.11.0 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, cuDNN 8.7.0.84 | CUDA 12.1, cuDNN 8.9.2.26 |
| 0.10.0 | 2.2.1 | >=3.8, <=3.11 | CUDA 11.8, cuDNN 8.7.0.84 | CUDA 12.1, cuDNN 8.9.2.26 |
| 0.9.0 | 2.1 | >=3.8, <=3.11 | CUDA 11.8, cuDNN 8.7.0.84 | CUDA 12.1, cuDNN 8.9.2.26 |
| 0.8.0 | 2.0 | >=3.8, <=3.11 | CUDA 11.7, cuDNN 8.5.0.96 | CUDA 11.8, cuDNN 8.7.0.84 |
| 0.7.0 | 1.13 | >=3.7, <=3.10 | CUDA 11.6, cuDNN 8.3.2.44 | CUDA 11.7, cuDNN 8.5.0.96 |
Inferentia2 Support Matrix
| TorchServe version | PyTorch version | Python | Neuron SDK |
|---|---|---|---|
| 0.11.0 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
| 0.10.0 | 1.13 | >=3.8, <=3.11 | 2.16+ |
| 0.9.0 | 1.13 | >=3.8, <=3.11 | 2.13.2+ |