
TorchServe v0.8.1 Release Notes

@lxning lxning released this 14 Jun 23:54
· 450 commits to master since this release
c2cdcfb

This is the release of TorchServe v0.8.1.

New Features

  1. Supported micro-batching in handlers to process a batched request from the frontend in parallel. #2210 @mreso

Because pre- and post-processing are often carried out on the CPU, the GPU sits idle until these two CPU-bound steps finish and the worker receives a new batch. Micro-batching in the handler parallelizes pre-processing, inference, and post-processing for a batched request from the frontend.

  2. Supported job tickets #2350 @lxning

This feature helps with use cases where inference latency can be high, such as generative models and auto-regressive decoder models like ChatGPT. Based on business requirements, applications can take effective action on rejected requests, for example routing them to a different server or scaling up model-server capacity.

  3. Supported job queue size configuration per model #2350 @lxning
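The three features above are all driven by per-model configuration. As a minimal sketch (key names and values shown here are illustrative and should be checked against the TorchServe model-configuration docs), a model-config.yaml might combine them like this:

```yaml
# model-config.yaml (illustrative sketch)
batchSize: 32          # frontend batch size handed to the worker
jobQueueSize: 100      # per-model job queue size (new in this release)
useJobTicket: true     # reject requests early when no capacity is available

# Micro-batching: split the frontend batch into micro-batches and run
# preprocess / inference / postprocess stages in parallel.
micro_batching:
  micro_batch_size: 4
  parallelism:
    preprocess: 2
    inference: 1
    postprocess: 2
```

With `useJobTicket` enabled, a request that cannot obtain a ticket is rejected immediately instead of queueing, which is what lets the application re-route it or trigger scale-up.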

New Examples

This example demonstrates generative-AI-assisted creative content generation using TorchServe on SageMaker MME.

Improvements

  • Upgraded to PyTorch 2.0.1 #2374 @namannandan

  • Significant reduction in Docker Image Size

    • Reduced GPU docker image size by 3GB #2392 @agunapal
    • Reduced dependency installation time and decrease docker image size #2364 @mreso
        GPU
        pytorch/torchserve   0.8.1-gpu   04eef250c14e   4 hours ago     2.34GB
        pytorch/torchserve   0.8.0-gpu   516bb13a3649   4 weeks ago     5.86GB
        pytorch/torchserve   0.6.0-gpu   fb6d4b85847d   12 months ago   2.13GB
      
        CPU
        pytorch/torchserve   0.8.1-cpu   68a3fcae81af   4 hours ago     662MB
        pytorch/torchserve   0.8.0-cpu   958ef6dacea2   4 weeks ago     2.37GB
        pytorch/torchserve   0.6.0-cpu   af91330a97bd   12 months ago   496MB
      
  • Updated CPU information for IPEX #2372 @min-jean-cho

  • Fixed inf2 example handler #2378 @namannandan

  • Added inf2 nightly benchmark #2283 @namannandan

  • Fixed archiver tgz format model directory structure mismatch on SageMaker #2405 @lxning

  • Fixed model archiver to fail if extra files are missing #2212 @mreso

  • Fixed device type setting in model config yaml #2408 @lxning

  • Fixed batch size in config.properties not being honored #2382 @lxning

  • Upgraded torchrun argument names and fixed backend tcp port connection #2377 @lxning

  • Fixed error thrown while loading multiple models in KServe #2235 @jagadeeshi2i

  • Fixed KServe fastapi migration issues #2175 @jagadeeshi2i

  • Added type annotation in model_server.py #2384 @josephcalise

  • Sped up unit tests by removing sleep during TorchServe start/stop #2383 @mreso

  • Removed cu118 from regression tests #2380 @agunapal

  • Enabled ONNX CI test #2363 @msaroufim

  • Removed session_mocker usage to prevent test cross talking #2375 @mreso

  • Enabled regression test in CI #2370 @msaroufim

  • Fixed regression test failures #2371 @namannandan

  • Bumped transformers version from 4.28.1 to 4.30.0 #2410

Documentation

Platform Support

Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, macOS 10.14+, Windows 10 Pro, Windows Server 2019, and Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 or above and JDK 17.
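Given the new Python 3.8+ and JDK 17 requirements, a quick pre-install environment check can save a failed deployment. This is a hypothetical helper, not part of TorchServe itself:

```python
import shutil
import sys


def check_prerequisites() -> list:
    """Return a list of human-readable problems with the local environment.

    Checks the Python interpreter version and whether a `java` binary is
    on PATH. An empty list means the basic prerequisites look satisfied.
    (Java *version* checking is left out; parsing `java -version` output
    varies across vendors.)
    """
    problems = []
    if sys.version_info < (3, 8):
        problems.append(
            f"Python 3.8+ required, found {sys.version.split()[0]}"
        )
    if shutil.which("java") is None:
        problems.append("JDK 17 required, but no `java` binary found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```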

GPU Support

Torch 2.0.1 + CUDA 11.7, 11.8
Torch 2.0.0 + CUDA 11.7, 11.8
Torch 1.13 + CUDA 11.7, 11.8
Torch 1.11 + CUDA 10.2, 11.3, 11.6
Torch 1.9.0 + CUDA 11.1
Torch 1.8.1 + CUDA 9.2