
TorchServe v0.8.1 Release Notes

@lxning lxning released this 14 Jun 23:54
· 450 commits to master since this release
c2cdcfb

This is the release of TorchServe v0.8.1.

New Features

  1. Supported micro-batching in handlers to process a batched request from the frontend in parallel. #2210 @mreso

Because pre- and post-processing are often carried out on the CPU, the GPU sits idle until these two CPU-bound steps finish and the worker receives a new batch. Micro-batching in the handler parallelizes pre-processing, inference, and post-processing for a batched request from the frontend.

  2. Supported job tickets #2350 @lxning

This feature helps with use cases where inference latency can be high, such as generative models and auto-regressive decoder models like ChatGPT. Based on business requirements, applications can take effective action on rejected requests, for example routing them to a different server or scaling up model-server capacity.

  3. Supported job queue size configuration per model #2350 @lxning
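The three features above are all driven by per-model configuration. As a minimal sketch (key names and values shown here are illustrative and should be checked against the TorchServe model-configuration docs), a model-config.yaml might combine them like this:

```yaml
# model-config.yaml (illustrative sketch)
batchSize: 32          # frontend batch size handed to the worker
jobQueueSize: 100      # per-model job queue size (new in this release)
useJobTicket: true     # reject requests early when no capacity is available

# Micro-batching: split the frontend batch into micro-batches and run
# preprocess / inference / postprocess stages in parallel.
micro_batching:
  micro_batch_size: 4
  parallelism:
    preprocess: 2
    inference: 1
    postprocess: 2
```

With `useJobTicket` enabled, a request that cannot obtain a ticket is rejected immediately instead of queueing, which is what lets the application re-route it or trigger scale-up.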

New Examples

This example demonstrates generative-AI-assisted creative content generation using TorchServe on SageMaker MME.

Improvements

  • Upgraded to PyTorch 2.0.1 #2374 @namannandan

  • Significant reduction in Docker Image Size

    • Reduced GPU docker image size by 3GB #2392 @agunapal
    • Reduced dependency installation time and decrease docker image size #2364 @mreso
        GPU
        pytorch/torchserve   0.8.1-gpu   04eef250c14e   4 hours ago     2.34GB
        pytorch/torchserve   0.8.0-gpu   516bb13a3649   4 weeks ago     5.86GB
        pytorch/torchserve   0.6.0-gpu   fb6d4b85847d   12 months ago   2.13GB
      
        CPU
        pytorch/torchserve   0.8.1-cpu   68a3fcae81af   4 hours ago     662MB
        pytorch/torchserve   0.8.0-cpu   958ef6dacea2   4 weeks ago     2.37GB
        pytorch/torchserve   0.6.0-cpu   af91330a97bd   12 months ago   496MB
      
  • Updated CPU information for IPEX #2372 @min-jean-cho

  • Fixed inf2 example handler #2378 @namannandan

  • Added inf2 nightly benchmark #2283 @namannandan

  • Fixed archiver tgz format model directory structure mismatch on SageMaker #2405 @lxning

  • Fixed model archiver to fail if extra files are missing #2212 @mreso

  • Fixed device type setting in model config yaml #2408 @lxning

  • Fixed batch size in config.properties not being honored #2382 @lxning

  • Upgraded torchrun argument names and fixed backend tcp port connection #2377 @lxning

  • Fixed error thrown while loading multiple models in KServe #2235 @jagadeeshi2i

  • Fixed KServe fastapi migration issues #2175 @jagadeeshi2i

  • Added type annotation in model_server.py #2384 @josephcalise

  • Sped up unit tests by removing sleep during TorchServe start/stop #2383 @mreso

  • Removed cu118 from regression tests #2380 @agunapal

  • Enabled ONNX CI test #2363 @msaroufim

  • Removed session_mocker usage to prevent test cross talking #2375 @mreso

  • Enabled regression test in CI #2370 @msaroufim

  • Fixed regression test failures #2371 @namannandan

  • Bumped transformers version from 4.28.1 to 4.30.0 #2410

Documentation

Platform Support

Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, macOS 10.14+, Windows 10 Pro, Windows Server 2019, and Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 or above and JDK 17.
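Given the new Python 3.8+ and JDK 17 requirements, a quick pre-install environment check can save a failed deployment. This is a hypothetical helper, not part of TorchServe itself:

```python
import shutil
import sys


def check_prerequisites() -> list:
    """Return a list of human-readable problems with the local environment.

    Checks the Python interpreter version and whether a `java` binary is
    on PATH. An empty list means the basic prerequisites look satisfied.
    (Java *version* checking is left out; parsing `java -version` output
    varies across vendors.)
    """
    problems = []
    if sys.version_info < (3, 8):
        problems.append(
            f"Python 3.8+ required, found {sys.version.split()[0]}"
        )
    if shutil.which("java") is None:
        problems.append("JDK 17 required, but no `java` binary found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```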

GPU Support

Torch 2.0.1 + CUDA 11.7, 11.8
Torch 2.0.0 + CUDA 11.7, 11.8
Torch 1.13 + CUDA 11.7, 11.8
Torch 1.11 + CUDA 10.2, 11.3, 11.6
Torch 1.9.0 + CUDA 11.1
Torch 1.8.1 + CUDA 9.2