Torch-TensorRT v1.4.0
PyTorch 2.0, CUDA 11.8, TensorRT 8.6, support for the new `torch.compile` API, and a compatibility mode for the FX frontend
Torch-TensorRT 1.4.0 targets PyTorch 2.0, CUDA 11.8, and TensorRT 8.6. This release introduces a number of beta features that set the stage for working with PyTorch and TensorRT in the 2.0 ecosystem. Chief among them is a new `torch.compile` backend targeting Torch-TensorRT. It also adds a compatibility layer that allows users of the TorchScript frontend for Torch-TensorRT to seamlessly try the FX and Dynamo stacks.
`torch.compile` Backend for Torch-TensorRT
One of the most prominent new features in PyTorch 2.0 is the `torch.compile` workflow, which enables users to accelerate code easily by specifying a backend of their choice. Torch-TensorRT 1.4.0 introduces a new backend for `torch.compile` as a beta feature, including a convenience frontend to perform accelerated inference. This frontend can be accessed in one of two ways:
```python
import torch_tensorrt

torch_tensorrt.dynamo.compile(model, inputs, ...)
##### OR #####
torch_tensorrt.compile(model, ir="dynamo_compile", inputs=inputs, ...)
```
For more examples, see the provided sample scripts in the repository.
This compilation method has a few key considerations:
- It can handle models with data-dependent control flow
- It automatically falls back to Torch if the TRT engine build fails for any reason
- It uses the Torch FX `aten` library of converters to accelerate models
- Recompilation can be caused by changing the batch size of the input, or providing an input which enters a new control flow branch
- Compiled models cannot be saved across Python sessions (yet)
The feature is currently in beta, and we expect updates, changes, and improvements to the above in the future.
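The recompilation and fallback behaviors above can be pictured as a cache of compiled artifacts keyed on input properties: a new batch size (or a new control-flow branch) is a cache miss that triggers recompilation, and a failed engine build falls back to the original function. The sketch below is a plain-Python illustration of that idea only, not Torch-TensorRT's actual implementation; `make_compiled`, `build_engine`, and the dict-based inputs are hypothetical names chosen for the example:

```python
# Illustrative sketch (plain Python, NOT the real Torch-TensorRT internals):
# a compile cache keyed on input shape, with automatic fallback to the
# original function when "engine building" fails.

def make_compiled(fn, build_engine):
    cache = {}  # maps input-shape key -> compiled "engine" (or fallback fn)

    def compiled(x):
        key = tuple(x["shape"])  # a new batch size produces a new key
        if key not in cache:
            try:
                # Cache miss: this is where "recompilation" happens
                cache[key] = build_engine(fn, key)
            except RuntimeError:
                # Engine build failed: fall back to the original (Torch) path
                cache[key] = fn
        return cache[key](x)

    compiled.cache = cache  # exposed for inspection in this sketch
    return compiled
```

Changing only the data while keeping the shape reuses the cached engine; changing the shape recompiles once per new shape.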
`fx_ts_compat` Frontend
As the ecosystem transitions from TorchScript to Dynamo, users of Torch-TensorRT may want to start experimenting with this stack. As such, we have introduced a new frontend for Torch-TensorRT which exposes the same APIs as the TorchScript frontend but uses the FX/Dynamo compiler stack. You can try this frontend via the `ir="fx_ts_compat"` setting:
```python
torch_tensorrt.compile(..., ir="fx_ts_compat")
```
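A unified entry point like this typically just dispatches on the `ir` string to the matching compiler stack. The following is a minimal, self-contained sketch of that dispatch pattern with hypothetical stand-in frontends; it is not the actual `torch_tensorrt.compile` source:

```python
# Hypothetical sketch of ir-string dispatch (stand-in frontends, not the
# real torch_tensorrt implementation).

def ts_frontend(model, **kwargs):
    # Placeholder for the TorchScript compile path
    return f"torchscript({model})"

def fx_ts_compat_frontend(model, **kwargs):
    # Placeholder for the FX/Dynamo compatibility compile path
    return f"fx_ts_compat({model})"

_FRONTENDS = {
    "ts": ts_frontend,
    "fx_ts_compat": fx_ts_compat_frontend,
}

def compile(model, ir="ts", **kwargs):
    try:
        frontend = _FRONTENDS[ir]
    except KeyError:
        raise ValueError(f"Unsupported ir: {ir!r}") from None
    return frontend(model, **kwargs)
```

Because both frontends expose the same signature, switching stacks is a one-argument change for the caller, which is exactly what the compatibility mode is meant to enable.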
What's Changed
- Fix build by @yinghai in #1479
- add circle CI signal in README page by @yinghai in #1481
- fix einsum signature by @yinghai in #1480
- Fix link to CircleCI in README.md by @yinghai in #1483
- Minor changes by @yinghai in #1482
- [FX] Changes done internally at Facebook by @frank-wei in #1456
- chore: upload docs for 1.3.0 by @narendasan in #1504
- fix: Repair Citrinet-1024 compilation issues by @gs-olive in #1488
- refactor: Split elementwise tests by @peri044 in #1507
- [feat] Support 1D topk by @mfeliz-cruise in #1491
- Support aten::sum with bool tensor input by @mfeliz-cruise in #1512
- [fix]Disambiguate cast layer names by @mfeliz-cruise in #1513
- feat: Add functionality for easily benchmarking fx code on key models by @gs-olive in #1506
- [feat]Canonicalize aten::multiply to aten::mul by @mfeliz-cruise in #1517
- broadcast the two input shapes for transposed matmul by @nvpohanh in #1457
- make padding layer converter more efficient by @nvpohanh in #1470
- fix: Change equals-check from reference to value for BERT model not compiling in FX by @gs-olive in #1539
- Update README dependencies section for v1.3.0 by @take-cheeze in #1540
- fix: `aten::where` with differing-shape inputs bugfix by @gs-olive in #1533
- fix: Automatically send truncated long ints to cuda at shape analysis time by @gs-olive in #1541
- feat: Add functionality to FX benchmarking + Improve documentation by @gs-olive in #1529
- [fix] Fix crash when calling unbind on evaluated tensor by @mfeliz-cruise in #1554
- Update test_flatten_aten and test_reshape_aten due to PT2.0 changed tracer behavior for these ops by @frank-wei in #1559
- fix: Bugfix for `align_corners=False` in FX interpolate by @gs-olive in #1561
- fix: Properly cast intermediate Int8 tensors to TensorRT Engines in Fallback by @gs-olive in #1549
- Upgrade stack to Pytorch 2.0 + CUDA 11.7 + TRT 8.5 GA by @peri044 in #1477
- feat: Add option to specify int64 as an Input dtype by @gs-olive in #1551
- feat: Support int inputs to aten::max/min and aten::argmax/argmin by @mfeliz-cruise in #1574
- fix: Add `aten::full_like` evaluator by @gs-olive in #1584
- tools: assign 1 person to a bug instead of all by @narendasan in #1604
- feat: Add support for aten::meshgrid by @mfeliz-cruise in #1601
- [FX] Changes done internally at Facebook by @frank-wei in #1603
- chore: Add FX core test by @peri044 in #1593
- chore: Update dockerfile by @peri044 in #1581
- fix: Replace `RemoveDropout` lowering pass implementation with modified JIT pass by @gs-olive in #1589
- [FX] Changes done internally at Facebook by @frank-wei in #1625
- chore: Update Dockerfile to Ubuntu 20.04 + Crash Resolution by @gs-olive in #1639
- fix: Bugfix in Linear-to-AddMM Fusion Lowering Pass by @gs-olive in #1619
- fix: Resolve compilation bug for empty tensors in `aten::select` by @gs-olive in #1623
- Convolution cast by @apbose in #1609
- fix: Bugfix in TRT Engine deserialization indexing by @gs-olive in #1646
- fix: fix the inappropriate lowering pass of aten::to by @bowang007 in #1649
- Lowering aten::pad to aten::constant_pad_nd/aten::reflection_padXd/aten::replication_padXd by @ruoqianguo in #1588
- [fix] Disambiguate element-wise cast layer names by @mfeliz-cruise in #1630
- feat: Add optional tensor domain argument to Input class by @gs-olive in #1537
- Improve batch_norm fp16 accuracy by @mfeliz-cruise in #1450
- add an example of aten2trt, fix batch norm pass by @frank-wei in #1685
- fix: Issue in non-Tensor Input Resolution by @gs-olive in #1617
- Corrected a typo, which was raising an error by @zshn25 in #1694
- Cherry-pick manylinux compatible builds into main by @narendasan in #1677
- fix: Improve input handling for `input_signature` by @gs-olive in #1698
- Unsqueeze operator with dynamic input by @apbose in #1624
- [feat] Add converter support for index_select by @mfeliz-cruise in #1692
- [feat] Add converter support for aten::logical_not by @mfeliz-cruise in #1705
- fix: Bugfix in convNd_to_convolution lowering pass by @gs-olive in #1693
- [feat] Add converter for aten::any.dim by @mfeliz-cruise in #1707
- [fix] resolve issue for single non-batch index tensor in aten::index by @mfeliz-cruise in #1700
- fix: Handle nonetype pad value for Constant pad by @peri044 in #1712
- infra: Add Torch 1.13.1 testing to nightly CI by @gs-olive in #1731
- fix: Allow full model compilation with collection outputs by @gs-olive in #1599
- fix: fix the prim::Loop fallback issue by @bowang007 in #1691
- feat: Add decorator utility to improve error messaging for legacy support by @gs-olive in #1738
- minor fix: Update default minimum torch version for aten tracer by @gs-olive in #1747
- Get windows build working by @bharrisau in #1711
- Update config.yml by @frank-wei in #1736
- fix: Bugfix in shape analysis for multi-GPU systems by @gs-olive in #1765
- fix: Add schemas to convolution lowering pass by @gs-olive in #1728
- fix: Update Docker build to automatically adapt Torch version by @gs-olive in #1732
- feat: Upgrade Pytorch and TensorRT versions by @peri044 in #1759
- feat: Merge dynamo additions into `release/1.4` by @gs-olive in #1884
- fix: Cherry-pick `acc` convolution fix to `release/1.4` by @gs-olive in #1910
- cherry-pick: Reorganize + Upgrade Dynamo (`release/1.4`) by @gs-olive in #1931
- fix: Upgrade `release/1.4` to Torch 2.0.1 + TensorRT 8.6.1 by @gs-olive in #1896
- cherry-pick: Dynamo upgrades and bugfixes (`release/1.4`) by @gs-olive in #1956
New Contributors
- @nvpohanh made their first contribution in #1457
- @take-cheeze made their first contribution in #1540
- @zshn25 made their first contribution in #1694
- @bharrisau made their first contribution in #1711
Full Changelog: v1.3.0...v1.4.0