Torch-TensorRT v2.2.0
Dynamo Frontend for Torch-TensorRT, PyTorch 2.2, CUDA 12.1, TensorRT 8.6
Torch-TensorRT 2.2.0 targets PyTorch 2.2, CUDA 12.1 (builds for CUDA 11.8 are available via the PyTorch package index - https://download.pytorch.org/whl/cu118) and TensorRT 8.6. This is the second major release of Torch-TensorRT: the default frontend has changed from TorchScript to Dynamo, allowing users to more easily control and customize the compiler in Python.
The Dynamo frontend supports both JIT workflows through `torch.compile` and AOT workflows through `torch.export` + `torch_tensorrt.compile`. It targets the Core ATen Opset (https://pytorch.org/docs/stable/torch.compiler_ir.html#core-aten-ir) and currently has 82% coverage. Just like in TorchScript, graphs will be partitioned based on the ability to map operators to TensorRT, in addition to any graph surgery done in Dynamo.
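As a rough sketch of the two workflows (the toy model and input shapes below are illustrative assumptions, not part of this release):

```python
import torch
import torch_tensorrt

# Placeholder model and inputs for illustration only
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU()).eval().cuda()
inputs = [torch.randn(1, 3, 224, 224).cuda()]

# JIT workflow: torch.compile with the Torch-TensorRT backend builds TRT engines lazily on first call
jit_model = torch.compile(model, backend="torch_tensorrt")
jit_model(*inputs)

# AOT workflow: the module is exported via torch.export, then compiled ahead of time
trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
trt_model(*inputs)
```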
Output Format
Through the Dynamo frontend, different output formats can be selected for AOT workflows via the `output_format` kwarg. The choices are `torchscript`, where the resulting compiled module is traced with `torch.jit.trace` and is suitable for Python-less deployments; `exported_program`, a new serializable format for PyTorch models; and `graph_module`, which returns a `torch.fx.GraphModule` if you would like to run further graph transformations on the resulting model.
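A minimal sketch of selecting an output format, assuming a toy `Linear` model (the model, shapes, and file name are placeholders):

```python
import torch
import torch_tensorrt

model = torch.nn.Linear(8, 4).eval().cuda()
inputs = [torch.randn(2, 8).cuda()]

# TorchScript output: traced with torch.jit.trace, suitable for Python-less deployment
trt_ts = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs, output_format="torchscript")
torch.jit.save(trt_ts, "trt_module.ts")

# ExportedProgram output: the new serializable PyTorch model format
trt_ep = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs, output_format="exported_program")

# GraphModule output: a torch.fx.GraphModule for further graph transformations
trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs, output_format="graph_module")
```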
Multi-GPU Safety
To address a long-standing source of overhead, single-GPU systems will now operate without the device checks that were previously always required. These checks can be re-enabled when multiple GPUs are available to the host process using `torch_tensorrt.runtime.set_multi_device_safe_mode`:
```python
# Enables Multi Device Safe Mode
torch_tensorrt.runtime.set_multi_device_safe_mode(True)

# Disables Multi Device Safe Mode [Default Behavior]
torch_tensorrt.runtime.set_multi_device_safe_mode(False)

# Enables Multi Device Safe Mode, then resets the safe mode to its prior setting
with torch_tensorrt.runtime.set_multi_device_safe_mode(True):
    ...
```
More information can be found here: https://pytorch.org/TensorRT/user_guide/runtime.html
Capability Validators
In the Dynamo frontend, tests can be written and associated with converters to dynamically enable or disable them based on conditions in the target graph.
For example, the convolution converter in Dynamo only supports 1D, 2D, and 3D convolution. We can therefore create a lambda which, given a convolution FX node, determines whether that convolution is supported:
```python
@dynamo_tensorrt_converter(
    torch.ops.aten.convolution.default,
    capability_validator=lambda conv_node: conv_node.args[7] in ([0], [0, 0], [0, 0, 0]),
)  # type: ignore[misc]
def aten_ops_convolution(
    ctx: ConversionContext,
    target: Target,
    args: Tuple[Argument, ...],
    kwargs: Dict[str, Argument],
    name: str,
) -> Union[TRTTensor, Sequence[TRTTensor]]:
    # Conversion logic mapping aten.convolution onto TensorRT layers goes here
    ...
```
In such cases where the `Node` is not supported, the node will be partitioned out and run in PyTorch.
All capability validators are run prior to partitioning, after the lowering phase.
More information on writing converters for the Dynamo frontend can be found here: https://pytorch.org/TensorRT/contributors/dynamo_converters.html
Breaking Changes
- Dynamo (torch.export) is now the default frontend for Torch-TensorRT, and the TorchScript and FX frontends are now in maintenance mode. Therefore, any `torch.nn.Module`s or `torch.fx.GraphModule`s provided to `torch_tensorrt.compile` will by default be exported using `torch.export` and then compiled. This default can be overridden by setting the `ir=[torchscript|fx]` kwarg. Reported bugs will first be addressed in the Dynamo stack before other frontends are attempted; however, community pull requests for additional functionality in the TorchScript and FX frontends will still be accepted.
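For example, a sketch of overriding the default frontend (the toy model and shapes are assumptions for illustration):

```python
import torch
import torch_tensorrt

model = torch.nn.Linear(8, 4).eval().cuda()
inputs = [torch_tensorrt.Input((2, 8))]

# Default path: the module is run through torch.export and compiled with the Dynamo frontend
trt_module = torch_tensorrt.compile(model, inputs=inputs)

# Opting back into the legacy TorchScript frontend
trt_ts_module = torch_tensorrt.compile(model, ir="torchscript", inputs=inputs)
```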
What's Changed
- chore: Update Torch and Torch-TRT versions and docs on `main` by @gs-olive in #1784
- fix: Repair invalid schema arising from lowering pass by @gs-olive in #1786
- fix: Allow full model compilation with collection inputs (`input_signature`) by @gs-olive in #1656
- feat(//core/conversion): Add support for aten::size with dynamic shaped models for Torchscript backend. by @peri044 in #1647
- feat: add support for aten::baddbmm by @mfeliz-cruise in #1806
- [feat] Add dynamic conversion path to aten::mul evaluator by @mfeliz-cruise in #1710
- [fix] aten::stack with dynamic inputs by @mfeliz-cruise in #1804
- fix undefined attr issue by @bowang007 in #1783
- fix: Out-Of-Bounds bug in Unsqueeze by @gs-olive in #1820
- feat: Upgrade Docker build to use custom TRT + CUDNN by @gs-olive in #1805
- fix: include str ivalue type conversion by @bowang007 in #1785
- fix: dependency order of inserted long input casts by @mfeliz-cruise in #1833
- feat: Add ts converter support for aten::all.dim by @mfeliz-cruise in #1840
- fix: Error caused by invalid binding name in `TRTEngine.to_str()` method by @gs-olive in #1846
- fix: Implement `aten.mean.default` and `aten.mean.dim` converters by @gs-olive in #1810
- feat: Add converter for aten::log2 by @mfeliz-cruise in #1866
- feat: Add support for aten::where with scalar other by @mfeliz-cruise in #1855
- feat: Add converter support for logical_and by @mfeliz-cruise in #1856
- feat: Refactor FX APIs under dynamo namespace for parity with TS APIs by @peri044 in #1807
- fix: Add version checking for `torch._dynamo` import in `__init__` by @gs-olive in #1881
- fix: Improve Docker build robustness, add validation by @gs-olive in #1873
- fix: Improve input weight handling to `acc_ops` convolution layers in FX by @gs-olive in #1886
- fix: Upgrade `main` to TRT 8.6, CUDA 11.8, CuDNN 8.8, Torch Dev by @gs-olive in #1852
- feat: Wrap dynamic size handling in a compilation flag by @peri044 in #1851
- fix: Add torchvision legacy CI parameter by @gs-olive in #1918
- Sync fb internal change to OSS by @wushirong in #1892
- fix: Reorganize Dynamo directory + backends by @gs-olive in #1928
- fix: Improve partitioning + lowering systems in `torch.compile` path by @gs-olive in #1879
- fix: Upgrade TRT to 8.6.1, parallelize FX tests in CI by @gs-olive in #1930
- feat: Add issue template for Story by @gs-olive in #1936
- feat: support type promotion in aten::cat converter by @mfeliz-cruise in #1911
- Reorg for converters in (FX Converter Refactor [1/N]) by @narendasan in #1867
- fix: Add support for default dimension in `aten.cat` by @gs-olive in #1863
- Relaxing glob pattern for CUDA12 by @borisfom in #1950
- refactor: Centralizing sigmoid implementation (FX Converter Refactor [2/N]) <Target: converter_reorg_proto> by @narendasan in #1868
- fix: Address `.numpy()` issue on fake tensors by @gs-olive in #1949
- feat: Add support for passing through build issues in Dynamo compile by @gs-olive in #1952
- fix: int/int=float division by @mfeliz-cruise in #1957
- fix: Support dims < -1 in aten::stack converter by @mfeliz-cruise in #1947
- fix: Resolve issue in isInputDynamic with mixed static/dynamic shapes by @mfeliz-cruise in #1883
- DLFW changes by @apbose in #1878
- feat: Add converter for aten::isfinite by @mfeliz-cruise in #1841
- Reorg for converters in hardtanh(FX Converter Refactor [5/N]) <Target: converter_reorg_proto> by @apbose in #1901
- fix/feat: Add lowering pass to resolve most `aten::Int.Tensor` uses by @gs-olive in #1937
- fix: Add decomposition for `aten.addmm` by @gs-olive in #1953
- Reorg for converters tanh (FX Converter Refactor [4/N]) <Target: converter_reorg_proto> by @apbose in #1900
- Reorg for converters leaky_relu (FX Converter Refactor [6/N]) <Target: converter_reorg_proto> by @apbose in #1902
- Upstream 3 features to fx_ts_compat: MS, VC, Optimization Level by @wu6u3tw in #1935
- fix: Add lowering pass to remove output repacking in `convert_method_to_trt_engine` calls by @gs-olive in #1945
- Fixing aten::slice invalid schema and implementing aten::list evaluator by @apbose in #1695
- fix: Rewrite constant_pad_nd to use a single slice layer for performance by @mfeliz-cruise in #1970
- Adding converter aten::chunk in torchscript by @apbose in #1802
- fix: Repair index used to access tensor bindings by @gs-olive in #1998
- Reorg for converters elu and selu (FX Converter Refactor [7/N]) <Target: converter_reorg_proto> by @apbose in #1903
- chore(deps): bump transformers from 4.17.0 to 4.30.0 in /tests/modules by @dependabot in #2013
- fix: Repair input range on BERT inputs for CI by @gs-olive in #2017
- fix: Refactor assertions in E2E tests for Dynamo by @gs-olive in #2001
- chore/fix: Update `TRTInterpreter` impl in Dynamo compile [1 / x] by @gs-olive in #2002
- fix: Repair flaky TopK core test by @gs-olive in #2022
- feat: Add `options` kwargs for Torch compile [3 / x] by @gs-olive in #2005
- feat: Add support for output data types in `TRTInterpreter` [2 / x] by @gs-olive in #2004
- chore: Upgrade Torch nightly to `2.1.0.dev20230605` [4 / x] by @gs-olive in #1975
- fix: Repair output binding indexing scheme in TRT by @gs-olive in #2054
- fix: Improve logging and kwarg passing in Dynamo by @gs-olive in #2052
- fix: Add support for fake tensors by @gs-olive in #1955
- fix: Repair argument passing in both Dynamo paths by @gs-olive in #1997
- minor fix: Dynamo CI fix due to merge issue by @gs-olive in #2067
- feat: Module-Acceleration in Dynamo [5 / x] by @gs-olive in #1979
- fix/feat: Move convolution core to `impl` + add feature (FX converter refactor) by @gs-olive in #1972
- chore: Upgrade to CUDA 12.1 by @gs-olive in #2020
- fix: Repair null bindings issue in TRT Engines by @gs-olive in #2080
- fix: Add python3 symlink in final container by @gs-olive in #2085
- feat: Add support for `TorchTensorRTModule` in Dynamo [1 / x] by @gs-olive in #2003
- fix: Repair import error for legacy TS testing by @gs-olive in #2091
- chore: Update Torch to Jul 3 Nightly by @gs-olive in #2099
- fix: Repair graph naming for FX legacy suite by @gs-olive in #2111
- DLFW changes by @apbose in #2109
- fix: Update CI GPU Class by @gs-olive in #2116
- fix: Replace EliminateExceptions lowering pass by @gs-olive in #1859
- chore: Improve error propagation for torch compile by @gs-olive in #2106
- fix: Repair version checking system for Torch by @gs-olive in #2118
- feat: Dynamo refactor by @peri044 in #2104
- feat: Set default ir to dynamo export by @peri044 in #2029
- fix: TRTInterpreter output lacks return value by @gs-olive in #2114
- fix/feat: Add Dynamo-only converter registry by @gs-olive in #1944
- fix: Add support for `truncate_long_and_double` in Dynamo [8 / x] by @gs-olive in #1983
- docs: Update readme to include TRT as a separate install dep. by @narendasan in #2137
- fix: Move all `aten` PRs to Dynamo converter registry by @gs-olive in #2070
- Change python build system to be PEP517 compatible by @narendasan in #2056
- chore: fix the docgen job by @narendasan in #2158
- feat: Implement dynamic shape support for floordiv, NumToTensor, layer_norm by @peri044 in #2006
- examples: Add example usage scripts for `torch_tensorrt.dynamo.compile` path [1.1 / x] by @gs-olive in #1966
- [feat] TS: Add support for dynamic select and masked_fill by @mfeliz-cruise in #2115
- feat: Added support for aten::unflatten converter by @andi4191 in #2097
- feat: Added a variant for aten::fake_quant_per_tensor by @andi4191 in #2107
- ci: Add automatic GHA job to build + push Docker Container on `main` by @gs-olive in #2129
- chore: Add `pyyaml` import to GHA Docker job by @gs-olive in #2170
- feat(torch_tensorrt.dynamo.tools): Tool to calculate coverage of PyTorch by @narendasan in #2166
- chore: Add parallelism to Dynamo tests by @gs-olive in #2165
- feat: Add support for dynamic zeros_like and ones_like by @mfeliz-cruise in #1847
- feat: Added support for aten::tile converter by @andi4191 in #2105
- Improve Python tooling by @narendasan in #2126
- Py38 compatibility by @narendasan in #2189
- abandoned create_plugin() function by @zewenli98 in #2146
- feat: Improve Dynamo partitioning System Performance on Large Models by @gs-olive in #2175
- feat: Improve Logging in Dynamo by @peri044 in #2194
- feat: Add ExportedProgram as an IR by @peri044 in #2191
- feat: Improve layer naming by @peri044 in #2162
- fix: Update `aten.embedding` to reflect schema by @gs-olive in #2182
- feat: Add `_to_copy`, `operator.get` and `clone` ATen converters by @gs-olive in #2161
- fix: Repair broadcasting utility for `aten.where` by @gs-olive in #2228
- chore: Fix Logging in torch_compile path by @peri044 in #2238
- feat: Add Selective ATen decompositions by @gs-olive in #2173
- Type mismatch for dynamo aten::where converter by @apbose in #2198
- fix: Set `dynamic=False` in `torch.compile` call by @gs-olive in #2240
- fix: Allow rank differences in `aten.expand` by @gs-olive in #2234
- fix: Address runtimes with 0D inputs by @gs-olive in #2188
- feat: support many unary dynamo converters by @zewenli98 in #2246
- fix: Decrease Docker container size by 20% by @gs-olive in #2257
- feat: Add support for device compilation setting by @gs-olive in #2190
- fix: Legacy CI `pip` installation by @gs-olive in #2239
- feat: support amax dynamo converter by @zewenli98 in #2241
- feat: Exempt default softmax from decomposition by @gs-olive in #2268
- fix: Reorganize Dynamo testing directories by @gs-olive in #2255
- feat: Add support for `require_full_compilation` in Dynamo by @gs-olive in #2138
- fix: Unify layers in Docker Container Cleanup by @gs-olive in #2275
- infra: testing out GHA CI by @narendasan in #2073
- feat: support conv dynamo converter by @zewenli98 in #2252
- Enabling var_mean decomposition by @apbose in #2273
- fix: add an arg in matmul by @zewenli98 in #2279
- feat: support activation dynamo converters by @zewenli98 in #2254
- feat: support torch.ops.aten.sum.(default and dim_IntList) dynamo converter by @zewenli98 in #2278
- tools(opset_coverage): Map default ops to unoverloaded ops by @narendasan in #2292
- add initial support for torch.ops.aten.neg.default converter by @bowang007 in #2147
- fix: Torch Upgrade to 2.2.0.dev by @gs-olive in #2298
- chore: enabling TS FE testing by @narendasan in #2283
- Update _Input.py by @phyboy in #2293
- feat: support many elementwise dynamo converters by @zewenli98 in #2263
- feat: support linear (fully connected layer) dynamo converter by @zewenli98 in #2253
- WAR: Disabling ViT tests until exporting with py311 is fixed by @narendasan in #2305
- neg converter correction by @apbose in #2307
- feat: Add preliminary support for freezing tensors in Dynamo by @gs-olive in #2128
- fix: Wrap import of ConstantFold utilities by @gs-olive in #2312
- fix: Move aten.neg test case by @gs-olive in #2310
- small fix: Packaging version switch by @gs-olive in #2315
- fix: Register tensorrt backend name by @gs-olive in #2311
- feat: Transition export workflows to use torch._export APIs by @peri044 in #2195
- fix: Add special cases for `clone` and `to_copy` where input of graph is output by @gs-olive in #2265
- fix: Raise error when registering Packet-keyed converter by @gs-olive in #2285
- FX converter documentation by @apbose in #2039
- aten::split converter by @apbose in #2232
- DLFW changes by @apbose in #2281
- feat: Add ATen lowering pass system by @gs-olive in #2280
- fix: Support non -1 end idx and <0 start idx in aten::flatten converter by @mfeliz-cruise in #2321
- Dynamo converter support for torch.ops.aten.erf.default op by @bowang007 in #2164
- fix: Update Torchvision version to address dependency resolution issue by @gs-olive in #2339
- fix: Remove input aliasing of builtin ops by @gs-olive in #2276
- fix: Allow low rank inputs in Python Runtime by @gs-olive in #2282
- fix: Address multi-GPU issue in engine deserialize by @gs-olive in #2325
- feat: support deconv (1d, 2d, and Nd) dynamo converter by @zewenli98 in #2337
- Update usage of PyTorch's custom op API by @zou3519 in #2193
- feat: support bmm converter in dynamo by @bowang007 in #2248
- feat: support 1D, 2D, and 3D avg and max pooling dynamo converters by @zewenli98 in #2317
- fix: Add support for negative dimensions in reduce by @gs-olive in #2347
- feat: Add tensor type enforcement for converters by @gs-olive in #2324
- fix: Issue in TS dimension-squeeze utility by @gs-olive in #2336
- perf: Add lowering passes to improve TRT runtime on SD by @gs-olive in #2351
- feat: Implement Dynamic shapes + fallback support for export path by @peri044 in #2271
- feat: Add maxpool lowering passes and experimental folder in Dynamo by @gs-olive in #2358
- Aten::Index converter by @apbose in #2277
- feat: Implement support for exporting Torch-TensorRT compiled graphs using torch.export serde APIs by @peri044 in #2249
- chore: Switch converter tests to generate standalone ops using fx.symbolic_trace by @peri044 in #2361
- fix/feat: Add and repair multiple converters for SD + other models by @gs-olive in #2353
- feat: support flatten and reshape via shuffle_layer by @zewenli98 in #2354
- feat: support prod, max, min, and mean via reduce layer by @zewenli98 in #2355
- minor fix: Update `get_ir` prefixes by @gs-olive in #2369
- Dynamo converter cat by @apbose in #2343
- fix: Repair issue in Torch Constant Folder by @gs-olive in #2375
- fix: Repair `aten.where` with Numpy + Broadcast by @gs-olive in #2372
- Cherry-pick changes from main into release/2.1 by @narendasan in #2302
- cherry-pick: Key converters and documentation to `release/2.1` by @gs-olive in #2387
- cherry-pick: Decomposition fix and documentation updates by @gs-olive in #2391
- feat: Wrap ExportedPrograms transformations with an API, allow dynamo.compile to accept graphmodules. by @peri044 in #2388
- Cherry-pick : Add documentation for dynamo.compile backend (#2389) by @peri044 in #2416
- cherry-pick: Transformer XL fix to `release/2.1` by @gs-olive in #2414
- cherry-pick/fix: Performance benchmarking fixes and Torch version fix by @gs-olive in #2433
- Cherry pick 2420 to release/2.1 by @peri044 in #2425
- cherry-pick/minor fix: Parse out slashes in Docker container name (#2437) by @gs-olive in #2438
- chore: fix docs for export [release/2.1] by @peri044 in #2448
- chore: add additional native BN converter (cherry-pick of #2446) by @peri044 in #2452
- cherry-pick/fix: Docs rendering on PyTorch site (#2440) by @gs-olive in #2441
- minor fix: Update Benchmark values (#2453) by @gs-olive in #2454
- cherry-pick: Wrap perf benchmarks with no_grad (#2466) by @gs-olive in #2470
- chore: Upgrade `release` to Torch 2.1.1 by @gs-olive in #2472
- fix: Naming issue in opset coverage tool by @gs-olive in #2477
- cherry-pick: View and slice bugfixes by @gs-olive in #2500
- cherry-pick: Perf + Bugfix PRs by @gs-olive in #2513
- fix: `release/2.1` CI Repair by @gs-olive in #2528
- cherry-pick: Safe mode and Build Arguments PRs by @gs-olive in #2521
- cherry-pick: Port most changes from `main` by @gs-olive in #2574
- chore: clean up AWS credentials PR changes by @peri044 in #2608
- chore: Set return type of compilation to ExportedProgram [release/2.2] by @peri044 in #2607
- cherry-pick: Docker fixes for `release/2.2` by @gs-olive in #2628
- fix: Upgrade versions for Docker build rel 2.2 by @gs-olive in #2630
- cherry-pick: Remove keyserver fetch from Dockerfile (#2639) by @gs-olive in #2640
- cherry-pick: Remove extraneous argument in `compile` (#2635) by @gs-olive in #2638
- cherry-pick: Attention converter and linting fixes by @gs-olive in #2641
- small fix: Index validator enable int64 (#2642) by @gs-olive in #2643
New Contributors
- @wushirong made their first contribution in #1892
- @wu6u3tw made their first contribution in #1935
- @phyboy made their first contribution in #2293
- @zou3519 made their first contribution in #2193
Full Changelog: v1.4.0...v2.2.0