Releases: AmusementClub/vs-mlrt
v8: latest CUDA libraries and ~10% faster
- This release upgrades the CUDA libraries to their latest versions. Models are observed to be accelerated by ~1.1x.
- `vsmlrt.CUGAN()` now accepts a new parameter `alpha`, which controls the strength of filtering. Setting `alpha` to a non-default value requires the Python `onnx` package (but this might change in the future).
- Added a `tf32` parameter to the TRT backend in vsmlrt.py. TF32 acceleration is enabled by default on Ampere GPUs, mostly benefits fp32 inference, and has no effect on other architectures (see the sketch below).
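A minimal sketch combining both additions, assuming `CUGAN` follows the same wrapper conventions as the other vsmlrt.py filters (the `noise`, `scale`, and `alpha` values are illustrative):

```python
import vapoursynth as vs
from vsmlrt import CUGAN, Backend

core = vs.core

src = core.std.BlankClip(format=vs.RGBS)
# alpha controls the strength of filtering; non-default values need the Python
# onnx package. tf32 only affects Ampere GPUs (and mostly fp32 inference).
flt = CUGAN(src, noise=-1, scale=2, alpha=0.75, backend=Backend.TRT(tf32=True))
```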
v7: add Real-CUGAN
This release adds support for bilibili's Real-CUGAN; please refer to the wiki for details.
Special notes for CUGAN:
- Make sure the RGBS input to CUGAN is within the `[0, 1]` range (if in doubt, use `core.std.Expr(input, "x 0 max 1 min")` to condition the input before feeding the NN; fmtconv's YUV2RGB might generate out-of-range RGB values). Out-of-range values will trip the NN into producing large negative values.
- Do not use tiling (i.e. you must set `tiles=1`), as CUGAN requires access to the entire input frame for its depth-detection mechanism to work (a sketch combining both notes follows).
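A sketch of both notes together (the backend choice is illustrative):

```python
import vapoursynth as vs
from vsmlrt import CUGAN, Backend

core = vs.core

rgb = core.std.BlankClip(format=vs.RGBS)  # stand-in for e.g. a fmtconv YUV->RGB result
# clamp to [0, 1] in case the RGB conversion produced out-of-range values
rgb = core.std.Expr(rgb, "x 0 max 1 min")
# tiles=1: CUGAN needs the entire frame for its depth-detection mechanism
flt = CUGAN(rgb, noise=-1, scale=2, tiles=1, backend=Backend.ORT_CUDA())
```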
Compared to v6, only the `scripts.v7.7z`, `models.v7.7z`, `vsmlrt-windows-x64-cpu.v7.7z`, and `vsmlrt-windows-x64-cuda.v7.7z` files are updated.
v6 further performance optimizations for vs-trt & bugfixes for vs-ov/vs-ort
This release contains performance optimizations for the vs-trt plugin. The general takeaway is that vs-trt can beat all benchmarked solutions on the DPIR, waifu2x, and RealESRGANv2 models. Specific highlights are as follows:
- waifu2x: when using CPU, vs-ov beats waifu2x-w2xc by 2.7x (Intel 32C64T); when using GPU, vs-ort/vs-trt beats vulkan-ncnn by ~4x.
- DPIR: vs-trt beats existing implementations on both Volta (Tesla V100) and Ampere (A10) platforms (by up to 1.5x), and vs-ort uses as much as 3.7x less GPU memory than its counterpart.
- RealESRGANv2: vs-trt, being the only backend that utilizes TensorRT, is up to 3.3x faster than the reference implementation
Please see detailed benchmark results in the wiki:
- waifu2x: https://github.com/AmusementClub/vs-mlrt/wiki/waifu2x#benchmarking
- DPIR: https://github.com/AmusementClub/vs-mlrt/wiki/DPIR#benchmarking
- RealESRGANv2: https://github.com/AmusementClub/vs-mlrt/wiki/RealESRGANv2#benchmarking
This release also fixed the following two bugs:
- vs-ov: some OpenVINO error messages were sent to stdout, affecting `vspipe | x265` usage.
- vs-ort/vs-ov: error in converting the RealESRGANv2 model to fp16 format.
v5 fp16 support & production ready!
Changelog:
- added fp16 support to vs-ov and vs-ort (the input model is still fp32; these filters will convert it to fp16 on the fly). Now all three backends support inference with fp16 (though fp16 mainly benefits vs-ort's CUDA backend).
- fixed vs-ov spurious logging messages to stdout which interfere with the `vspipe | x265` pipeline (requires a patched openvino). Update: turns out the fix was not picked up by the CI; please use v6 for vs-ov.
- changes to the vs-trt backend `vsmlrt.Backend.TRT()` of the `vsmlrt.py` wrapper:
  - `max_shapes` now defaults to the tile size (as TensorRT GPU memory usage is related to `max_shapes` rather than the actual shape used in inference, this should help save GPU memory);
  - the default `opt_shapes` is now `None`, which means it will be set to the actual `tilesize` in use: this is especially beneficial for large models like DPIR. If you prefer faster engine build times, set `opt_shapes=(64, 64)` to restore the previous behavior. This change also makes it easier to use the `tiles` parameter (as in that case you generally don't know the exact inference shape). See the sketch after this list.
  - changed the default cache & engine directory: we first try saving the engine and cache file to the same directory as the onnx model; if that is not writable, we use the system temporary directory (on the same drive as the onnx model files).
- fixed a bug when reusing the same backend variable for different filters.
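A sketch of the new defaults in action, assuming `Backend.TRT` accepts `opt_shapes` as a keyword argument (the model and tile values are illustrative):

```python
import vapoursynth as vs
from vsmlrt import DPIR, DPIRModel, Backend

core = vs.core

src = core.std.BlankClip(format=vs.RGBS)
# opt_shapes=None (the new default) optimizes the engine for the actual tile
# size; pass opt_shapes=(64, 64) instead for faster engine build times.
backend = Backend.TRT(opt_shapes=None)
flt = DPIR(src, strength=5, model=DPIRModel.drunet_color, tiles=2, backend=backend)
```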
vsmlrt-cuda and model packages are identical to v4.
PS: we have successfully used both vs-ov and vs-trt in production anime encodings, so this release should be ready for production. As always, issues and suggestions welcome.
Update: turns out vs-ov is broken. The fix to openvino is not correctly picked up by the CI pipeline. Please use v6 for vs-ov.
v4 vs-trt support in vsmlrt.py, RealESRGANv2 model & full binary releases
This release introduces the following features:
- vsmlrt.py: added support for vs-trt (including transparent engine compilation)
- added RealESRGANv2 models, see https://github.com/AmusementClub/vs-mlrt/releases/download/model-20211209/RealESRGANv2_v1.7z
- full binary releases for Windows, which include the full set of models (waifu2x, RealESRGANv2 and DPIR) and all required DLLs. To simplify installation, we provide two variants:
- CPU only: vsmlrt-windows-x64-cpu.v4.7z
- CPU+CUDA: vsmlrt-windows-x64-cuda.v4.7z
To install, just extract them into your VS `plugins` directory (preserving the existing directory structure within the 7z archive), move `vsmlrt.py` into your VS Python `site-packages` directory, and you're done.
Component Downloads
Besides the full releases, each individual component also has its own release, so that users can upgrade only what has been changed:
- models: full fp32 model release 20211209, includes waifu2x, RealESRGANv2 and DPIR.
- scripts: the `vsmlrt.py` wrapper script; extract to your VS Python `site-packages` directory.
- vsmlrt-cuda: shared CUDA DLLs for vs-ort and vs-trt.
- VSOV-Windows-x64: vs-ov plugin (pure CPU backend)
- VSORT-Windows-x64: vs-ort plugin, includes both CPU and CUDA backend; CUDA backend requires vsmlrt-cuda package.
- VSTRT-Windows-x64: vs-trt plugin, requires vsmlrt-cuda package.
All component packages should be extracted to your VS `plugins` directory, except for `scripts.v4.7z`, which needs to be extracted to your VS Python `site-packages` directory.
Known Issues
- building a TRT engine for the waifu2x cunet and upresnet10 models will fail on RTX 2000 and RTX 3000 series GPUs; please use vsort if you are using an affected GPU.
- due to the way NVidia DLLs are named, there might be DLL conflicts if you also have other AI filters (e.g. waifu2x-caffe) in your plugins directory. Due to licensing restrictions and Windows technical restrictions, there is no easy way to solve this DLL conflict problem; you will have to remove the conflicting plugins. Fortunately, the only affected plugin seems to be waifu2x-caffe, and the vsmlrt.py script already provides full functionality coverage with better performance, so there is no reason to use the caffe plugin anymore.
Installation Notes
- It is recommended to update to the latest GPU driver (e.g. >= v472.50) if you intend to use the CUDA backend of vsort or vstrt, for best performance and compatibility; however, GeForce GPU users with GPU driver >= v452.39 should be able to use the CUDA backend.
- There are no changes to vsmlrt-cuda.7z from v3, so no need to re-download it if you already have it from v3.
v3 vs-ov/vs-ort/vs-trt interface overhaul & DPIR python wrapper
This release improves the interfaces of the wrapper and plugins:
- The argument `pad` is renamed to `overlap`, and it is now possible to specify a different overlap value for each direction (see the sketch below).
- The arguments `block_w` and `block_h` are merged into a single argument `tilesize`.
- vsmlrt.py now supports DPIR models. The type of the `backend` argument is changed to a typed data class. To use the plugin, you need to extract the v3 DPIR model files into your VS `plugins\models` directory (please keep the directory structure inside the 7z archive intact while extracting).
Built-in models can be found at model-20211209.
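A sketch of the renamed tiling arguments, assuming the wrapper forwards `tilesize` and `overlap` as (width, height) pairs (the sizes are illustrative):

```python
import vapoursynth as vs
from vsmlrt import DPIR, DPIRModel, Backend

core = vs.core

src = core.std.BlankClip(format=vs.RGBS)
# tilesize replaces block_w/block_h; overlap replaces pad and can now be
# specified per direction
flt = DPIR(src, strength=5, model=DPIRModel.drunet_color,
           tilesize=(640, 480), overlap=(16, 16), backend=Backend.OV_CPU())
```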
Example waifu2x wrapper usage:
```python
from vsmlrt import Waifu2x, Waifu2xModel, Backend

src = core.std.BlankClip(format=vs.RGBS)

# backend could be:
#  - CPU  Backend.OV_CPU(): the recommended CPU backend; generally faster than ORT-CPU.
#  - CPU  Backend.ORT_CPU(num_streams=1, verbosity=2): vs-ort's CPU backend.
#  - GPU  Backend.ORT_CUDA(device_id=0, cudnn_benchmark=True, num_streams=1, verbosity=2)
#    - use device_id to select the device
#    - set cudnn_benchmark=False to reduce script reload latency when debugging, at a slight throughput penalty.
flt = Waifu2x(src, noise=-1, scale=2, model=Waifu2xModel.upconv_7_anime_style_art_rgb, backend=Backend.ORT_CUDA())
```
Example DPIR wrapper usage:
```python
from vsmlrt import DPIR, DPIRModel, Backend

src = core.std.BlankClip(format=vs.RGBS)  # or vs.GRAYS for gray-only models

# DPIR is a huge model, and a GPU backend is highly recommended.
# If the model runs out of GPU memory, increase the tiles parameter.
flt = DPIR(src, strength=5, model=DPIRModel.drunet_color, tiles=2, backend=Backend.ORT_CUDA())
```
Known Issues
- building a TRT engine for the waifu2x cunet and upresnet10 models will fail on RTX 2000 and RTX 3000 series GPUs; please use vsort if you are using an affected GPU.
- due to the way NVidia DLLs are named, there might be DLL conflicts if you also have other AI filters (e.g. waifu2x-caffe) in your plugins directory. Due to licensing restrictions and Windows technical restrictions, there is no easy way to solve this DLL conflict problem; you will have to remove the conflicting plugins. Fortunately, the only affected plugin seems to be waifu2x-caffe, and the vsmlrt.py script already provides full functionality coverage with better performance, so there is no reason to use the caffe plugin anymore.
Installation Notes
- It is recommended to update to the latest GPU driver (e.g. >= v472.50) if you intend to use the CUDA backend of vsort or vstrt, for best performance and compatibility; however, GeForce GPU users with GPU driver >= v452.39 should be able to use the CUDA backend.
- There are no changes to vsmlrt-cuda.7z from v2, so no need to re-download it if you already have it from v2.
v2 vs-trt public preview & vs-ov/vs-ort enhancements & waifu2x wrapper script
This release introduces the vs-trt plugin, which should provide the best possible performance on NVidia GPUs at the expense of an extra, tedious engine-building step. vs-trt is only recommended for large AI models, e.g. DPIR; smaller models like waifu2x won't see much performance benefit. Please refer to its docs for further usage instructions (and be forewarned: it's very hard to use unless you are prepared to spend some time understanding the process and doing some trial-and-error experiments).
If you use the GPU support of vsort or vstrt, you also need to download and extract vsmlrt-cuda.7z into your VS `plugins` directory (while keeping the directory structure inside the 7z files). The DLLs there will be shared by vsort and vstrt. Please also note that vstrt requires the new models released in model-20211209.
This release also introduces builtin model support for vsov and vsort (as vstrt requires building engines separately, builtin model support is moot for it). You can place the model onnx files under the VS `plugins\models` directory and set `builtin=True` for the vsov and vsort filters, so that the `network_path` argument is interpreted as a path relative to `plugins\models`. This mode makes it easier to build a portable VS release with integrated models. For example, after extracting waifu2x-v3.7z into your VS `plugins\models` directory (while keeping the directory structure inside the 7z files), you can do this to use the waifu2x models with vsmlrt.py without worrying about their absolute paths:
```python
from vsmlrt import Waifu2x, Waifu2xModel

src = core.std.BlankClip(format=vs.RGBS)

# backend could be: "ort-cpu", "ort-cuda", "ov-cpu"; the suggested choice is "ov-cpu" for pure CPU and "ort-cuda" for GPU.
flt = Waifu2x(src, noise=-1, scale=2, model=Waifu2xModel.upconv_7_anime_style_art_rgb, backend="ort-cuda")
```
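For the raw plugins, a minimal sketch of the builtin mode; the `core.ov.Model` invocation and the relative model path shown are assumptions for illustration:

```python
import vapoursynth as vs

core = vs.core

src = core.std.BlankClip(format=vs.RGBS)
# with builtin=True, network_path is resolved relative to plugins\models
flt = core.ov.Model(
    src,
    network_path="waifu2x/upconv_7_anime_style_art_rgb/scale2.0x_model.onnx",
    builtin=True,
)
```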
vsmlrt-cuda.7z Changelog
- added nvrtc for vstrt dynamic layer fusion support, only necessary if you use vstrt. If you only intend to use vsort, you can just download the smaller package vsmlrt-cuda-no-nvrtc.7z.
Known Issues
- building TRT engine for waifu2x cunet and upresnet10 will fail on RTX 2000 and RTX 3000 series GPUs, please use vsort if you are using affected GPUs.
- due to the way NVidia DLLs are named, there might be DLL conflicts if you also have other AI filters (e.g. waifu2x caffe) in your plugins directory. Due to licensing restrictions and windows technical restrictions, there is no easy way to solve this DLL conflict problem. You will have to remove those conflicting plugins. Fortunately, the only affected plugin seems to be waifu2x caffe and we have already provided full functionality coverage and better performance with the vsmlrt.py script so there is no reason to use the caffe plugin anymore.
Installation Notes
- please update to the latest GPU driver (e.g. >= v472.50) if you intend to use the CUDA backend of vsort or vstrt for best performance and compatibility.
- GeForce GPU users may use the v2b version of vsort which supports GPU driver >= v452.39.
Model release 20211209, new dynamic shape support
Model release 20211209
This requires plugin release v2 or above.
Users of the v1 or v0 plugin releases should continue to use the previous model release.
In general, we strive to keep previous model releases usable with newer plugin releases, but new model releases generally require newer plugin releases.
Changelog
- Modified the input dimensions to -1 to better support dynamic shapes and the upcoming vstrt plugin (see the inspection sketch after this list). vsov and vsort users can continue to use the last release (though upgrading is highly recommended).
- Added Real-CUGAN models
- Added cugan-pro and RealESRGANv3 models.
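A quick way to inspect the dynamic input dimensions with the Python `onnx` package (the model path is hypothetical):

```python
import onnx

# hypothetical model path; any model from this release will do
model = onnx.load("waifu2x/upconv_7_anime_style_art_rgb/scale2.0x_model.onnx")
for dim in model.graph.input[0].type.tensor_type.shape.dim:
    # dynamic axes no longer carry a fixed positive dim_value
    print(dim)
```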
v1 vsort initial preview release
Initial public preview vs-ort release.
Changelog
- VSOV: moved tbb.dll into its own directory, so that we don't put any non-VS plugin DLL into the top level of the plugins directory.
- VSORT: initial release.
Installation Notes
- VSORT: ONNX Runtime
  - CPU only: extract VSORT-Windows-x64.zip into the `vapoursynth/plugins` directory. You can additionally remove `vsort/onnxruntime_providers_cuda.dll` and `vsort/onnxruntime_providers_shared.dll` to save some disk space.
  - CUDA: extract both VSORT-Windows-x64.zip and vsmlrt-cuda.7z into the `vapoursynth/plugins` directory.
- VSOV: just extract VSOV-Windows-x64.zip into the `vapoursynth/plugins` directory.
Please note that the CUDA libraries are huge (they require ~1.9 GiB of space after extraction).
Please refer to the wiki for details.
v0 vsov initial preview release
This is the initial public preview release of vs-openvino.
It is built with statically linked openvino release at https://github.com/AmusementClub/openvino/releases/tag/2020.2-5484-gcccec6942.
Please see the wiki for details.