Releases: AmusementClub/vs-mlrt

v8: latest CUDA libraries and ~10% faster

12 Mar 06:43
  • This release upgrades the CUDA libraries to their latest versions. Model inference is observed to be ~1.1x faster.
  • vsmlrt.CUGAN() now accepts a new parameter alpha, which controls the strength of filtering. Setting alpha to a non-default value requires the Python onnx package (though this might change in the future); see the sketch after this list.
  • Added a tf32 parameter to the trt backend in vsmlrt.py. TF32 acceleration is enabled by default on Ampere GPUs, mainly affects fp32 inference, and has no effect on other architectures.
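
For example, the two new parameters might be used together like this (a minimal sketch; the alpha and tf32 keywords come from the notes above, all other values are illustrative):

from vsmlrt import CUGAN, Backend

src = core.std.BlankClip(format=vs.RGBS)

# alpha controls the strength of filtering; a non-default value requires
# the Python onnx package. tf32 only takes effect on Ampere GPUs.
flt = CUGAN(src, noise=-1, scale=2, alpha=0.7, backend=Backend.TRT(tf32=True))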

v7: add Real-CUGAN

27 Jan 08:22

This release adds support for bilibili's Real-CUGAN; please refer to the wiki for details.

Special notes for CUGAN:

  1. Make sure the RGBS input to CUGAN is within the [0, 1] range; out-of-range values will trip the NN into producing large negative values. If in doubt, use core.std.Expr(input, "x 0 max 1 min") to clamp the input before feeding the NN (fmtconv YUV2RGB might generate out-of-range RGB values).
  2. Do not use tiling (i.e. you must set tiles=1), as CUGAN requires access to the entire input frame for its depth-detection mechanism to work. Both points are illustrated in the sketch below.
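
Putting both notes together, a typical CUGAN invocation might look like this (a minimal sketch, assuming yuv is your YUV source clip; the YUV-to-RGB conversion step is illustrative):

from vsmlrt import CUGAN, Backend

# convert to RGBS and clamp to [0, 1] before feeding the NN
rgb = core.resize.Bicubic(yuv, format=vs.RGBS, matrix_in_s="709")
rgb = core.std.Expr(rgb, "x 0 max 1 min")

# tiles=1: CUGAN must see the whole frame
flt = CUGAN(rgb, noise=-1, scale=2, tiles=1, backend=Backend.ORT_CUDA())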

Compared to v6, only the scripts.v7.7z, models.v7.7z, vsmlrt-windows-x64-cpu.v7.7z and vsmlrt-windows-x64-cuda.v7.7z files are updated.

v6: further vs-trt performance optimizations & vs-ov/vs-ort bugfixes

20 Jan 14:31

This release contains some performance optimizations for the vs-trt plugin. The general takeaway is that vs-trt can beat all benchmarked solutions on the DPIR, waifu2x and RealESRGANv2 models. Specific highlights are as follows:

  • waifu2x: on CPU, vs-ov beats waifu2x-w2xc by 2.7x (Intel 32C64T); on GPU, vs-ort/vs-trt beat vulkan-ncnn by ~4x.
  • DPIR: vs-trt beats existing implementations on both Volta (Tesla V100) and Ampere (A10) platforms (by up to 1.5x), and vs-ort saves a significant amount of GPU memory (as much as 3.7x) compared to its counterpart.
  • RealESRGANv2: vs-trt, being the only backend that utilizes TensorRT, is up to 3.3x faster than the reference implementation.

Please see the detailed benchmark results in the wiki.

This release also fixes the following two bugs:

  • vs-ov: some openvino error messages were sent to stdout, affecting vspipe | x265 usage.
  • vs-ort/vs-ov: an error in converting the RealESRGANv2 model to fp16 format.

v5: fp16 support & production ready!

30 Dec 14:14

Changelog:

  1. Added fp16 support to vs-ov and vs-ort (the input model is still fp32; these filters convert it to fp16 on the fly). Now all three backends support fp16 inference (though fp16 mainly benefits vs-ort's CUDA backend).
  2. Fixed vs-ov's spurious logging messages to stdout, which interfere with the vspipe | x265 pipeline (requires a patched openvino). It turns out the fix was not picked up by the CI; please use v6 for vs-ov.
  3. Changes to the vs-trt backend vsmlrt.Backend.TRT() of the vsmlrt.py wrapper (see the sketch after this list):
    • max_shapes now defaults to the tile size (as TensorRT GPU memory usage is related to max_shapes rather than the actual shape used in inference, this should help save GPU memory);
    • opt_shapes now defaults to None, which means it will be set to the actual tile size in use. This is especially beneficial for large models like DPIR; if you prefer faster engine build times, set opt_shapes=(64, 64) to restore the previous behavior. This change also makes the tiles parameter easier to use (as in that case you generally don't know the exact inference shape);
    • changed the default cache & engine directory: the engine and cache files are first saved to the same directory as the onnx model; if that is not writable, the system temporary directory (on the same drive as the onnx model files) is used instead;
    • fixed a bug when reusing the same backend variable for different filters.
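
As a concrete illustration of items 1 and 3 (a minimal sketch; the fp16 and opt_shapes keywords are assumptions based on the notes above, everything else is illustrative):

from vsmlrt import DPIR, DPIRModel, Backend

src = core.std.BlankClip(width=1920, height=1080, format=vs.RGBS)

# item 1: convert the fp32 model to fp16 on the fly (mainly benefits vs-ort's CUDA backend)
flt = DPIR(src, strength=5, model=DPIRModel.drunet_color, backend=Backend.ORT_CUDA(fp16=True))

# item 3: opt_shapes now defaults to the actual tile size;
# pass opt_shapes=(64, 64) explicitly to restore the previous, faster-building behavior
flt_trt = DPIR(src, strength=5, model=DPIRModel.drunet_color, backend=Backend.TRT(opt_shapes=(64, 64)))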

vsmlrt-cuda and model packages are identical to v4.

PS: we have successfully used both vs-ov and vs-trt in production anime encodings, so this release should be ready for production. As always, issues and suggestions are welcome.
Update: it turns out vs-ov is broken; the fix to openvino was not correctly picked up by the CI pipeline. Please use v6 for vs-ov.

v4: vs-trt support in vsmlrt.py, RealESRGANv2 model & full binary releases

17 Dec 12:26

This release introduces the following features:

Component Downloads

Besides the full releases, each individual component also has its own release, so that users can upgrade only what has changed:

  • models: full fp32 model release 20211209, includes waifu2x, RealESRGANv2 and DPIR.
  • scripts: vsmlrt.py wrapper script, extract to VS python site-packages directory
  • vsmlrt-cuda: shared CUDA DLLs for vs-ort and vs-trt
  • VSOV-Windows-x64: vs-ov plugin (pure CPU backend)
  • VSORT-Windows-x64: vs-ort plugin, includes both CPU and CUDA backend; CUDA backend requires vsmlrt-cuda package.
  • VSTRT-Windows-x64: vs-trt plugin, requires vsmlrt-cuda package.

All component packages should be extracted to your VS plugins directory, except for scripts.v4.7z, which needs to be extracted to the VS Python site-packages directory.
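
After extraction, a quick way to sanity-check the installation from a VapourSynth script (a minimal sketch, not part of the release):

import vapoursynth as vs
core = vs.core

import vsmlrt  # resolves only if scripts.v4.7z was extracted to site-packages

# each namespace resolves only if the corresponding plugin DLL was loaded
print(hasattr(core, "ov"), hasattr(core, "ort"), hasattr(core, "trt"))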

Known Issues

  1. Building a TRT engine for waifu2x cunet and upresnet10 will fail on RTX 2000 and RTX 3000 series GPUs; please use vsort if you have an affected GPU.
  2. Due to the way NVidia DLLs are named, there might be DLL conflicts if you also have other AI filters (e.g. waifu2x-caffe) in your plugins directory. Due to licensing restrictions and Windows technical restrictions, there is no easy way to solve this DLL conflict problem: you will have to remove the conflicting plugins. Fortunately, the only affected plugin seems to be waifu2x-caffe, and vsmlrt.py already provides full functionality coverage with better performance, so there is no reason to use the caffe plugin anymore.

Installation Notes

  1. It is recommended to update to the latest GPU driver (e.g. >= v472.50) if you intend to use the CUDA backend of vsort or vstrt, for best performance and compatibility; however, GeForce GPU users with GPU driver >= v452.39 should be able to use the CUDA backend.
  2. There are no changes to vsmlrt-cuda.7z from v3, so no need to re-download it if you already have it from v3.

v3: vs-ov/vs-ort/vs-trt interface overhaul & DPIR Python wrapper

16 Dec 08:14

This release improves the interface of the wrapper and plugins:

  • The argument pad is renamed to overlap, and it is now possible to specify different overlap values in each direction.
  • The arguments block_w and block_h are merged into a single argument tilesize.
  • vsmlrt.py now supports DPIR models. The type of the argument backend is changed to a typed data class. To use DPIR, you need to extract the v3 DPIR model files into the VS plugins\models directory (please keep the directory structure inside the 7z archive intact while extracting).

Built-in models can be found at model-20211209.

Example waifu2x wrapper usage:

from vsmlrt import Waifu2x, Waifu2xModel, Backend

src = core.std.BlankClip(format=vs.RGBS)

# backend could be:
#  - CPU Backend.OV_CPU(): the recommended CPU backend; generally faster than ORT-CPU.
#  - CPU Backend.ORT_CPU(num_streams=1, verbosity=2): vs-ort's cpu backend.
#  - GPU Backend.ORT_CUDA(device_id=0, cudnn_benchmark=True, num_streams=1, verbosity=2)
#     - use device_id to select device
#     - set cudnn_benchmark=False to reduce script reload latency when debugging, but with slight throughput performance penalty.
flt = Waifu2x(src, noise=-1, scale=2, model=Waifu2xModel.upconv_7_anime_style_art_rgb, backend=Backend.ORT_CUDA())

Example DPIR wrapper usage:

from vsmlrt import DPIR, DPIRModel, Backend
src = core.std.BlankClip(format=vs.RGBS) # or vs.GRAYS for gray only models
# DPIR is a huge model and GPU backend is highly recommended.
# If the model runs out of GPU memory, increase the tiles parameter.
flt = DPIR(src, strength=5, model=DPIRModel.drunet_color, tiles=2, backend=Backend.ORT_CUDA())
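
The merged tilesize argument and the per-direction overlap can also be given explicitly instead of tiles (a minimal sketch; the exact dimensions are illustrative):

from vsmlrt import DPIR, DPIRModel, Backend

src = core.std.BlankClip(width=1920, height=1080, format=vs.RGBS)

# tilesize replaces block_w/block_h; overlap (formerly pad) may differ per direction
flt = DPIR(src, strength=5, model=DPIRModel.drunet_color, tilesize=(960, 544), overlap=(16, 16), backend=Backend.ORT_CUDA())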

Known Issues

  1. Building a TRT engine for waifu2x cunet and upresnet10 will fail on RTX 2000 and RTX 3000 series GPUs; please use vsort if you have an affected GPU.
  2. Due to the way NVidia DLLs are named, there might be DLL conflicts if you also have other AI filters (e.g. waifu2x-caffe) in your plugins directory. Due to licensing restrictions and Windows technical restrictions, there is no easy way to solve this DLL conflict problem: you will have to remove the conflicting plugins. Fortunately, the only affected plugin seems to be waifu2x-caffe, and vsmlrt.py already provides full functionality coverage with better performance, so there is no reason to use the caffe plugin anymore.

Installation Notes

  1. It is recommended to update to the latest GPU driver (e.g. >= v472.50) if you intend to use the CUDA backend of vsort or vstrt, for best performance and compatibility; however, GeForce GPU users with GPU driver >= v452.39 should be able to use the CUDA backend.
  2. There are no changes to vsmlrt-cuda.7z from v2, so no need to re-download it if you already have it from v2.

v2: vs-trt public preview & vs-ov/vs-ort enhancements & waifu2x wrapper script

10 Dec 01:35

This release introduces the vs-trt plugin, which should provide the best possible performance on NVidia GPUs at the expense of an extra, tedious engine-building step. vs-trt is only recommended for large AI models, e.g. DPIR; smaller models like waifu2x won't see much performance benefit. Please refer to its docs for further usage instructions (and be forewarned: it is very hard to use unless you are prepared to spend some time understanding the process and doing some trial-and-error experiments).

If you use GPU support in vsort or vstrt, you also need to download and extract vsmlrt-cuda.7z into your VS plugins directory (while keeping the directory structure inside the 7z file). The DLLs there are shared by vsort and vstrt. Please also note that vstrt requires the new models released in model-20211209.

This release also introduces builtin model support for vsov and vsort (as vstrt requires building engines separately, builtin model support is moot for it). You can place the model onnx files under the VS plugins\models directory and set builtin=True for the vsov and vsort filters, so that the network_path argument is interpreted as a path relative to plugins\models. This mode makes it easier to build a portable VS release with integrated models. For example, after extracting waifu2x-v3.7z into your VS plugins\models directory (while keeping the directory structure inside the 7z file), you can do this to use the waifu2x models with vsmlrt.py without worrying about their absolute paths:

from vsmlrt import Waifu2x, Waifu2xModel

src = core.std.BlankClip(format=vs.RGBS)
# backend could be: "ort-cpu", "ort-cuda", "ov-cpu"; suggested choice is "ov-cpu" for pure CPU and "ort-cuda" for GPU.
flt = Waifu2x(src, noise=-1, scale=2, model=Waifu2xModel.upconv_7_anime_style_art_rgb, backend="ort-cuda")
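
For reference, the builtin mechanism can also be used with the raw plugins directly; a minimal sketch (the model's relative path is illustrative, and all parameters other than network_path and builtin are omitted):

src = core.std.BlankClip(format=vs.RGBS)

# with builtin=True, network_path is interpreted relative to plugins\models
flt = core.ort.Model(src, network_path="waifu2x/upconv_7_anime_style_art_rgb/scale2.0x_model.onnx", builtin=True)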

vsmlrt-cuda.7z Changelog

  1. Added nvrtc for vstrt's dynamic layer fusion support; it is only necessary if you use vstrt. If you only intend to use vsort, you can download the smaller vsmlrt-cuda-no-nvrtc.7z package instead.

Known Issues

  1. Building a TRT engine for waifu2x cunet and upresnet10 will fail on RTX 2000 and RTX 3000 series GPUs; please use vsort if you have an affected GPU.
  2. Due to the way NVidia DLLs are named, there might be DLL conflicts if you also have other AI filters (e.g. waifu2x-caffe) in your plugins directory. Due to licensing restrictions and Windows technical restrictions, there is no easy way to solve this DLL conflict problem: you will have to remove the conflicting plugins. Fortunately, the only affected plugin seems to be waifu2x-caffe, and vsmlrt.py already provides full functionality coverage with better performance, so there is no reason to use the caffe plugin anymore.

Installation Notes

  1. please update to the latest GPU driver (e.g. >= v472.50) if you intend to use the CUDA backend of vsort or vstrt for best performance and compatibility.
  2. GeForce GPU users may use the v2b version of vsort, which supports GPU driver >= v452.39.

Model release 20211209, new dynamic shape support

09 Dec 05:48

Model release 20211209

This requires plugin release v2 or above.
Users of the v1 or v0 plugin releases should continue to use the previous model release.

In general, we strive to keep previous model releases usable with newer plugin releases, but new model releases generally require newer plugin releases.

Changelog

  1. Modified the input dimensions to -1 to better support dynamic shapes and the upcoming vstrt plugin. vsov and vsort users can continue to use the last release (though upgrading is highly recommended); see the sketch after this list.
  2. Added Real-CUGAN models.
  3. Added cugan-pro and RealESRGANv3 models.
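
The dynamic input dimensions can be verified with the Python onnx package (a minimal sketch; the model path is illustrative):

import onnx  # pip install onnx

model = onnx.load("waifu2x/upconv_7_anime_style_art_rgb/scale2.0x_model.onnx")
for inp in model.graph.input:
    dims = [d.dim_value if d.dim_value else d.dim_param or "?"
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)  # -1 (or a symbolic name) marks a dynamic dimension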

v1: vsort initial preview release

04 Dec 11:42

Initial public preview vs-ort release.

Changelog

  1. VSOV: moved tbb.dll into its own directory, so that we don't put any non-VS plugin DLL into the top level of the plugins directory.
  2. VSORT: initial release.

Installation Notes

  • VSORT: ONNX Runtime
    • CPU only: extract VSORT-Windows-x64.zip into vapoursynth/plugins directory. You can additionally remove vsort/onnxruntime_providers_cuda.dll and vsort/onnxruntime_providers_shared.dll to save some disk space.
    • CUDA: extract both VSORT-Windows-x64.zip and vsmlrt-cuda.7z into vapoursynth/plugins directory.
  • VSOV: just extract VSOV-Windows-x64.zip into vapoursynth/plugins directory.

Please note that the CUDA libraries are huge (they require ~1.9 GiB of space after extraction).

Please refer to the wiki for details.

v0: vsov initial preview release

03 Dec 04:34

This is the initial public preview release of vs-openvino.

It is built with statically linked openvino release at https://github.com/AmusementClub/openvino/releases/tag/2020.2-5484-gcccec6942.

Please see the wiki for details.