waifu2x
Waifu2x is a well-known image super-resolution neural network for anime-style art.
Link:
Includes all known publicly available waifu2x models:
- anime_style_art: requires pre-scaled input for the scale2.0x variant
  - noise1 noise2 noise3 scale2.0x
- anime_style_art_rgb: requires pre-scaled input for the scale2.0x variant
  - noise0 noise1 noise2 noise3 scale2.0x
- photo: requires pre-scaled input for the scale2.0x variant
  - noise0 noise1 noise2 noise3 scale2.0x
- ukbench: requires pre-scaled input
  - scale2.0x
- upconv_7_anime_style_art_rgb
  - scale2.0x noise0_scale2.0x noise1_scale2.0x noise2_scale2.0x noise3_scale2.0x
- upconv_7_photo
  - scale2.0x noise0_scale2.0x noise1_scale2.0x noise2_scale2.0x noise3_scale2.0x
- cunet: tile size (`block_w` and `block_h`) must be a multiple of 4
  - noise0 noise1 noise2 noise3
  - scale2.0x
  - noise0_scale2.0x noise1_scale2.0x noise2_scale2.0x noise3_scale2.0x
- upresnet10
  - scale2.0x
  - noise0_scale2.0x noise1_scale2.0x noise2_scale2.0x noise3_scale2.0x
To simplify usage, we provide a Python wrapper module, vsmlrt, which offers the full functionality of waifu2x-caffe with a more Pythonic interface:
```python
import vapoursynth as vs
from vapoursynth import core
from vsmlrt import Waifu2x, Waifu2xModel, Backend

src = core.std.BlankClip(format=vs.RGBS)

# backend could be:
# - CPU Backend.OV_CPU(): the recommended CPU backend; generally faster than ORT-CPU.
# - CPU Backend.ORT_CPU(num_streams=1, verbosity=2): vs-ort CPU backend.
# - GPU Backend.ORT_CUDA(device_id=0, cudnn_benchmark=True, num_streams=1, verbosity=2):
#   - use device_id to select the device
#   - set cudnn_benchmark=False to reduce script reload latency when debugging, at a slight cost in throughput.
# - GPU Backend.TRT(fp16=True, device_id=0, num_streams=1): TensorRT runtime, the fastest NVIDIA GPU runtime.
flt = Waifu2x(src, noise=-1, scale=2, model=Waifu2xModel.upconv_7_anime_style_art_rgb, backend=Backend.ORT_CUDA())
```
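In a real script the source clip is usually YUV rather than a blank RGBS clip; a minimal sketch of the full round trip, where the matrix and output format are assumptions chosen for illustration:

```python
import vapoursynth as vs
from vapoursynth import core
from vsmlrt import Waifu2x, Waifu2xModel, Backend

# stand-in for a real YUV source clip
src = core.std.BlankClip(width=1920, height=1080, format=vs.YUV420P8, length=240)
# the waifu2x models operate on RGBS input, as in the example above (BT.709 matrix assumed)
rgb = core.resize.Bicubic(src, format=vs.RGBS, matrix_in_s="709")
flt = Waifu2x(rgb, noise=-1, scale=2, model=Waifu2xModel.upconv_7_anime_style_art_rgb,
              backend=Backend.ORT_CUDA())
# back to YUV for encoding; format chosen for illustration only
out = core.resize.Bicubic(flt, format=vs.YUV420P16, matrix_s="709")
out.set_output()
```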
The raw plugin interface shown below is documented mostly for reference; the suggested way is to use vsmlrt.py.
```python
import vapoursynth as vs
from vapoursynth import core

src = core.std.BlankClip(width=1920, height=1080, format=vs.RGBS)
flt = core.ov.Model(src, "upconv_7_anime_style_art_rgb_scale2.0x.onnx")
```
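The same ONNX file can also be loaded through the other vs-mlrt runtimes. A hedged sketch using vs-ort's CUDA execution provider; the `provider` argument is assumed from vs-ort's interface and is not shown elsewhere on this page:

```python
import vapoursynth as vs
from vapoursynth import core

src = core.std.BlankClip(width=1920, height=1080, format=vs.RGBS)
# provider selects the execution provider ("CPU" or "CUDA"); assumed vs-ort parameter
flt = core.ort.Model(src, "upconv_7_anime_style_art_rgb_scale2.0x.onnx", provider="CUDA")
```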
The anime_style_art, anime_style_art_rgb, photo and ukbench models do not include built-in upscaling. Therefore, you need to upscale by 2x with Catmull-Rom (bicubic with b=0, c=0.5) before feeding the image to these models:
```python
import vapoursynth as vs
from vapoursynth import core

src = core.std.BlankClip(width=1920, height=1080, format=vs.RGBS)
# Catmull-Rom pre-scaling with fmtconv (a1 = b = 0, a2 = c = 0.5)
flt = core.ov.Model(src.fmtc.resample(scale=2, kernel="bicubic", a1=0, a2=0.5), "anime_style_art_rgb_scale2.0x.onnx")
```
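If fmtconv is not available, the same Catmull-Rom pre-scaling can be done with VapourSynth's built-in resizer; a small sketch, where `filter_param_a`/`filter_param_b` correspond to the bicubic b and c parameters:

```python
import vapoursynth as vs
from vapoursynth import core

src = core.std.BlankClip(width=1920, height=1080, format=vs.RGBS)
# bicubic with b=0, c=0.5 (Catmull-Rom) via the built-in resizer
pre = core.resize.Bicubic(src, width=src.width * 2, height=src.height * 2,
                          filter_param_a=0, filter_param_b=0.5)
flt = core.ov.Model(pre, "anime_style_art_rgb_scale2.0x.onnx")
```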
- cunet networks work best when the tile size (`block_w` / `block_h`) is in the range 60-150 and a multiple of 4; see the sketch below.
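For example, 148x148 tiles satisfy both constraints. A sketch using the `block_w`/`block_h` names from the note above; the exact tiling parameters and the cunet ONNX filename are assumptions that may differ between vs-mlrt releases:

```python
import vapoursynth as vs
from vapoursynth import core

src = core.std.BlankClip(width=1920, height=1080, format=vs.RGBS)
# 148 is a multiple of 4 and falls inside the recommended 60-150 range
flt = core.ov.Model(src, "cunet_noise3_scale2.0x.onnx",  # filename assumed for illustration
                    block_w=148, block_h=148)
```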
Measurements: FPS / device memory (MB)
Device memory:
- CPU: private memory including VapourSynth
- GPU: device memory including context
Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23
Input size: 1920x1080
1. vs-mlrt v6
2. VapourSynth-Waifu2x-caffe r14
3. vapoursynth-waifu2x-ncnn-vulkan r4, Graphics Driver 471.68

Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] caffe (540p patch) | [3] vulkan (540p patch) |
---|---|---|---|---|---|
upconv7 | 5.98 / 5065 | 6.60 / 5033 | 8.43 / 9253 | 1.63 / 3248 | 1.67 / 11197 |
upresnet10 | 4.36 / 5061 | N/A | N/A | 1.54 / 7232 | N/A |
cunet | 2.58 / 9155 | N/A | N/A | 1.11 / 11657 | 0.53 / 15705 |

Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [3] vulkan |
---|---|---|---|---|
upconv7 | 10.4 / 5189 | 13.8 / 3041 | 26.2 / 5253 | 3.97 / 21369 |
upresnet10 | 6.43 / 5059 | N/A | N/A | N/A |
cunet | 4.10 / 9535 | N/A | N/A | 0.86 / 29848 |
Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23, GPU clocks locked at maximum frequency.
Input size: 1920x1080
1. vs-mlrt v6
2. vapoursynth-waifu2x-ncnn-vulkan r4, Graphics Driver 471.68

Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] vulkan (540p patch) |
---|---|---|---|---|
upconv7 | 6.94 / 9765 | 7.83 / 5511 | 8.61 / 9731 | 1.63 / 10892 |
upresnet10 | 3.90 / 5665 | N/A | N/A | N/A |
cunet | 2.20 / 18469 | N/A | N/A | 0.53 / 15397 |

Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] vulkan |
---|---|---|---|---|
upconv7 | 9.66 / 6049 | 16.1 / 3501 | 19.9 / 5701 | 3.03 / 21075 |
upresnet10 | 6.53 / 5663 | N/A | N/A | N/A |
cunet | 3.26 / 10017 | N/A | N/A | 0.78 / 8011 (540p patch) |
Hardware: Intel Xeon Ice Lake Server, 32C/64T @ 2.90 GHz
Software: VapourSynth R57, Windows Server 2019
Input size: 1920x1080
1. vs-mlrt v6
2. VapourSynth-Waifu2x-w2xc r8

Model | [1] ov-cpu | [2] w2xc |
---|---|---|
upconv7 | 1.22 / 18750 | N/A |
upresnet10 | 1.40 / 18278 | N/A |
cunet | 0.65 / 22447 | N/A |
anime rgb | 0.69 / 34619 | 0.26 / 7895 |
Hardware: AMD EPYC Milan, 32C/64T @ 2.55 GHz
Software: VapourSynth R57, Windows Server 2019
Input size: 1920x1080
1. vs-mlrt v6
2. VapourSynth-Waifu2x-w2xc r8

Model | [1] ov-cpu | [2] w2xc |
---|---|---|
upconv7 | 0.36 / 19583 | N/A |
upresnet10 | 0.35 / 18694 | N/A |
cunet | 0.20 / 21644 | N/A |
anime rgb | 0.20 / 34619 | 0.28 / 5398 |