waifu2x

waifu2x is a well-known image super-resolution neural network for anime-style art.

Link:

(stable) https://github.com/AmusementClub/vs-mlrt/releases/download/model-20211209/waifu2x_v3.7z

Models

Includes all known publicly available waifu2x models:

anime_style_art: requires pre-scaled input for the scale2.0x variant
- noise1 noise2 noise3 scale2.0x
anime_style_art_rgb: requires pre-scaled input for the scale2.0x variant
- noise0 noise1 noise2 noise3 scale2.0x
photo: requires pre-scaled input for the scale2.0x variant
- noise0 noise1 noise2 noise3 scale2.0x
ukbench: requires pre-scaled input
- scale2.0x
upconv_7_anime_style_art_rgb
- scale2.0x noise3_scale2.0x noise2_scale2.0x noise1_scale2.0x noise0_scale2.0x
upconv_7_photo
- scale2.0x noise0_scale2.0x noise1_scale2.0x noise2_scale2.0x noise3_scale2.0x
cunet: tile size (block_w and block_h) must be multiples of 4.
- noise0 noise1 noise2 noise3
- scale2.0x
- noise0_scale2.0x noise1_scale2.0x noise2_scale2.0x noise3_scale2.0x
upresnet10
- scale2.0x
- noise0_scale2.0x noise1_scale2.0x noise2_scale2.0x noise3_scale2.0x
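
The list above can be read as a lookup table from model family to available variants. The following is a minimal sketch of such a lookup, not the way vsmlrt resolves models internally: `variant_name` and `resolve` are hypothetical helpers, and the convention that `noise=-1` means "no denoising" is assumed from the wrapper call shown later on this page.

```python
# Variant tables transcribed from the model list above.
_NOISE = ("noise0", "noise1", "noise2", "noise3")
_NOISE_SCALE = tuple(f"{n}_scale2.0x" for n in _NOISE)

VARIANTS = {
    "anime_style_art": {"noise1", "noise2", "noise3", "scale2.0x"},
    "anime_style_art_rgb": {*_NOISE, "scale2.0x"},
    "photo": {*_NOISE, "scale2.0x"},
    "ukbench": {"scale2.0x"},
    "upconv_7_anime_style_art_rgb": {"scale2.0x", *_NOISE_SCALE},
    "upconv_7_photo": {"scale2.0x", *_NOISE_SCALE},
    "cunet": {*_NOISE, "scale2.0x", *_NOISE_SCALE},
    "upresnet10": {"scale2.0x", *_NOISE_SCALE},
}

# Families whose scale2.0x variant expects input that was already upscaled.
NEEDS_PRESCALE = {"anime_style_art", "anime_style_art_rgb", "photo", "ukbench"}


def variant_name(noise: int, scale: int) -> str:
    """Build a variant label (hypothetical helper; noise=-1 means no denoising)."""
    if noise not in (-1, 0, 1, 2, 3) or scale not in (1, 2):
        raise ValueError(f"unsupported noise={noise}, scale={scale}")
    if noise == -1:
        if scale == 1:
            raise ValueError("noise=-1 with scale=1 selects no model")
        return "scale2.0x"
    return f"noise{noise}" if scale == 1 else f"noise{noise}_scale2.0x"


def resolve(family: str, noise: int, scale: int) -> tuple[str, bool]:
    """Return (variant, needs_prescale) or raise if the family lacks the variant."""
    name = variant_name(noise, scale)
    if name not in VARIANTS[family]:
        raise ValueError(f"{family} has no {name} variant")
    return name, (family in NEEDS_PRESCALE and name == "scale2.0x")


print(resolve("cunet", 1, 2))
print(resolve("photo", -1, 2))
```

A failed lookup (for example `resolve("anime_style_art", 0, 1)`, since that family has no noise0 variant) raises `ValueError` instead of silently picking a different model.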

`vsmlrt.py` wrapper Usage

To simplify usage, we provide a Python wrapper module, vsmlrt, which exposes the full functionality of waifu2x-caffe through a more Pythonic interface:

import vapoursynth as vs
from vapoursynth import core

from vsmlrt import Waifu2x, Waifu2xModel, Backend

# Placeholder RGBS (32-bit float RGB) source clip.
src = core.std.BlankClip(format=vs.RGBS)

# backend can be one of:
#  - CPU Backend.OV_CPU(): the recommended CPU backend; generally faster than ORT-CPU.
#  - CPU Backend.ORT_CPU(num_streams=1, verbosity=2): vs-ort CPU backend.
#  - GPU Backend.ORT_CUDA(device_id=0, cudnn_benchmark=True, num_streams=1, verbosity=2)
#     - use device_id to select the device
#     - set cudnn_benchmark=False to reduce script reload latency when debugging, at the cost of slightly lower throughput.
#  - GPU Backend.TRT(fp16=True, device_id=0, num_streams=1): TensorRT runtime, the fastest NVIDIA GPU runtime.
flt = Waifu2x(src, noise=-1, scale=2, model=Waifu2xModel.upconv_7_anime_style_art_rgb, backend=Backend.ORT_CUDA())

Raw Model Usage

This section is mainly for reference; the recommended approach is to use vsmlrt.py.

src = core.std.BlankClip(width=1920, height=1080, format=vs.RGBS)
flt = core.ov.Model(src, "upconv_7_anime_style_art_rgb_scale2.0x.onnx")

The anime_style_art, anime_style_art_rgb, photo, and ukbench models do not include built-in upscaling. To use their scale2.0x variants, first upscale the clip 2x with Catmull-Rom (bicubic with b=0, c=0.5), then feed the result to the model:

src = core.std.BlankClip(width=1920, height=1080, format=vs.RGBS)
flt = core.ov.Model(src.fmtc.resample(scale=2, kernel="bicubic", a1=0, a2=0.5), "anime_style_art_rgb_scale2.0x.onnx")
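
To make the "Catmull-Rom = bicubic(b=0, c=0.5)" equivalence concrete, here is a small standalone sketch of the standard Mitchell-Netravali bicubic kernel. It only illustrates the weights and is not how fmtc implements resampling; the helper names and the assumption of center-aligned sampling (which gives phases 0.25 and 0.75 for a 2x upscale) are mine.

```python
import math


def bicubic_weight(x: float, b: float = 0.0, c: float = 0.5) -> float:
    """Mitchell-Netravali bicubic kernel; b=0, c=0.5 gives Catmull-Rom."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9 * b - 6 * c) * x**3
                + (-18 + 12 * b + 6 * c) * x**2
                + (6 - 2 * b)) / 6
    if x < 2.0:
        return ((-b - 6 * c) * x**3
                + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6
    return 0.0


def taps(phase: float) -> list[float]:
    """Weights of the four nearest source samples for an output sample
    located `phase` (0..1) to the right of the second sample."""
    return [
        bicubic_weight(phase + 1),
        bicubic_weight(phase),
        bicubic_weight(1 - phase),
        bicubic_weight(2 - phase),
    ]


# Assumed center-aligned 2x upscale: output samples fall at phases 0.25 and 0.75.
for phase in (0.25, 0.75):
    weights = taps(phase)
    print(phase, weights, math.isclose(sum(weights), 1.0))
```

The kernel interpolates (weight 1 at distance 0, weight 0 at distances 1 and 2) and its four taps sum to 1, so flat regions pass through unchanged before the network sees them.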

Notes

cunet networks work best when the tile size (block_w/block_h) is a multiple of 4 in the range 60 to 150.
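
As a rough aid for choosing a tile size, the sketch below enumerates the multiples of 4 in the recommended range. `cunet_tile_candidates` is a hypothetical helper, and the optional filter that keeps only sizes dividing the frame dimension evenly is a convenience assumption of this example, not a requirement stated here.

```python
def cunet_tile_candidates(dim: int, lo: int = 60, hi: int = 150,
                          exact: bool = True) -> list[int]:
    """List tile sizes in [lo, hi] that are multiples of 4.

    With exact=True, keep only sizes that divide `dim` without remainder
    (an assumption of this sketch, not a cunet requirement).
    """
    first = -(-lo // 4) * 4  # round lo up to the next multiple of 4
    return [s for s in range(first, hi + 1, 4) if not exact or dim % s == 0]


print(cunet_tile_candidates(1920))  # widths for a 1920-pixel-wide frame
print(cunet_tile_candidates(1080))  # heights for a 1080-pixel-tall frame
```

For a 1920x1080 frame this yields 60, 64, 80, 96, 120, 128 for the width and 60, 72, 108, 120 for the height; pass `exact=False` to see every multiple of 4 in the range.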

Benchmarking

Measurements: FPS / Device Memory (MB)

Device memory:

- CPU: private memory including VapourSynth
- GPU: device memory including context

Tesla V100

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23

Input size: 1920x1080

Backends

[1] vs-mlrt v6
[2] VapourSynth-Waifu2x-caffe r14

Performance

FP32

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] caffe (540p patch)
upconv7	5.98 / 5065	6.60 / 5033	8.43 / 9253	1.63 / 3248
upresnet10	4.36 / 5061	N/A	N/A	1.54 / 7232
cunet	2.58 / 9155	N/A	N/A	1.11 / 11657

FP16

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)
upconv7	10.4 / 5189	13.8 / 3041	26.2 / 5253
upresnet10	6.43 / 5059	N/A	N/A
cunet	4.10 / 9535	N/A	N/A
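
To compare precisions, each "FPS / memory" cell can be split and the FP16 throughput divided by the FP32 throughput. The sketch below does this for the upconv7 rows of the Tesla V100 tables; `parse_cell` is a hypothetical helper and the dictionaries simply copy the cells listed above.

```python
def parse_cell(cell: str):
    """Split an 'FPS / memory' cell into two floats; return None for 'N/A'."""
    if cell.strip() == "N/A":
        return None
    fps, mem = (float(part) for part in cell.split("/"))
    return fps, mem


# upconv7 rows copied from the Tesla V100 FP32 and FP16 tables.
fp32 = {"ort-cuda": "5.98 / 5065", "trt": "6.60 / 5033", "trt (2 streams)": "8.43 / 9253"}
fp16 = {"ort-cuda": "10.4 / 5189", "trt": "13.8 / 3041", "trt (2 streams)": "26.2 / 5253"}

# Throughput ratio FP16 / FP32 per backend.
speedup = {
    backend: parse_cell(fp16[backend])[0] / parse_cell(fp32[backend])[0]
    for backend in fp32
}

for backend, ratio in speedup.items():
    print(f"{backend}: {ratio:.2f}x")
```

On these figures FP16 is roughly 1.7x faster with ort-cuda, 2.1x with trt, and 3.1x with trt using two streams.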

Tesla A10

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23, GPU clocks locked at maximum frequency.

Input size: 1920x1080

Backends

[1] vs-mlrt v6

Performance

FP32

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)
upconv7	6.94 / 9765	7.83 / 5511	8.61 / 9731
upresnet10	3.90 / 5665	N/A	N/A
cunet	2.20 / 18469	N/A	N/A

FP16

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)
upconv7	9.66 / 6049	16.1 / 3501	19.9 / 5701
upresnet10	6.53 / 5663	N/A	N/A
cunet	3.26 / 10017	N/A	N/A

Icelake Server

Hardware: Xeon Icelake Server 32C64T @2.90 GHz

Software: VapourSynth R57, Windows Server 2019

Input size: 1920x1080

Backends

[1] vs-mlrt v6
[2] VapourSynth-Waifu2x-w2xc r8

Performance

FP32

Model	[1] ov-cpu	[2] w2xc
upconv7	1.14 / 15547	N/A
upresnet10	1.27 / 7245	N/A
cunet	0.57 / 10943	N/A
anime rgb	0.62 / 15578	0.048 / 1145

EPYC Milan

Hardware: EPYC Milan 16C32T @2.55 GHz

Software: VapourSynth R57, Windows Server 2019

Input size: 1920x1080

Backends

[1] vs-mlrt v6
[2] VapourSynth-Waifu2x-w2xc r8

Performance

FP32

Model	[1] ov-cpu	[2] w2xc
upconv7	0.37 / 8612	N/A
upresnet10	0.44 / 7143	N/A
cunet	0.23 / 10943	N/A
anime rgb	0.21 / 15439	0.039 / 1183
