Contents

This repository is an official implementation of the paper "Pre-Trained Image Processing Transformer" from CVPR 2021.

We study low-level computer vision tasks (e.g., denoising, super-resolution and deraining) and develop a new pre-trained model, namely, the image processing transformer (IPT). To maximally excavate the capability of the transformer, we propose to utilize the well-known ImageNet benchmark for generating a large amount of corrupted image pairs. The IPT model is trained on these images with multi-heads and multi-tails. In addition, contrastive learning is introduced to adapt well to different image processing tasks. The pre-trained model can therefore be efficiently employed on the desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks.
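As a rough illustration of the multi-head / multi-tail design described above, the sketch below shows task-specific heads and tails around a single shared body in MindSpore. The class and parameter names are illustrative, and the shared transformer body is replaced by a placeholder convolution; refer to the repository code for the actual implementation.

```python
# Minimal sketch of the multi-head / multi-tail layout (illustrative names only).
import mindspore.nn as nn

class IPTSketch(nn.Cell):
    def __init__(self, num_tasks=6, n_feats=64):
        super().__init__()
        # One lightweight head per task maps the corrupted input image to features.
        self.heads = nn.CellList(
            [nn.Conv2d(3, n_feats, 3) for _ in range(num_tasks)]
        )
        # A single body is shared by all tasks (placeholder for the transformer).
        self.body = nn.Conv2d(n_feats, n_feats, 3)
        # One tail per task reconstructs the restored image from shared features.
        self.tails = nn.CellList(
            [nn.Conv2d(n_feats, 3, 3) for _ in range(num_tasks)]
        )

    def construct(self, x, task_id):
        feat = self.heads[task_id](x)   # task-specific head
        feat = self.body(feat)          # shared body
        return self.tails[task_id](feat)  # task-specific tail
```

During fine-tuning, only the head and tail of the target task need to be selected, while the shared body carries over the pre-trained weights.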

If you find our work useful in your research or publication, please cite our work: [1] Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, and Wen Gao. "Pre-trained image processing transformer". CVPR 2021. [arXiv]

@inproceedings{chen2020pre,
  title={Pre-trained image processing transformer},
  author={Chen, Hanting and Wang, Yunhe and Guo, Tianyu and Xu, Chang and Deng, Yiping and Liu, Zhenhua and Ma, Siwei and Xu, Chunjing and Xu, Chao and Gao, Wen},
  booktitle={CVPR},
  year={2021}
 }

The overall network architecture of IPT is shown below:

[Figure: IPT network architecture]

The benchmark datasets can be downloaded as follows:

For super-resolution:

Set5, Set14, B100, Urban100.

For denoising:

CBSD68.

For deraining:

Rain100L.

The result images are converted into the YCbCr color space, and PSNR is evaluated on the Y channel only.
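A minimal sketch of this evaluation protocol, assuming a standard ITU-R BT.601 RGB-to-YCbCr conversion and an illustrative border-crop parameter (the exact cropping used in this repository may differ):

```python
import numpy as np

def rgb_to_y(img):
    """img: float array in [0, 255] with shape (H, W, 3). Returns the Y channel."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(pred, target, shave=4):
    """PSNR on the Y channel, cropping `shave` border pixels (illustrative choice)."""
    y_pred = rgb_to_y(pred.astype(np.float64))
    y_target = rgb_to_y(target.astype(np.float64))
    if shave > 0:
        y_pred = y_pred[shave:-shave, shave:-shave]
        y_target = y_target[shave:-shave, shave:-shave]
    mse = np.mean((y_pred - y_target) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```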

Hardware (Ascend)

Prepare hardware environment with Ascend.

Framework

MindSpore>=1.9

For more information, please check the resources below:

MindSpore Tutorials
MindSpore Python API

Training speed.

In MindSpore, dynamic shapes incur additional cost, especially when two images with a large difference in size are processed one after the other.
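One common way to limit this cost, sketched below as an assumption rather than the repository's actual preprocessing, is to pad every input up to a multiple of a fixed tile size so that only a few distinct shapes reach the network; the `tile` value is illustrative.

```python
import numpy as np

def pad_to_multiple(img, tile=48):
    """img: (H, W, C) array. Pads H and W up to the next multiple of `tile`."""
    h, w = img.shape[:2]
    pad_h = (tile - h % tile) % tile
    pad_w = (tile - w % tile) % tile
    padded = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")
    return padded, (h, w)  # keep the original size to crop the output back
```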

| Platform | Device type | Device | Training time per step (seconds) |
| --- | --- | --- | --- |
| IPT-MS-Ascend | Ascend | Ascend-910A | 0.159 (batch size: 32) |

The results on all tasks are listed below.

Super-resolution results (PSNR in dB):

| Scale | Set5 | Set14 | B100 | Urban100 |
| --- | --- | --- | --- | --- |
| ×2 | 38.33 | 34.49 | 32.46 | 33.74 |
| ×3 | 34.86 | 30.85 | 29.38 | 29.50 |
| ×4 | 32.71 | 29.03 | 27.84 | 27.24 |

Denoising results (PSNR in dB):

| Noise level | CBSD68 | Urban100 |
| --- | --- | --- |
| 30 | 32.35 | 33.99 |
| 50 | 29.93 | 31.49 |

Deraining results (PSNR in dB):

| Task | Rain100L |
| --- | --- |
| Derain | 42.08 |

Inference Speed on Super-resolution (x4) task:

| Platform | Device type | Device | Inference time for images in the Set5 dataset (seconds) |
| --- | --- | --- | --- |
| IPT-Torch | GPU | V100 | 1.33, 0.69, 0.59, 0.69, 1.09 |
| IPT-MS (MindSpore 1.7) | GPU | V100 | 39.29, 13.45, 0.69, 0.71, 14.01 |
| IPT-MS-Ascend (MindSpore 1.9, Graph Mode) | Ascend | Ascend-910A | 419.53, 254.83, 9.49, 22.68, 275.21 |
| IPT-MS-Ascend (MindSpore 1.9, PyNative Mode) | Ascend | Ascend-910A | 635.53, 195.69, 14.94, 15.07, 271.17 |
| IPT-MS-Ascend (MindSpore 1.9, Graph Mode, warmup 50 exps) | Ascend | Ascend-910A | 2.74, 2.68, 2.47, 2.38, 2.56 |
| IPT-MS-Ascend (MindSpore 1.9, PyNative Mode, warmup 50 exps) | Ascend | Ascend-910A | 4.43, 4.10, 4.44, 4.03, 4.11 |
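The warm-up rows above exclude one-time costs such as graph compilation from the measurement. A hedged sketch of that timing procedure, with `net` and `inputs` as placeholders for the actual model and test images:

```python
import time

def timed_inference(net, inputs, warmup=50):
    # Warm-up runs: trigger graph compilation and caching, not timed.
    for _ in range(warmup):
        _ = net(inputs[0])
    # Timed runs, one measurement per test image.
    timings = []
    for x in inputs:
        start = time.time()
        _ = net(x)
        timings.append(time.time() - start)
    return timings
```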