
LAN - Lightweight Attention-based Network for Smartphone Image Processing

Overview

This repository provides the code used for training and evaluating the LAN CNN. The model is designed to reconstruct high-quality RGB images from the Bayer-filtered RAW output of a smartphone sensor, replacing the hand-crafted Image Signal Processing (ISP) pipeline found in digital cameras with a single deep learning model. It is trained on pairs of images captured with the Sony IMX586 smartphone sensor and the Fujifilm GFX100 DSLR camera.

RAW-to-RGB example

LAN model


Requirements

  • imageio=2.9.0 for loading .png images
  • numpy=1.21.2 for general matrix operations
  • pillow=8.3.1 for image resizing operations
  • rawpy=0.16.0 for loading .raw images
  • six=1.16.0 for downloads
  • tensorflow-gpu=2.3.0 for NN training and inference
  • tqdm=4.62.1 for nice progress bars
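
These dependencies can be installed with pip; one possible command (a sketch using the version pins listed above, assuming a Python/CUDA environment compatible with TensorFlow 2.3) is:

pip install imageio==2.9.0 numpy==1.21.2 pillow==8.3.1 rawpy==0.16.0 six==1.16.0 tensorflow-gpu==2.3.0 tqdm==4.62.1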

First steps

  • Download the dataset from the MAI'21 Learned Smartphone ISP Challenge website (registration needed). The dataset directory (default name: raw_images/) should contain three subfolders: train/, val/ and test/.
  • Download the pre-trained VGG-19 model (a mirror is also available) and put it into the vgg_pretrained/ folder created at the root of the repository.
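
After these steps, the directory layout is expected to look roughly as follows (a sketch inferred from the default dslr_dir, phone_dir and vgg_dir arguments described below, not an exact listing of the dataset archive):

raw_images/
  train/
    mediatek_raw/          (RAW smartphone patches)
    fujifilm/              (target RGB patches)
  val/
    mediatek_raw/
    fujifilm/
  test/
    mediatek_raw_normal/   (full-resolution RAW images)
vgg_pretrained/
  imagenet-vgg-verydeep-19.mat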

Training

The train_model.py file can be invoked as follows:

python train_model.py <args>

where the arguments are defined in utils.py and can be any of the following (default values shown after each argument name):

dataset_dir: raw_images/   -   path to the folder with the dataset
vgg_dir: vgg_pretrained/imagenet-vgg-verydeep-19.mat   -   path to the pre-trained VGG-19 network
dslr_dir: fujifilm/   -   path to the folder with the RGB data
phone_dir: mediatek_raw/   -   path to the folder with the Raw data
model_dir: models/   -   path to the folder with the model to be restored or saved

restore_iter: None   -   iteration to restore, defaults to last iteration
patch_w: 256   -   width of the training images
patch_h: 256   -   height of the training images

batch_size: 32   -   batch size [small values can lead to unstable training]
train_size: 5000   -   the number of training patches randomly loaded every 1000 iterations
learning_rate: 5e-5   -   learning rate
eval_step: 1000   -   every eval_step iterations, the accuracy is computed and the model is saved
num_train_iters: 100000   -   the number of training iterations
optimizer: radam   -   the optimizer used (adam is the other option)

The loss function is built as a weighted combination of the following terms, where each value is the weight of the corresponding term:

fac_mse: 0   -   Mean Squared Error (MSE) loss
fac_l1: 0   -   Mean Absolute Error (L1) loss
fac_ssim: 0   -   Structural Similarity (SSIM) loss
fac_ms_ssim: 30   -   Multi-Scale Structural Similarity (MS-SSIM) loss
fac_uv: 100   -   loss between blurred UV channels (color loss)
fac_vgg: 0   -   VGG loss (perceptual loss)
fac_lpips: 10   -   LPIPS loss (perceptual loss)
fac_huber: 300   -   Huber loss
fac_charbonnier: 0   -   Charbonnier loss (smooth approximation of the L1 loss)
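
For example, a training run that overrides a few of these defaults could be launched as follows (a sketch assuming the space-separated key=value argument style parsed by utils.py):

python train_model.py dataset_dir=raw_images/ batch_size=32 learning_rate=5e-5 num_train_iters=100000 fac_ms_ssim=30 fac_uv=100 fac_lpips=10 fac_huber=300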

Inference - Full-Resolution Images

The test_model.py file can be invoked as follows:

python test_model.py <args>

where the arguments are defined in utils.py and can be any of the following (default values shown after each argument name):

dataset_dir: raw_images/   -   path to the folder with the dataset
result_dir: <model_dir>   -   output images are saved under results/full-resolution/<result_dir>
phone_dir: mediatek_raw_normal/   -   path to the folder with the Raw data
model_dir: models/   -   path to the folder with the model to be restored or saved

restore_iter: None   -   iteration to restore, defaults to last iteration
img_w: 3000   -   width of the full-resolution images
img_h: 4000   -   height of the full-resolution images
use_gpu: True   -   use the GPU for inference
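
For instance, restoring the latest checkpoint and running it on full-resolution captures could look like this (again a sketch assuming the key=value argument style from utils.py):

python test_model.py dataset_dir=raw_images/ phone_dir=mediatek_raw_normal/ model_dir=models/ img_w=3000 img_h=4000 use_gpu=True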

Inference - Numerical Evaluation

The evaluate_model.py file can be invoked as follows:

python evaluate_model.py <args>

where the arguments are defined in utils.py and can be any of the following (default values shown after each argument name):

dataset_dir: raw_images/   -   path to the folder with the dataset
vgg_dir: vgg_pretrained/imagenet-vgg-verydeep-19.mat   -   path to the pre-trained VGG-19 network
dslr_dir: fujifilm/   -   path to the folder with the RGB data
phone_dir: mediatek_raw/   -   path to the folder with the Raw data
model_dir: models/   -   path to the folder with the model to be restored or saved

restore_iter: None   -   iteration to restore, defaults to last iteration
img_w: 256   -   width of the evaluation patches
img_h: 256   -   height of the evaluation patches
use_gpu: True   -   use the GPU for inference
batch_size: 10   -   batch size
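
For instance, computing the metrics on the evaluation patches could look like this (a sketch assuming the same key=value argument style):

python evaluate_model.py dataset_dir=raw_images/ model_dir=models/ img_w=256 img_h=256 batch_size=10 use_gpu=True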

Acknowledgements
