
LAN - Lightweight Attention-based Network for Smartphone Image Processing

Overview

This repository provides the code used for training and evaluating the LAN CNN. The model is designed to reconstruct high-quality RGB images from the Bayer-filtered RAW output of a smartphone sensor, replacing the hand-crafted Image Signal Processing (ISP) pipeline found in digital cameras with a single deep learning model. It is trained on pairs of images captured with the Sony IMX586 smartphone sensor and the Fujifilm GFX100 DSLR camera.

RAW-to-RGB example

LAN model


Requirements

  • imageio=2.9.0 for loading .png images
  • numpy=1.21.2 for general matrix operations
  • pillow=8.3.1 for image resizing operations
  • rawpy=0.16.0 for loading .raw images
  • six=1.16.0 for downloads
  • tensorflow-gpu=2.3.0 for NN training and inference
  • tqdm=4.62.1 for nice progress bars
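
These dependencies can be installed with pip; one possible command (a sketch using the version pins listed above, assuming a Python/CUDA environment compatible with TensorFlow 2.3) is:

pip install imageio==2.9.0 numpy==1.21.2 pillow==8.3.1 rawpy==0.16.0 six==1.16.0 tensorflow-gpu==2.3.0 tqdm==4.62.1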

First steps

  • Download the dataset from the MAI'21 Learned Smartphone ISP Challenge website (registration needed). The dataset directory (default name: raw_images/) should contain three subfolders: train/, val/ and test/.
  • Download the pre-trained VGG-19 model (a mirror is also available) and put it into the vgg_pretrained/ folder created at the root of the repository.
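
After these steps, the directory layout is expected to look roughly as follows (a sketch inferred from the default dslr_dir, phone_dir and vgg_dir arguments described below, not an exact listing of the dataset archive):

raw_images/
  train/
    mediatek_raw/          (RAW smartphone patches)
    fujifilm/              (target RGB patches)
  val/
    mediatek_raw/
    fujifilm/
  test/
    mediatek_raw_normal/   (full-resolution RAW images)
vgg_pretrained/
  imagenet-vgg-verydeep-19.mat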

Training

The train_model.py file can be invoked as follows:

python train_model.py <args>

where the arguments are defined in utils.py and can be any of the following (default values shown after each argument name):

dataset_dir: raw_images/   -   path to the folder with the dataset
vgg_dir: vgg_pretrained/imagenet-vgg-verydeep-19.mat   -   path to the pre-trained VGG-19 network
dslr_dir: fujifilm/   -   path to the folder with the RGB data
phone_dir: mediatek_raw/   -   path to the folder with the Raw data
model_dir: models/   -   path to the folder with the model to be restored or saved

restore_iter: None   -   iteration to restore, defaults to last iteration
patch_w: 256   -   width of the training images
patch_h: 256   -   height of the training images

batch_size: 32   -   batch size [small values can lead to unstable training]
train_size: 5000   -   the number of training patches randomly loaded every 1000 iterations
learning_rate: 5e-5   -   learning rate
eval_step: 1000   -   every eval_step iterations, the accuracy is computed and the model is saved
num_train_iters: 100000   -   the number of training iterations
optimizer: radam   -   the optimizer used (adam is the other option)

The loss function is built as a weighted combination of the following terms, where each value is the weight of the corresponding term:

fac_mse: 0   -   Mean Squared Error (MSE) loss
fac_l1: 0   -   Mean Absolute Error (L1) loss
fac_ssim: 0   -   Structural Similarity (SSIM) loss
fac_ms_ssim: 30   -   Multi-Scale Structural Similarity (MS-SSIM) loss
fac_uv: 100   -   loss between blurred UV channels (color loss)
fac_vgg: 0   -   VGG loss (perceptual loss)
fac_lpips: 10   -   LPIPS loss (perceptual loss)
fac_huber: 300   -   Huber loss
fac_charbonnier: 0   -   Charbonnier loss (smooth approximation of the L1 loss)
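
For example, a training run that overrides a few of these defaults could be launched as follows (a sketch assuming the space-separated key=value argument style parsed by utils.py):

python train_model.py dataset_dir=raw_images/ batch_size=32 learning_rate=5e-5 num_train_iters=100000 fac_ms_ssim=30 fac_uv=100 fac_lpips=10 fac_huber=300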

Inference - Full-Resolution Images

The test_model.py file can be invoked as follows:

python test_model.py <args>

where the arguments are defined in utils.py and can be any of the following (default values shown after each argument name):

dataset_dir: raw_images/   -   path to the folder with the dataset
result_dir: <model_dir>   -   output images are saved under results/full-resolution/<result_dir>
phone_dir: mediatek_raw_normal/   -   path to the folder with the Raw data
model_dir: models/   -   path to the folder with the model to be restored or saved

restore_iter: None   -   iteration to restore, defaults to last iteration
img_w: 3000   -   width of the full-resolution images
img_h: 4000   -   height of the full-resolution images
use_gpu: True   -   use the GPU for inference
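
For instance, restoring the latest checkpoint and running it on full-resolution captures could look like this (again a sketch assuming the key=value argument style from utils.py):

python test_model.py dataset_dir=raw_images/ phone_dir=mediatek_raw_normal/ model_dir=models/ img_w=3000 img_h=4000 use_gpu=True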

Inference - Numerical Evaluation

The evaluate_model.py file can be invoked as follows:

python evaluate_model.py <args>

where the arguments are defined in utils.py and can be any of the following (default values shown after each argument name):

dataset_dir: raw_images/   -   path to the folder with the dataset
vgg_dir: vgg_pretrained/imagenet-vgg-verydeep-19.mat   -   path to the pre-trained VGG-19 network
dslr_dir: fujifilm/   -   path to the folder with the RGB data
phone_dir: mediatek_raw/   -   path to the folder with the Raw data
model_dir: models/   -   path to the folder with the model to be restored or saved

restore_iter: None   -   iteration to restore, defaults to last iteration
img_w: 256   -   width of the evaluation patches
img_h: 256   -   height of the evaluation patches
use_gpu: True   -   use the GPU for inference
batch_size: 10   -   batch size
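
For instance, computing the metrics on the evaluation patches could look like this (a sketch assuming the same key=value argument style):

python evaluate_model.py dataset_dir=raw_images/ model_dir=models/ img_w=256 img_h=256 batch_size=10 use_gpu=True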

Acknowledgements
