## Overview

This repository provides the code for training and evaluating the LAN CNN. The model is designed to reconstruct high-quality RGB images from the Bayer-filtered RAW output of a smartphone sensor, so that a single deep learning model can completely replace the hand-crafted Image Signal Processing (ISP) pipeline found in digital cameras. The model is trained on pairs of images captured with the Sony IMX586 smartphone sensor and the Fujifilm GFX100 camera.
## Contents

- Overview
- Requirements
- First steps
- Training
- Inference - Full-Resolution Images
- Inference - Numerical Evaluation
- Acknowledgements
## Requirements

- `imageio=2.9.0` for loading `.png` images
- `numpy=1.21.2` for general matrix operations
- `pillow=8.3.1` for image resizing operations
- `rawpy=0.16.0` for loading `.raw` images
- `six=1.16.0` for downloads
- `tensorflow-gpu=2.3.0` for the whole NN training and inference
- `tqdm=4.62.1` for nice progress bars
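
If you manage the environment with pip, the pins above translate roughly into the command below (this is only a sketch: pip uses `==` for version pins, and the TensorFlow package may need to be adjusted to your CUDA setup):

```bash
pip install imageio==2.9.0 numpy==1.21.2 pillow==8.3.1 rawpy==0.16.0 six==1.16.0 tensorflow-gpu==2.3.0 tqdm==4.62.1
```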
## First steps

- Download the dataset from the MAI'21 Learned Smartphone ISP Challenge website (registration needed). The dataset directory (default name: `raw_images/`) should contain three subfolders: `train/`, `val/` and `test/` (a possible layout is sketched after this list).
- Download the pre-trained VGG-19 model (Mirror) and put it into the `vgg_pretrained/` folder created at the root of the directory.
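
For orientation, the sketch below shows one plausible dataset layout. The subfolder names are only inferred from the default `phone_dir` and `dslr_dir` arguments documented further down and may differ from the dataset you actually download:

```
raw_images/
  train/
    mediatek_raw/         # RAW training patches from the smartphone sensor (assumed name)
    fujifilm/             # corresponding RGB targets from the Fujifilm GFX100 (assumed name)
  val/
    mediatek_raw/
    fujifilm/
  test/
    mediatek_raw_normal/  # full-resolution RAW images for inference (assumed name)
```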
## Training

The `train_model.py` file can be invoked as follows:

```bash
python train_model.py <args>
```

where `args` are defined in `utils.py` and can be one of the following (default values in **bold**; an example invocation is sketched after the lists below):
- `dataset_dir`: **`raw_images/`** - path to the folder with the dataset
- `vgg_dir`: **`vgg_pretrained/imagenet-vgg-verydeep-19.mat`** - path to the pre-trained VGG-19 network
- `dslr_dir`: **`fujifilm/`** - path to the folder with the RGB data
- `phone_dir`: **`mediatek_raw/`** - path to the folder with the RAW data
- `model_dir`: **`models/`** - path to the folder with the model to be restored or saved
- `restore_iter`: **`None`** - iteration to restore; defaults to the last iteration
- `patch_w`: **`256`** - width of the training patches
- `patch_h`: **`256`** - height of the training patches
- `batch_size`: **`32`** - batch size [small values can lead to unstable training]
- `train_size`: **`5000`** - the number of training patches randomly loaded every 1000 iterations
- `learning_rate`: **`5e-5`** - learning rate
- `eval_step`: **`1000`** - the accuracy is computed and the model is saved every `eval_step` iterations
- `num_train_iters`: **`100000`** - the number of training iterations
- `optimizer`: **`radam`** - the optimizer used (`adam` is the other option)
The loss function can also be built with the following losses:

- `fac_mse`: **`0`** - Mean Squared Error
- `fac_l1`: **`0`** - Mean Absolute Error
- `fac_ssim`: **`0`** - Structural Similarity Index
- `fac_ms_ssim`: **`30`** - Multi-Scale Structural Similarity Index
- `fac_uv`: **`100`** - loss between blurred UV channels (color loss)
- `fac_vgg`: **`0`** - VGG loss (perceptual loss)
- `fac_lpips`: **`10`** - LPIPS loss (perceptual loss)
- `fac_huber`: **`300`** - Huber loss
- `fac_charbonnier`: **`0`** - Charbonnier loss (approximation of the L1 loss)
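
Each `fac_*` value presumably acts as the weight of the corresponding term in the total training loss, so terms with a factor of `0` should not contribute. Assuming the parser in `utils.py` accepts arguments in a `key=value` form (an assumption; check `utils.py` for the exact syntax), a training run that keeps the documented default loss mix might look like this:

```bash
python train_model.py dataset_dir=raw_images/ dslr_dir=fujifilm/ phone_dir=mediatek_raw/ batch_size=32 learning_rate=5e-5 fac_ms_ssim=30 fac_uv=100 fac_lpips=10 fac_huber=300
```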
## Inference - Full-Resolution Images

The `test_model.py` file can be invoked as follows:

```bash
python test_model.py <args>
```

where `args` are defined in `utils.py` and can be one of the following (default values in **bold**; an example invocation follows the list):
- `dataset_dir`: **`raw_images/`** - path to the folder with the dataset
- `result_dir`: **`<model_dir>`** - output images are saved under `results/full-resolution/<result_dir>`
- `phone_dir`: **`mediatek_raw_normal/`** - path to the folder with the RAW data
- `model_dir`: **`models/`** - path to the folder with the model to be restored or saved
- `restore_iter`: **`None`** - iteration to restore; defaults to the last iteration
- `img_w`: **`3000`** - width of the full-resolution images
- `img_h`: **`4000`** - height of the full-resolution images
- `use_gpu`: **`True`** - use the GPU for inference
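
For example, producing full-resolution outputs from a specific checkpoint could look like the command below. The `key=value` syntax and the checkpoint number `98000` are assumptions for illustration; check `utils.py` for the exact argument format:

```bash
python test_model.py model_dir=models/ restore_iter=98000 phone_dir=mediatek_raw_normal/ img_w=3000 img_h=4000 use_gpu=True
```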
## Inference - Numerical Evaluation

The `evaluate_model.py` file can be invoked as follows:

```bash
python evaluate_model.py <args>
```

where `args` are defined in `utils.py` and can be one of the following (default values in **bold**; an example invocation follows the list):
- `dataset_dir`: **`raw_images/`** - path to the folder with the dataset
- `vgg_dir`: **`vgg_pretrained/imagenet-vgg-verydeep-19.mat`** - path to the pre-trained VGG-19 network
- `dslr_dir`: **`fujifilm/`** - path to the folder with the RGB data
- `phone_dir`: **`mediatek_raw/`** - path to the folder with the RAW data
- `model_dir`: **`models/`** - path to the folder with the model to be restored or saved
- `restore_iter`: **`None`** - iteration to restore; defaults to the last iteration
- `img_w`: **`256`** - width of the evaluation patches
- `img_h`: **`256`** - height of the evaluation patches
- `use_gpu`: **`True`** - use the GPU for inference
- `batch_size`: **`10`** - batch size
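
As a sketch, computing the numerical metrics for the latest saved checkpoint with the documented defaults might look like this (same caveat as above: the `key=value` form is assumed, and the real syntax is defined in `utils.py`):

```bash
python evaluate_model.py dataset_dir=raw_images/ dslr_dir=fujifilm/ phone_dir=mediatek_raw/ img_w=256 img_h=256 batch_size=10 use_gpu=True
```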
## Acknowledgements

- This repo is based on the mai21-learned-smartphone-isp repository;
- The LPIPS loss function was implemented using the code from alexlee-gk;
- The RAdam optimizer was implemented using the code from taki0112.