This project was carried out by YAI 10th, in cooperation with Alchera
👑 KIM MINSU, YAI 7th
🚀 KIM HYUNJIN, YAI 8th
🎓 PARK JUNYOUNG, YAI 9th
🌵 LEE SUMIN, YAI 9th
🐯 JIN HYUNBIN, YAI 9th
😀 CHOI SUNGBEOM, YAI 9th
```
conda create -n "environment name" python=3.7
conda activate "environment name"
pip install -U Cython cmake numpy
pip install onnxruntime-gpu
pip install -U insightface
pip install "torch>=1.8.1"
pip install "torchvision>=0.9.1"
pip install pytorch-lightning
pip install numpy
pip install scipy
pip install opencv-python
conda install scikit-image
pip install tqdm
```
```
git clone https://github.com/minsu1206/BlurFaceDetection.git
```
You can simply clone this repo to your machine. The directory hierarchy is organized as follows:
```
FaceBlurring
├── config
│   ├── resnet18_regression.yaml
│   └── .....
├── data
├── data_samples
├── dataset
│   ├── blur.py
│   ├── create_blurring.py
│   ├── dataset.py
│   ├── utils.py
│   └── .....
├── experiments
│   ├── results
│   ├── sample_code
│   └── .....
├── legacy
├── models
│   ├── utils          # dir for yolov5n.py
│   ├── edgenext.py
│   ├── mobilenetv2.py
│   └── .....
├── loss.py
├── model_factory.py
├── recorder.py
├── sample.sh
├── test.py
├── train.py
└── utils.py
```
- FFHQ
```
cd /data
wget https://raw.githubusercontent.com/NVlabs/ffhq-dataset/master/download_ffhq.py
python ./download_ffhq.py --images
cd ../
```
- Our processed data (resolution: 112px)

  You can download the blurred images we created from the link below.
We implemented two methods to create blurred images:
- DeblurGAN
- Defocus and Motion Blur Detection with Deep Contextual Features
There are two options for creating blurred images: the first applies the blur iteratively to a clean image, and the second applies the blur method only once. As the blur label, we use 1 - cosine similarity.
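For intuition, here is a minimal sketch of that labeling rule, assuming `embed` is some face-embedding function (e.g. an InsightFace recognition model); this is illustrative, not the repo's exact implementation:

```python
import numpy as np

def blur_label(clean_emb: np.ndarray, blur_emb: np.ndarray) -> float:
    """Blur label = 1 - cosine similarity between clean and blurred embeddings."""
    cos = np.dot(clean_emb, blur_emb) / (
        np.linalg.norm(clean_emb) * np.linalg.norm(blur_emb)
    )
    return 1.0 - float(cos)

# Usage (embeddings assumed to come from a face recognition model):
# label = blur_label(embed(clean_img), embed(blur_img))
```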
Below are example commands that create blurred images and save them with label information.
```
cd ./dataset
python create_blurimg_iterative.py --path ../data/FFHQ_1024/clean --n 4
python create_blur_label.py --path ../data/FFHQ_1024/clean
```
The commands above generate a set of blurred images, each produced by applying the blur method four times iteratively.
```
cd ./dataset
python create_blurimg_iterative.py --path ../data/FFHQ_1024/clean --n 4
python create_blur_label.py --path ../data/FFHQ_1024/clean --wo --multi
```
The commands above generate the same set of blurred images, but use multiprocessing to speed up generation.
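The core multiprocessing pattern might look like this sketch (the Gaussian blur and the output directory are stand-ins for the repo's actual blur methods):

```python
from glob import glob
from multiprocessing import Pool

import cv2

def blur_and_save(path: str) -> None:
    """Apply a blur to one clean image and save the result (stand-in blur)."""
    img = cv2.imread(path)
    blurred = cv2.GaussianBlur(img, (9, 9), 0)
    cv2.imwrite(path.replace("clean", "blur_demo"), blurred)  # hypothetical output dir

if __name__ == "__main__":
    paths = glob("../data/FFHQ_1024/clean/*/*.png")
    with Pool() as pool:  # one worker per CPU core by default
        pool.map(blur_and_save, paths)
```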
```
cd ./dataset
python create_blur_image.py --blur defocus --iter 1
```
This generates blurred images with the defocus method; one blurred image is generated per clean image.
```
cd ./dataset
python create_blur_image.py --blur deblurgan --iter 1
```
This command uses the DeblurGAN blur method to generate blurred images.
```
cd ./dataset
python create_blur_image.py --blur defocus --iter 1 --scrfd True
```
This command generates blurred images using the defocus blur method together with SCRFD inference; the SCRFD module is used to detect the face in each image.
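For reference, a face crop using InsightFace's detection module (which provides SCRFD models) might look like the sketch below; the paths and sizes are assumptions, not the repo's code:

```python
import cv2
from insightface.app import FaceAnalysis

# Detection-only pipeline; recent InsightFace releases use SCRFD detectors.
app = FaceAnalysis(allowed_modules=["detection"])
app.prepare(ctx_id=0, det_size=(640, 640))

img = cv2.imread("../data/FFHQ_1024/clean/00000/00000.png")  # hypothetical path
faces = app.get(img)
if faces:
    x1, y1, x2, y2 = faces[0].bbox.astype(int)               # detected face box
    face_crop = cv2.resize(img[y1:y2, x1:x2], (112, 112))    # crop and resize
```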
All generated blurred images are stored in the `data` folder:
```
data
├── FFHQ_1024
│   ├── blur_deblurGAN
│   │   ├── 00000
│   │   │   ├── 00000.png
│   │   │   ├── 00001.png
│   │   │   ├── .....
│   │   │   └── 00999.png
│   │   ├── 01000
│   │   │   ├── 01000.png
│   │   │   ├── 01001.png
│   │   │   ├── .....
│   │   │   └── 01999.png
│   │   └── .....
│   ├── blur_defocus
│   │   ├── 00000
│   │   │   ├── 00000.png
│   │   │   ├── .....
│   │   │   └── 00999.png
│   │   └── .....
│   └── blur_Random
│       ├── 00000
│       │   ├── 00000.png
│       │   ├── .....
│       │   └── 00999.png
│       └── .....
├── label_deblurGAN
│   └── label
│       └── data_label.csv
├── label_defocus
│   └── label
│       └── data_label.csv
├── label_random
│   └── label
│       └── data_label.csv
└── label_val.csv
```
The following command plots the distribution of the generated blurred images (the example below uses the DeblurGAN method).
```
python data_distribution.py --path ../data/label_deblurGAN/label/data_label.csv
```
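If you prefer a quick manual check, an equivalent plot can be sketched as follows (which CSV column holds the blur label is an assumption):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("../data/label_deblurGAN/label/data_label.csv")
plt.hist(df.iloc[:, -1], bins=50)  # assumes the last column holds the blur label
plt.xlabel("blur label")
plt.ylabel("number of images")
plt.show()
```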
The distribution of the data we provide is shown below. (The x-axis is the blur label and the y-axis is the number of images; the graphs show, in order, the DeblurGAN method, the defocus method, and both methods combined.)
DeblurGAN | Defocus | Both |
---|---|---|
About 210,000 image samples were generated with random kernel-based variants of the DeblurGAN and defocus methods. We then extracted 100,000 of them so that the overall dataset is evenly distributed. The training and validation sets were created by a random split with the same random seed in each experiment. The training/validation dataset distribution is as follows.
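A minimal sketch of such a fixed-seed split (the dataset object and the split ratio below are stand-ins, not the repo's actual values):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Dummy stand-in for the real dataset (dataset/dataset.py defines the actual one).
dataset = TensorDataset(torch.rand(1000, 1), torch.rand(1000))

n_val = int(0.2 * len(dataset))  # the split ratio here is an assumption
train_set, val_set = random_split(
    dataset,
    [len(dataset) - n_val, n_val],
    generator=torch.Generator().manual_seed(42),  # same seed in every experiment
)
```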
We provide standard models such as ResNet, as well as lightweight backbones with fast inference speed.
Model | Model Size (.pt file) | Inference Speed (average) | Config | Pre-trained Weight
---|---|---|---|---
ResNet18 | 42.916 MB | 143.502 ms | resnet18_regression.yaml | https://drive.google.com/file/d/17o8oqL-ZKcR87vIEDXwcvAIiqxrZZe2y/view?usp=sharing
ResNet34 | 81.542 MB | 263.5752 ms | - | -
EdgeNext_xx_small | 4.49 MB | 155.0043 ms | edgenext_regression.yaml | https://drive.google.com/file/d/1Mo2wIPXJuj0pYFPyMDtC2C39bdH1VBxm/view?usp=sharing
YOLOv5n (custom backbone: x) | 4.106 MB | 132.2865 ms | yolov5n_regression.yaml | https://drive.google.com/file/d/1I-HfI5p_UC1Y39ipAjLgV1Gw9Sdqh594/view?usp=sharing
YOLOv5n (custom backbone: xx) | 2.213 MB | 129.8896 ms | yolov5n_regression.yaml | -
MobileNetV2_0.25 | 1.068 MB | 111.6102 ms | mobilenetv2_regression.yaml | https://drive.google.com/file/d/1Nqb1mqy512Tpj2L-pQMmDVP4GC9h6VGP/view?usp=sharing
MobileNetV2_0.5 | 2.815 MB | 123.4103 ms | mobilenetv2_regression.yaml | https://drive.google.com/file/d/1St2n0FX11_R9VrH032xACKXHoXOk3ibf/view?usp=sharing
EfficientNetLite0 | 13.137 MB | 185.1595 ms | - | -
SqueezeNetV1.1 | 2.785 MB | 57.3412 ms | squeezenet_regression.yaml | https://drive.google.com/file/d/1IjV-7Rj56jtiJ0o15rX1xfTzc2cdo7zm/view?usp=sharing
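The exact timing methodology is not spelled out here; a simple way to measure a comparable average latency might look like this sketch (batch size 1 on CPU is an assumption):

```python
import time

import torch
from torchvision.models import resnet18

def avg_inference_ms(model: torch.nn.Module, n_runs: int = 100) -> float:
    """Average single-image latency in milliseconds (CPU, batch size 1)."""
    model.eval()
    x = torch.rand(1, 3, 112, 112)  # 112x112 input with values in [0, 1]
    with torch.no_grad():
        model(x)  # warm-up run
        t0 = time.perf_counter()
        for _ in range(n_runs):
            model(x)
    return (time.perf_counter() - t0) / n_runs * 1000.0

print(avg_inference_ms(resnet18()))
```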
If you want to train a model, refer to the training script usage below.
```
python train.py --config config/{}.yaml --save {} --device {} --viz
```
Optional arguments:
- `--config` : yaml file to run (in the `config` folder)
- `--save` : path to save checkpoints and graphs
- `--device` : device to use (e.g. `cuda:0`)
- `--viz` : add this flag to enable visualization

Example:
```
python train.py --config mobilenetv2_0.5_regression --save checkpoint/mobilenetv2_0.5 --device cuda:0 --viz
```
This figure shows that our models predict motion blur well: their error relative to the ground truth is close to zero whether the blur angle is fixed or not, regardless of the backbone. Each model's result is the mean over 30 subjects.
This figure shows that a ResNet with a simple regressor predicts better than one with a complex regressor. Furthermore, stacking linear layers increases inference time and model size. Therefore, we do not stack additional FC layers on the regressor of any model used in this project.
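As a sketch of that design choice (illustrative; not necessarily the repo's model_factory.py code), the regression head can stay a single linear layer:

```python
import torch.nn as nn
from torchvision.models import resnet18

# Replace the classification head with one linear layer producing a blur score.
model = resnet18()
model.fc = nn.Linear(model.fc.in_features, 1)
```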
(Upper): ResNet trained by classification
(Bottom): EdgeNext_xx_small trained by classification
We divide the label range $0 \sim 1$ into discrete bins and train ResNet and EdgeNext_xx_small with cross entropy + MSE (CMSE) or cross entropy + probability-based MSE (WeightMSE, WMSE). These figures show that solving this task as classification is also a valid approach.
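As an illustration of what such a combined objective can look like, here is a hedged sketch; the bin count and the weighting between terms are assumptions, not the repo's actual loss.py values:

```python
import torch
import torch.nn.functional as F

def cmse_loss(logits: torch.Tensor, target: torch.Tensor, n_bins: int = 20) -> torch.Tensor:
    """Cross entropy over binned labels plus MSE on the expected bin value.

    logits: (B, n_bins) class scores; target: (B,) continuous labels in [0, 1].
    """
    bin_idx = (target * (n_bins - 1)).round().long()           # discretize the label
    ce = F.cross_entropy(logits, bin_idx)                      # classification term
    centers = torch.linspace(0, 1, n_bins, device=logits.device)
    expected = (logits.softmax(dim=-1) * centers).sum(dim=-1)  # expected blur value
    mse = F.mse_loss(expected, target)                         # regression term
    return ce + mse
```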
Qualitative model evaluation on video test samples: (a) ResNet18, (b) EdgeNext, (c) YOLOv5n, (d) SqueezeNetV1.1, (e) MobileNetV2 (0.5), and (f) MobileNetV2 (0.25). Each face is detected with SCRFD, resized, and then used as the input to each model.
You can detect blur in a video or image with a trained model. First, place the video/image file you want to analyze in the `data` folder.
The command below detects blur in a video and generates a result video.
```
python demo.py --device cpu --pretrained_path {pretrained_model.pt} --mode video --file_path ./data/sample.mp4 --save_path ./data/result_sample.mp4
```
The command below detects blur in an image and generates a result image.
```
python demo.py --device cpu --pretrained_path {pretrained_model.pt} --mode image --file_path ./data/sample.png --save_path ./data/result_sample.png
```
Be careful: when running test/inference with a pre-trained model, the input (a cropped face image) must match the following specification (see the preprocessing sketch after this list):

- Use `model.eval()` and `torch.no_grad()`
- Image spatial dimensions should be $112 \times 112$
- Image values should be normalized to $0 \sim 1$, not $0 \sim 255$
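A minimal preprocessing sketch that satisfies this spec (the file names and the model-loading call below are assumptions, not the repo's exact demo code):

```python
import cv2
import torch

model = torch.load("pretrained_model.pt", map_location="cpu")  # hypothetical checkpoint
model.eval()

face = cv2.imread("./data/face_crop.png")                    # hypothetical cropped face
face = cv2.resize(face, (112, 112))                          # spatial size 112 x 112
x = torch.from_numpy(face).permute(2, 0, 1).float() / 255.0  # HWC -> CHW, values in [0, 1]

with torch.no_grad():
    blur_score = model(x.unsqueeze(0))                       # add the batch dimension
```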