Unofficial Reproduction of PSPNet.

External links: the Pyramid Scene Parsing Network paper and the official GitHub repository.

This is an implementation of PSPNet (from training to testing) in pure TensorFlow (tested on TF1.1, TF1.4 and TF1.10).

Supported Backbones: ResNet-V1-50 and ResNet-V1-101; other ResNet-V1 variants can be added easily.
Supported Databases: ADE20K, SBD (Augmented Pascal VOC) and Cityscapes.
Supported Modes: training, validation and inference with multi-scale inputs.
More things: L2-SP regularization and a sync batch normalization implementation.

L2-SP Regularization

L2-SP regularization is a variant of L2 regularization. Where L2 penalizes the distance of the weights from the origin, L2-SP uses the pre-trained model as the reference point: the penalty is (w - w0)^2, where w0 denotes the pre-trained weights. Simple but effective. More details about L2-SP can be found in the paper and the code.
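
The penalty described above can be sketched in a few lines of NumPy. This is only an illustration, not the code used in this repo: the function name, the layer names and the toy values are hypothetical, and the split into a coefficient alpha for pre-trained layers and a coefficient beta for newly added layers (with a factor of 1/2) is an assumption based on my reading of the L2-SP paper.

```python
import numpy as np

def l2_sp_penalty(weights, pretrained, alpha, beta):
    """L2-SP penalty for a dict of named weight arrays (illustrative sketch).

    Weights with a pre-trained counterpart are pulled toward it;
    weights without one (newly added layers) get plain L2 toward zero.
    """
    penalty = 0.0
    for name, w in weights.items():
        if name in pretrained:
            # (w - w0)^2 term: distance from the pre-trained starting point.
            penalty += 0.5 * alpha * np.sum((w - pretrained[name]) ** 2)
        else:
            # Ordinary L2 term: distance from the origin.
            penalty += 0.5 * beta * np.sum(w ** 2)
    return penalty

# Hypothetical toy weights: one backbone layer and one new layer.
pretrained = {"backbone/conv1": np.array([1.0, -2.0, 0.5])}
weights = {
    "backbone/conv1": np.array([1.1, -2.0, 0.3]),
    "head/conv": np.array([0.2, -0.1]),
}
penalty = l2_sp_penalty(weights, pretrained, alpha=1e-4, beta=1e-3)
```

With alpha equal to beta and w0 set to zero, this reduces to ordinary L2 weight decay.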

Sync Batch Norm

In image segmentation, the batch size is usually limited. A small batch size makes the gradients unstable and harms performance, especially for batch normalization layers. Multi-GPU training does not help by default, because the statistics in each batch normalization layer are computed independently on each GPU. More discussion can be found here and here.

This repo resolves the problem in pure Python and pure TensorFlow by simply using a list as input. The main idea is implemented in model/utils_mg.py.
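
To make the idea concrete, here is a minimal NumPy sketch of batch normalization over a list of per-device batches. It is not the code in model/utils_mg.py; the function name and toy shapes are hypothetical, the learned scale and offset are omitted, and it only shows how shared statistics can be combined from per-device sums.

```python
import numpy as np

def sync_batch_norm(per_device_inputs, eps=1e-5):
    """Normalize a list of per-device batches with shared statistics.

    Each list element has shape (batch, height, width, channels).
    """
    # Per-device element counts and first/second moment sums per channel.
    counts = [x.shape[0] * x.shape[1] * x.shape[2] for x in per_device_inputs]
    sums = [x.sum(axis=(0, 1, 2)) for x in per_device_inputs]
    sq_sums = [(x ** 2).sum(axis=(0, 1, 2)) for x in per_device_inputs]

    # Combine them into one mean and variance shared by all devices.
    total = float(sum(counts))
    mean = sum(sums) / total
    var = sum(sq_sums) / total - mean ** 2  # E[x^2] - E[x]^2

    return [(x - mean) / np.sqrt(var + eps) for x in per_device_inputs]

# Four simulated devices, each holding a batch of 2 with a different offset.
rng = np.random.RandomState(0)
inputs = [rng.randn(2, 4, 4, 3) + shift for shift in (0.0, 1.0, 2.0, 3.0)]
outputs = sync_batch_norm(inputs)
merged = np.concatenate(outputs, axis=0)
```

Here `merged` has roughly zero mean and unit variance per channel, while each individual device output does not, which is exactly the difference from normalizing each device on its own.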

I do not know whether this is the first implementation of sync batch norm in TensorFlow, but there is already an implementation in PyTorch, along with some applications.

Update: There is another implementation that uses NCCL to gather statistics across GPUs; see tensorpack. However, TF1.1 does not support passing gradients through nccl_all_reduce. In addition, this code could not be run on ppc64le with TF1.10, CUDA 9.0 and NCCL 1.3.5. I do not know why and do not want to spend much time on it. NCCL 2 may solve this.

Results

Numerical Results

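Random scaling is used for all experiments.
Random rotation is used for SBD.
Scores are mIoU on the validation set, reported as single-scale/multi-scale (SS/MS).
Corrections and additions to the table are welcome.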

Database	Backbone	L2 (SS/MS)	L2-SP (SS/MS)
Cityscapes (train set: 3K)	ResNet-50	76.9/?	77.9/?
Cityscapes (train set: 3K)	ResNet-101	77.9/?	78.6/?
Cityscapes (coarse + train set: 20K + 3K)	ResNet-50
Cityscapes (coarse + train set: 20K + 3K)	ResNet-101	80.0/80.9	80.1/81.2*
SBD	ResNet-50	76.5/?	76.6/?
SBD	ResNet-101	77.5/79.2	78.5/79.9
ADE20K	ResNet-50	41.81/?
ADE20K	ResNet-101

*This model gets 80.3 on the Cityscapes test set (1525 images) without any post-processing.
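
For reference, the mIoU scores in the table can be computed from a confusion matrix. The following is a minimal NumPy sketch rather than the evaluation code of this repo; the function name, the toy labels and the ignore label of 255 are assumptions made for illustration.

```python
import numpy as np

def mean_iou(labels, preds, num_classes, ignore_label=255):
    """Mean intersection-over-union over the classes that occur."""
    labels = np.asarray(labels).ravel()
    preds = np.asarray(preds).ravel()

    # Drop pixels marked as "ignore" in the ground truth.
    keep = labels != ignore_label
    labels, preds = labels[keep], preds[keep]

    # Confusion matrix: rows are ground truth, columns are predictions.
    conf = np.bincount(
        num_classes * labels + preds, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    valid = union > 0  # skip classes absent from both labels and predictions
    return float(np.mean(inter[valid] / union[valid]))

# Toy example with three classes and one ignored pixel.
labels = [0, 0, 1, 1, 2, 255]
preds = [0, 1, 1, 1, 2, 0]
miou = mean_iou(labels, preds, num_classes=3)
```

In this toy example the per-class IoUs are 1/2, 2/3 and 1, so `miou` is 13/18.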

Qualitative Results on Cityscapes

Devil Details

Scripts

Prepare the databases with the links: ADE20K, SBD (Augmented Pascal VOC) and Cityscapes.
Download pretrained models.
a. A command for training ResNet-50 on Cityscapes, reaching around 77.9 mIoU:

python ./train.py --batch_size 4 --gpu_num 4 --consider_dilated 1 --weight_decay_rate 0.0001 --weight_decay_rate2 0.001 --random_rotate 0 --database 'Cityscapes' --train_image_size 816 --test_image_size 816

b. A command for training ResNet-50 on ADE20K, reaching around 41.8 mIoU:

python ./train.py --batch_size 8 --gpu_num 2 --weight_decay_mode 0 --weight_decay_rate 0.0001 --weight_decay_rate2 0.0001 --train_max_iter 60000 --snapshot 30000 --random_rotate 0 --database 'ADE' --train_image_size 480 --test_image_size 480

Test with multi-scale inputs and save the predictions (labels) and colored prediction images (only for Cityscapes for now, but easy to extend to other databases).

CUDA_VISIBLE_DEVICES=0 python predict.py --coloring 1 --ms 1 --network resnet_v1_50 --weights_ckpt ./log/resnet_v1_50-816-train-L2-SP-wd_alpha0.0001-wd_beta0.001-batch_size16-lrn_rate0.01-consider_dilated1-random_rotate0-random_scale1/model.ckpt-30000
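
As a rough picture of what multi-scale testing does, the sketch below averages class probabilities over several input scales. It is not the logic of predict.py: the scale set, the nearest-neighbor resizing and the `predict_fn` callable are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbor resize of an (H, W, C) array."""
    in_h, in_w = x.shape[:2]
    rows = np.minimum((np.arange(out_h) * in_h / out_h).astype(int), in_h - 1)
    cols = np.minimum((np.arange(out_w) * in_w / out_w).astype(int), in_w - 1)
    return x[rows][:, cols]

def multi_scale_predict(image, predict_fn, scales=(0.5, 1.0, 1.5)):
    """Average per-pixel class probabilities over several input scales."""
    h, w = image.shape[:2]
    total = None
    for s in scales:
        sh, sw = max(1, int(round(h * s))), max(1, int(round(w * s)))
        probs = predict_fn(resize_nearest(image, sh, sw))  # (sh, sw, classes)
        probs = resize_nearest(probs, h, w)  # back to the original size
        total = probs if total is None else total + probs
    return np.argmax(total / len(scales), axis=-1)

def toy_predict(img):
    """Hypothetical stand-in for a network: two classes from brightness."""
    fg = (img.mean(axis=-1, keepdims=True) > 0.5).astype(float)
    return np.concatenate([1.0 - fg, fg], axis=-1)

# Toy image: dark left half, bright right half.
image = np.zeros((8, 8, 3))
image[:, 4:] = 1.0
labels_ms = multi_scale_predict(image, toy_predict)
```

A real pipeline would typically use bilinear resizing and a trained network in place of `toy_predict`.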

Infer one image (with or without multi-scale).

CUDA_VISIBLE_DEVICES=$n python inference.py --coloring 1 --ms 0 --network resnet_v1_50 --weights_ckpt ./log/resnet_v1_50-816-train-L2-SP-wd_alpha0.0001-wd_beta0.001-batch_size16-lrn_rate0.01-consider_dilated1-random_rotate0-random_scale1/model.ckpt-30000 --image_path ./database/cityscapes/leftImg8bit/test/berlin/berlin_000000_000019_leftImg8bit.png

Infer many images one by one.

CUDA_VISIBLE_DEVICES=$n python inference.py --coloring 1 --ms 0 --network resnet_v1_50 --weights_ckpt ./log/resnet_v1_50-816-train-L2-SP-wd_alpha0.0001-wd_beta0.001-batch_size16-lrn_rate0.01-consider_dilated1-random_rotate0-random_scale1/model.ckpt-30000

Then give the image paths one at a time.

Uncertainties for Training Details:

(Cityscapes only) Should the finely labeled data be included in the first training stage?
(Cityscapes only) Should the (base) learning rate be reduced in the second training stage?
Should the logits be resized to the original size before computing the loss?
Should the new layers receive a larger learning rate?
tf.image.resize_images() has weird padding behavior. Should align_corners=True be set?
What is the optimal decay for the statistics of the batch normalization layers? (0.9, 0.95, 0.9997)
There may be more, but it is not clear how much these small changes affect the results.
Discussion is welcome!
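
The align_corners question above can be illustrated by looking at which source coordinate each output pixel samples along one axis. The sketch below reflects my understanding of the legacy TF1.x bilinear resize convention and is not taken from the TensorFlow source; the function name and sizes are hypothetical.

```python
import numpy as np

def source_coords(in_size, out_size, align_corners):
    """Source coordinate sampled by each output index along one axis."""
    dst = np.arange(out_size, dtype=np.float64)
    if align_corners and out_size > 1:
        # First and last output pixels map exactly onto the input corners.
        scale = (in_size - 1) / (out_size - 1)
    else:
        # Plain ratio of sizes, with no half-pixel offset.
        scale = in_size / out_size
    return dst * scale

# Upsampling an axis of length 4 to length 8.
default = source_coords(4, 8, align_corners=False)
aligned = source_coords(4, 8, align_corners=True)
```

With `align_corners=False` the last output pixel asks for coordinate 3.5, which lies past the last input index 3 and, as far as I understand, is clamped to the border pixel, so the far edge looks padded. With `align_corners=True` the coordinates stay within 0 to 3.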
