
Bad results (Not bad now) #12

Open
jmtatsch opened this issue Aug 25, 2017 · 24 comments

@jmtatsch
Contributor

Although the converted weights produce plausible predictions,
they are not yet up to the published results of the PSPNet paper.

Current results on cityscapes validation set:

classes          IoU      nIoU
--------------------------------
road          : 0.969      nan
sidewalk      : 0.776      nan
building      : 0.871      nan
wall          : 0.532      nan
fence         : 0.464      nan
pole          : 0.302      nan
traffic light : 0.375      nan
traffic sign  : 0.567      nan
vegetation    : 0.872      nan
terrain       : 0.591      nan
sky           : 0.905      nan
person        : 0.585    0.352
rider         : 0.253    0.147
car           : 0.897    0.698
truck         : 0.606    0.284
bus           : 0.721    0.375
train         : 0.652    0.388
motorcycle    : 0.344    0.147
bicycle       : 0.618    0.348
--------------------------------
Score Average : 0.626    0.342
--------------------------------


categories       IoU      nIoU
--------------------------------
flat          : 0.974      nan
nature        : 0.876      nan
object        : 0.397      nan
sky           : 0.905      nan
construction  : 0.872      nan
human         : 0.603    0.376
vehicle       : 0.879    0.676
--------------------------------
Score Average : 0.787    0.526
--------------------------------

Accuracy of the published code on several validation/testing sets according to the author:

PSPNet50 on ADE20K valset (mIoU/pAcc): 41.68/80.04 
PSPNet101 on VOC2012 testset (mIoU): 85.41 (multiscale evaluation!)
PSPNet101 on cityscapes valset (mIoU/pAcc): 79.70/96.38

So we are still missing 79.70 - 62.60 = 17.10 points of mIoU.

Does anyone have an idea where we lose that accuracy?

@jmtatsch
Contributor Author

Is scipy's misc.imresize(img, self.input_shape) really exactly the same as Matlab's imresize(img, [new_rows new_cols], 'bilinear')?
And how does tf.image.resize compare to Caffe's Interp layer?
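One concrete difference worth checking (an assumption on my part, not verified against this repo): Caffe's Interp layer aligns the corner pixels, while tf.image.resize_bilinear in TF 1.x defaults to align_corners=False. A minimal sketch of the two variants:

import tensorflow as tf

# Hypothetical 90x90 feature map being upsampled to the 713x713 input size.
feats = tf.placeholder(tf.float32, [1, 90, 90, 512])
up_tf_default = tf.image.resize_bilinear(feats, (713, 713))                      # align_corners=False
up_caffe_like = tf.image.resize_bilinear(feats, (713, 713), align_corners=True)  # closer to Caffe's Interp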

@jefequien
Collaborator

You could save the intermediate activations from Keras and compare them with Caffe's. There are a few lines for this in utils.py.

I don't remember exactly, but they may have used a sliding window for evaluation, as opposed to our rescaling, which distorts the aspect ratio.
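For reference, a minimal sketch of such a probe (not the actual utils.py helper; layer_name is a placeholder, not a layer name from this repo):

import numpy as np
from keras.models import Model

def dump_activation(model, img_batch, layer_name, out_path):
    # Build a sub-model that stops at the layer of interest and save its output.
    probe = Model(inputs=model.input,
                  outputs=model.get_layer(layer_name).output)
    np.save(out_path, probe.predict(img_batch))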

@jmtatsch
Contributor Author

jmtatsch commented Aug 28, 2017

Looking into this further, there are minor differences between the Python and Matlab image resizing, depending on the input data type.
See e.g. https://stackoverflow.com/questions/26812289/matlab-vs-c-vs-opencv-imresize
Matlab performs anti-aliasing by default, and so does Zhao's original code.

I ran some experiments, and even for a toy example nothing matches the Matlab default matlab_imresize_AA:

original:  [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
scipy_imresize:  [[ 4  5]
 [10 11]]
scipy_imresize_float:  [[  3.57142878   5.14285707]
 [  9.8571434   11.4285717 ]]
scipy_zoom: [[ 0  3]
 [12 15]]
OpenCv:  [[ 3  5]
 [11 13]]
scikit_resize:  [[ 3  5]
 [11 13]]
matlab_imresize_AA:  [[4,5],[11,12]]
matlab_imresize:  [[3,5],[11,13]]
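For anyone who wants to reproduce this, roughly the Python-side calls used above (a sketch; scipy.misc.imresize requires SciPy < 1.3, exact outputs depend on library versions, and the Matlab numbers came from running imresize in Matlab directly):

import numpy as np
import cv2
from scipy import misc, ndimage
from skimage.transform import resize as sk_resize

img = np.arange(16, dtype=np.uint8).reshape(4, 4)

print("scipy_imresize:", misc.imresize(img, (2, 2), interp="bilinear"))
print("scipy_imresize_float:",
      misc.imresize(img.astype(np.float32), (2, 2), interp="bilinear", mode="F"))
print("scipy_zoom:", ndimage.zoom(img, 0.5, order=1))
print("OpenCV:", cv2.resize(img, (2, 2), interpolation=cv2.INTER_LINEAR))
print("scikit_resize:", sk_resize(img, (2, 2), order=1, preserve_range=True))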

@hujh14 you mentioned somewhere that you already checked the activations. Do you remember up to which point?

@jmtatsch
Contributor Author

@hujh14 Unfortunately, I cannot run the original code on an 8 GB GTX 1080, not even with batch size 1, due to insufficient memory. Did you manage to compile the original code with cuDNN support, or do you have 12 GB cards?

@Vladkryvoruchko
Owner

@jmtatsch I have the same card as yours, and it works well; it takes about 3.5 GB of memory.

@jmtatsch
Contributor Author

@Vladkryvoruchko Are you sure it works with the Cityscapes model? It is much larger than the other two... Which CUDA and cuDNN versions?

@jmtatsch
Contributor Author

jmtatsch commented Aug 29, 2017

Doing a sliced evaluation now, much better detail!

cityscapes_seg_blended

Will do a smarter overlap treatment, then evaluate again.

@Vladkryvoruchko
Owner

@jmtatsch Oh... I was talking about ADE :)

@jmtatsch
Contributor Author

#13

@jmtatsch
Contributor Author

jmtatsch commented Sep 1, 2017

Without flipped evaluation

classes          IoU      nIoU
--------------------------------
road          : 0.981      nan
sidewalk      : 0.849      nan
building      : 0.922      nan
wall          : 0.572      nan
fence         : 0.624      nan
pole          : 0.589      nan
traffic light : 0.690      nan
traffic sign  : 0.777      nan
vegetation    : 0.919      nan
terrain       : 0.641      nan
sky           : 0.943      nan
person        : 0.808    0.627
rider         : 0.617    0.472
car           : 0.949    0.857
truck         : 0.766    0.477
bus           : 0.859    0.639
train         : 0.794    0.569
motorcycle    : 0.653    0.456
bicycle       : 0.768    0.578
--------------------------------
Score Average : 0.775    0.584
--------------------------------


categories       IoU      nIoU
--------------------------------
flat          : 0.986      nan
nature        : 0.921      nan
object        : 0.670      nan
sky           : 0.943      nan
construction  : 0.925      nan
human         : 0.822    0.656
vehicle       : 0.936    0.834
--------------------------------
Score Average : 0.886    0.745
--------------------------------

Just 2.2 % missing :)

@jmtatsch
Contributor Author

jmtatsch commented Sep 4, 2017

Okay, with flipped evaluation:

classes          IoU      nIoU
--------------------------------
road          : 0.982      nan
sidewalk      : 0.853      nan
building      : 0.924      nan
wall          : 0.582      nan
fence         : 0.634      nan
pole          : 0.593      nan
traffic light : 0.692      nan
traffic sign  : 0.780      nan
vegetation    : 0.920      nan
terrain       : 0.644      nan
sky           : 0.945      nan
person        : 0.811    0.630
rider         : 0.618    0.478
car           : 0.951    0.860
truck         : 0.795    0.480
bus           : 0.870    0.638
train         : 0.799    0.591
motorcycle    : 0.656    0.457
bicycle       : 0.771    0.580
--------------------------------
Score Average : 0.780    0.589
--------------------------------


categories       IoU      nIoU
--------------------------------
flat          : 0.986      nan
nature        : 0.922      nan
object        : 0.674      nan
sky           : 0.945      nan
construction  : 0.927      nan
human         : 0.825    0.659
vehicle       : 0.937    0.836
--------------------------------
Score Average : 0.888    0.747
--------------------------------

Still 1.7% missing. Will look into multi-scale evaluation.

@jmtatsch
Contributor Author

jmtatsch commented Sep 5, 2017

Adding multi-scale evaluation on top actually made the results considerably worse. Here is a funky gif with scales = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]; the last frame is the aggregated one.
evaluation_of_scales_2017-09-05 10 12 21 396198
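For context, the aggregation looks roughly like this (a sketch; predict_full is a hypothetical function returning per-pixel class probabilities at the input's resolution, and averaging probabilities is only one possible aggregation):

import numpy as np
import cv2

def predict_multi_scale(img, predict_full,
                        scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75)):
    h, w = img.shape[:2]
    acc = None
    for s in scales:
        scaled = cv2.resize(img, (int(w * s), int(h * s)),
                            interpolation=cv2.INTER_LINEAR)
        probs = predict_full(scaled)            # (h*s, w*s, n_classes)
        probs = cv2.resize(probs, (w, h),       # back to full resolution
                           interpolation=cv2.INTER_LINEAR)
        acc = probs if acc is None else acc + probs
    return acc / len(scales)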

@leinxx

leinxx commented Sep 11, 2017

@jmtatsch would you mind explaining what you mean by sliced prediction? I am working on the same problem :(

@jmtatsch
Contributor Author

jmtatsch commented Sep 11, 2017

@leinxx by sliced prediction I mean cutting the image into 4x2 overlapping 713x713 slices, forwarding them through the network, and reassembling the 2048x1024 predictions from them.

sliding_evaluation
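A rough sketch of the reassembly (assumptions: predict_crop returns per-pixel class probabilities for one 713x713 crop, the input is a full-resolution Cityscapes frame, and overlapping regions are simply averaged):

import numpy as np

def predict_sliding(img, predict_crop, crop=713, n_classes=19):
    h, w = img.shape[:2]                            # e.g. 1024 x 2048
    probs = np.zeros((h, w, n_classes), dtype=np.float32)
    counts = np.zeros((h, w, 1), dtype=np.float32)
    ys = np.linspace(0, h - crop, 2).astype(int)    # 2 rows of crops
    xs = np.linspace(0, w - crop, 4).astype(int)    # 4 columns of crops
    for y in ys:
        for x in xs:
            probs[y:y + crop, x:x + crop] += predict_crop(img[y:y + crop, x:x + crop])
            counts[y:y + crop, x:x + crop] += 1
    return probs / counts                           # average the overlaps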

Please let us know if you fix further issues and get closer to the published results...

@wtliao

wtliao commented Sep 25, 2017

@jmtatsch thanks a lot, I have learned a lot from your replication. Would you explain why sliced prediction improves the results so much? And what is flipped evaluation (do you mean flipping the image during training for data augmentation)? Could you provide more details about your training settings, e.g. batch_size, epochs, etc.?

@jmtatsch
Contributor Author

jmtatsch commented Sep 25, 2017

@wtliao Unfortunately, I did not (yet) train these weights myself.
Sliced/sliding prediction shows so much more detail because the weights were trained on 713x713 crops from the full-resolution image, and the same 713x713 crops are used for prediction, instead of a downsampled 512x256 image that is then upsampled to full resolution.
Flipped evaluation means predicting on both the image and a flipped copy of it at the same time, then averaging the two results.
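In code, this is roughly (a sketch; most implementations mirror the image left-right, and predict is assumed to return class probabilities):

import numpy as np

def predict_flipped(img, predict):
    p = predict(img)
    p_flip = predict(np.fliplr(img))        # predict on the mirrored image
    return 0.5 * (p + np.fliplr(p_flip))    # mirror the result back, then average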

@jkschin

jkschin commented Oct 6, 2017

@jmtatsch Thanks for these posts! They were really helpful! 👍

@scenarios

scenarios commented Dec 18, 2017

Hi @jmtatsch, I found that in your code the kernel size and stride of the pyramid pooling module are set to (10xlevel, 10xlevel). That is the right size for VOC and ADE20K with input size (473, 473). However, when using input size (713, 713) as in the Cityscapes case, the sizes obtained from (10xlevel, 10xlevel) are not identical to the original code.
I believe this is the main reason for the last 1.7% performance drop ^^
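For reference, deriving the pooling parameters from the actual feature-map size instead of hard-coding them would look roughly like this (a sketch; with the stride-8 dilated backbone, a 473x473 input yields a 60x60 feature map and a 713x713 input yields 90x90):

def pyramid_pool_params(feature_size, bin_sizes=(1, 2, 3, 6)):
    # Kernel == stride per level, so each level yields a bin x bin output.
    return {b: feature_size // b for b in bin_sizes}

print(pyramid_pool_params(60))  # 473 input: {1: 60, 2: 30, 3: 20, 6: 10}
print(pyramid_pool_params(90))  # 713 input: {1: 90, 2: 45, 3: 30, 6: 15}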

@jmtatsch
Contributor Author

jmtatsch commented Dec 18, 2017

@scenarios Good catch, I will fix this in #30 and reevaluate.

@TheRevanchist

Hi,

What results do you get after the latest changes? Are they similar to the paper's, or still worse?

@shipengai

@jmtatsch Did you train PSPNet on any dataset?

@jmtatsch
Contributor Author

jmtatsch commented Feb 24, 2018 via email

@Vladkryvoruchko changed the title from "Bad results" to "Bad results (Not bad now)" on May 17, 2018
@YuShen1116

Hi.

Thanks for your excellent work. Could you please tell me how you get the evaluation results? Are you using the scripts in the cityscapes repo?

Thank you!

@ktnguyen2

Hello,

I would also like to know how to get the evaluation results. I tried using the cityscapes scripts, but I am getting this error when using the seg_read images as my input:

Traceback (most recent call last):
  File "evalPixelLevelSemanticLabeling.py", line 696, in <module>
    main()
  File "evalPixelLevelSemanticLabeling.py", line 690, in main
    evaluateImgLists(predictionImgList, groundTruthImgList, args)
  File "evalPixelLevelSemanticLabeling.py", line 478, in evaluateImgLists
    nbPixels += evaluatePair(predictionImgFileName, groundTruthImgFileName, confMatrix, instStats, perImageStats, args)
  File "evalPixelLevelSemanticLabeling.py", line 605, in evaluatePair
    confMatrix[gt_id][pred_id] += c
IndexError: index 34 is out of bounds for axis 0 with size 34
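Not certain, but a likely cause is a label-ID mismatch: the script indexes a 34x34 confusion matrix by the original Cityscapes labelIds (0-33), so any image containing a value >= 34 triggers exactly this error. If your predictions are stored as trainIds, something like this conversion (a sketch using cityscapesscripts; the handling of the 255 ignore value is a simplification) is needed first:

import numpy as np
from cityscapesscripts.helpers.labels import trainId2label

def train_ids_to_label_ids(pred):
    # Map each trainId (0-18, plus 255 for ignore) back to its original labelId.
    out = np.zeros_like(pred)
    for train_id, label in trainId2label.items():
        out[pred == train_id] = label.id
    return out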
