
Bad results (Not bad now) #12

Open
jmtatsch opened this issue Aug 25, 2017 · 24 comments

@jmtatsch
Contributor

Although the converted weights produce plausible predictions,
they are not yet up to the published results of the PSPNet paper.

Current results on cityscapes validation set:

classes          IoU      nIoU
--------------------------------
road          : 0.969      nan
sidewalk      : 0.776      nan
building      : 0.871      nan
wall          : 0.532      nan
fence         : 0.464      nan
pole          : 0.302      nan
traffic light : 0.375      nan
traffic sign  : 0.567      nan
vegetation    : 0.872      nan
terrain       : 0.591      nan
sky           : 0.905      nan
person        : 0.585    0.352
rider         : 0.253    0.147
car           : 0.897    0.698
truck         : 0.606    0.284
bus           : 0.721    0.375
train         : 0.652    0.388
motorcycle    : 0.344    0.147
bicycle       : 0.618    0.348
--------------------------------
Score Average : 0.626    0.342
--------------------------------


categories       IoU      nIoU
--------------------------------
flat          : 0.974      nan
nature        : 0.876      nan
object        : 0.397      nan
sky           : 0.905      nan
construction  : 0.872      nan
human         : 0.603    0.376
vehicle       : 0.879    0.676
--------------------------------
Score Average : 0.787    0.526
--------------------------------

Accuracy of the published code on several validation/testing sets according to the author:

PSPNet50 on ADE20K valset (mIoU/pAcc): 41.68/80.04 
PSPNet101 on VOC2012 testset (mIoU): 85.41 (multiscale evaluation!)
PSPNet101 on cityscapes valset (mIoU/pAcc): 79.70/96.38

So we are still missing 79.70 - 62.60 = 17.10 points of mIoU.

Does anyone have an idea where we lose that accuracy?

@jmtatsch
Contributor Author

Is scipy's misc.imresize(img, self.input_shape) really exactly the same as Matlab's imresize(img, [new_rows new_cols], 'bilinear')?
And how does tf.image.resize compare to Caffe's Interp layer?
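One concrete difference worth checking (an assumption on my part, not verified against this repo): Caffe's Interp layer aligns the corner pixels, while tf.image.resize_bilinear in TF 1.x defaults to align_corners=False. A minimal sketch of the two variants:

import tensorflow as tf

# Hypothetical 90x90 feature map being upsampled to the 713x713 input size.
feats = tf.placeholder(tf.float32, [1, 90, 90, 512])
up_tf_default = tf.image.resize_bilinear(feats, (713, 713))                      # align_corners=False
up_caffe_like = tf.image.resize_bilinear(feats, (713, 713), align_corners=True)  # closer to Caffe's Interp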

@jefequien
Collaborator

You could save the intermediate activations from Keras and compare them with Caffe's. There are a few lines for this in utils.py.

I don't remember exactly, but they may have used a sliding window for evaluation, as opposed to our rescaling, which distorts the aspect ratio.
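For reference, a minimal sketch of such a probe (not the actual utils.py helper; layer_name is a placeholder, not a layer name from this repo):

import numpy as np
from keras.models import Model

def dump_activation(model, img_batch, layer_name, out_path):
    # Build a sub-model that stops at the layer of interest and save its output.
    probe = Model(inputs=model.input,
                  outputs=model.get_layer(layer_name).output)
    np.save(out_path, probe.predict(img_batch))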

@jmtatsch
Contributor Author

jmtatsch commented Aug 28, 2017

Looking into this further, there are minor differences between the Python and Matlab image resizing, depending on the input data type.
See e.g. https://stackoverflow.com/questions/26812289/matlab-vs-c-vs-opencv-imresize
Matlab performs anti-aliasing by default, and so does Zhao's original code.

I ran some experiments, and even for a toy example nothing matches the Matlab default matlab_imresize_AA:

original:  [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
scipy_imresize:  [[ 4  5]
 [10 11]]
scipy_imresize_float:  [[  3.57142878   5.14285707]
 [  9.8571434   11.4285717 ]]
scipy_zoom: [[ 0  3]
 [12 15]]
OpenCv:  [[ 3  5]
 [11 13]]
scikit_resize:  [[ 3  5]
 [11 13]]
matlab_imresize_AA:  [[4,5],[11,12]]
matlab_imresize:  [[3,5],[11,13]]
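For anyone who wants to reproduce this, roughly the Python-side calls used above (a sketch; scipy.misc.imresize requires SciPy < 1.3, exact outputs depend on library versions, and the Matlab numbers came from running imresize in Matlab directly):

import numpy as np
import cv2
from scipy import misc, ndimage
from skimage.transform import resize as sk_resize

img = np.arange(16, dtype=np.uint8).reshape(4, 4)

print("scipy_imresize:", misc.imresize(img, (2, 2), interp="bilinear"))
print("scipy_imresize_float:",
      misc.imresize(img.astype(np.float32), (2, 2), interp="bilinear", mode="F"))
print("scipy_zoom:", ndimage.zoom(img, 0.5, order=1))
print("OpenCV:", cv2.resize(img, (2, 2), interpolation=cv2.INTER_LINEAR))
print("scikit_resize:", sk_resize(img, (2, 2), order=1, preserve_range=True))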

@hujh14 you mentioned somewhere that you already checked the activations. Do you remember up to which point?

@jmtatsch
Contributor Author

@hujh14 Unfortunately, I cannot run the original code on an 8 GB GTX 1080, not even with batch size 1, due to insufficient memory. Did you manage to compile the original code with cuDNN support, or do you have 12 GB cards?

@Vladkryvoruchko
Owner

@jmtatsch I have the same card as yours, and it works well; it takes about 3.5 GB of memory.

@jmtatsch
Contributor Author

@Vladkryvoruchko Are you sure it works with the Cityscapes model? It is much larger than the other two... Which CUDA and cuDNN versions?

@jmtatsch
Contributor Author

jmtatsch commented Aug 29, 2017

Doing a sliced evaluation now, much better detail!

cityscapes_seg_blended

Will do a smarter overlap treatment, then evaluate again.

@Vladkryvoruchko
Owner

@jmtatsch Oh... I was talking about ADE :)

@jmtatsch
Contributor Author

#13

@jmtatsch
Contributor Author

jmtatsch commented Sep 1, 2017

Without flipped evaluation

classes          IoU      nIoU
--------------------------------
road          : 0.981      nan
sidewalk      : 0.849      nan
building      : 0.922      nan
wall          : 0.572      nan
fence         : 0.624      nan
pole          : 0.589      nan
traffic light : 0.690      nan
traffic sign  : 0.777      nan
vegetation    : 0.919      nan
terrain       : 0.641      nan
sky           : 0.943      nan
person        : 0.808    0.627
rider         : 0.617    0.472
car           : 0.949    0.857
truck         : 0.766    0.477
bus           : 0.859    0.639
train         : 0.794    0.569
motorcycle    : 0.653    0.456
bicycle       : 0.768    0.578
--------------------------------
Score Average : 0.775    0.584
--------------------------------


categories       IoU      nIoU
--------------------------------
flat          : 0.986      nan
nature        : 0.921      nan
object        : 0.670      nan
sky           : 0.943      nan
construction  : 0.925      nan
human         : 0.822    0.656
vehicle       : 0.936    0.834
--------------------------------
Score Average : 0.886    0.745
--------------------------------

Just 2.2 % missing :)

@jmtatsch
Contributor Author

jmtatsch commented Sep 4, 2017

Okay, with flipped evaluation:

classes          IoU      nIoU
--------------------------------
road          : 0.982      nan
sidewalk      : 0.853      nan
building      : 0.924      nan
wall          : 0.582      nan
fence         : 0.634      nan
pole          : 0.593      nan
traffic light : 0.692      nan
traffic sign  : 0.780      nan
vegetation    : 0.920      nan
terrain       : 0.644      nan
sky           : 0.945      nan
person        : 0.811    0.630
rider         : 0.618    0.478
car           : 0.951    0.860
truck         : 0.795    0.480
bus           : 0.870    0.638
train         : 0.799    0.591
motorcycle    : 0.656    0.457
bicycle       : 0.771    0.580
--------------------------------
Score Average : 0.780    0.589
--------------------------------


categories       IoU      nIoU
--------------------------------
flat          : 0.986      nan
nature        : 0.922      nan
object        : 0.674      nan
sky           : 0.945      nan
construction  : 0.927      nan
human         : 0.825    0.659
vehicle       : 0.937    0.836
--------------------------------
Score Average : 0.888    0.747
--------------------------------

Still 1.7% missing. Will look into multi-scale evaluation.

@jmtatsch
Contributor Author

jmtatsch commented Sep 5, 2017

Adding multi-scale evaluation on top actually made the results considerably worse. Here is a funky gif with scales = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]; the last frame is the aggregated one.
evaluation_of_scales_2017-09-05 10 12 21 396198
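For context, the aggregation looks roughly like this (a sketch; predict_full is a hypothetical function returning per-pixel class probabilities at the input's resolution, and averaging probabilities is only one possible aggregation):

import numpy as np
import cv2

def predict_multi_scale(img, predict_full,
                        scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75)):
    h, w = img.shape[:2]
    acc = None
    for s in scales:
        scaled = cv2.resize(img, (int(w * s), int(h * s)),
                            interpolation=cv2.INTER_LINEAR)
        probs = predict_full(scaled)            # (h*s, w*s, n_classes)
        probs = cv2.resize(probs, (w, h),       # back to full resolution
                           interpolation=cv2.INTER_LINEAR)
        acc = probs if acc is None else acc + probs
    return acc / len(scales)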

@leinxx

leinxx commented Sep 11, 2017

@jmtatsch would you mind explaining what you mean by sliced prediction? I am working on the same problem :(

@jmtatsch
Contributor Author

jmtatsch commented Sep 11, 2017

@leinxx by sliced prediction I mean cutting the image into 4x2 overlapping 713x713 slices, forwarding them through the network, and reassembling the 2048x1024 predictions from them.

sliding_evaluation
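A rough sketch of the reassembly (assumptions: predict_crop returns per-pixel class probabilities for one 713x713 crop, the input is a full-resolution Cityscapes frame, and overlapping regions are simply averaged):

import numpy as np

def predict_sliding(img, predict_crop, crop=713, n_classes=19):
    h, w = img.shape[:2]                            # e.g. 1024 x 2048
    probs = np.zeros((h, w, n_classes), dtype=np.float32)
    counts = np.zeros((h, w, 1), dtype=np.float32)
    ys = np.linspace(0, h - crop, 2).astype(int)    # 2 rows of crops
    xs = np.linspace(0, w - crop, 4).astype(int)    # 4 columns of crops
    for y in ys:
        for x in xs:
            probs[y:y + crop, x:x + crop] += predict_crop(img[y:y + crop, x:x + crop])
            counts[y:y + crop, x:x + crop] += 1
    return probs / counts                           # average the overlaps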

Please let us know if you fix further issues and get closer to the published results...

@wtliao

wtliao commented Sep 25, 2017

@jmtatsch thanks a lot, I have learned a lot from your replication. Would you explain why sliced prediction improves the results so much? And what is flipped evaluation (do you mean flipping the image during training for data augmentation)? Could you provide more details about your training settings, e.g. batch_size, epochs, etc.?

@jmtatsch
Contributor Author

jmtatsch commented Sep 25, 2017

@wtliao Unfortunately, I did not (yet) train these weights myself.
Sliced/sliding prediction shows so much more detail because the weights were trained on 713x713 crops from the full-resolution image, and the same 713x713 crops are used for prediction, instead of a downsampled 512x256 image that is then upsampled to full resolution.
Flipped evaluation means predicting on both the image and a flipped copy of it at the same time, then averaging the two results.
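In code, this is roughly (a sketch; most implementations mirror the image left-right, and predict is assumed to return class probabilities):

import numpy as np

def predict_flipped(img, predict):
    p = predict(img)
    p_flip = predict(np.fliplr(img))        # predict on the mirrored image
    return 0.5 * (p + np.fliplr(p_flip))    # mirror the result back, then average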

@jkschin

jkschin commented Oct 6, 2017

@jmtatsch Thanks for these posts! They were really helpful! 👍

@scenarios

scenarios commented Dec 18, 2017

Hi @jmtatsch, I found that in your code the kernel size and stride of the pyramid pooling module are set to (10xlevel, 10xlevel). That is the right size for VOC and ADE20K with input size (473, 473). However, when using input size (713, 713) as in the Cityscapes case, the sizes obtained from (10xlevel, 10xlevel) are not identical to the original code.
I believe this is the main reason for the last 1.7% performance drop ^^
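For reference, deriving the pooling parameters from the actual feature-map size instead of hard-coding them would look roughly like this (a sketch; with the stride-8 dilated backbone, a 473x473 input yields a 60x60 feature map and a 713x713 input yields 90x90):

def pyramid_pool_params(feature_size, bin_sizes=(1, 2, 3, 6)):
    # Kernel == stride per level, so each level yields a bin x bin output.
    return {b: feature_size // b for b in bin_sizes}

print(pyramid_pool_params(60))  # 473 input: {1: 60, 2: 30, 3: 20, 6: 10}
print(pyramid_pool_params(90))  # 713 input: {1: 90, 2: 45, 3: 30, 6: 15}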

@jmtatsch
Contributor Author

jmtatsch commented Dec 18, 2017

@scenarios Good catch, I will fix this in #30 and reevaluate.

@TheRevanchist

Hi,

What results do you get after the latest changes? Are they similar to the paper's, or still worse?

@shipengai

@jmtatsch Did you train PSPNet on any dataset?

@jmtatsch
Contributor Author

jmtatsch commented Feb 24, 2018 via email

@Vladkryvoruchko changed the title from "Bad results" to "Bad results (Not bad now)" on May 17, 2018
@YuShen1116

Hi.

Thanks for your excellent work. Could you please tell me how you get the evaluation results? Are you using the scripts in the cityscapes repo?

Thank you!

@ktnguyen2

Hello,

I would also like to know how to get the evaluation results. I tried using the cityscapes scripts, but I am getting this error when using the seg_read images as my input:

Traceback (most recent call last):
  File "evalPixelLevelSemanticLabeling.py", line 696, in <module>
    main()
  File "evalPixelLevelSemanticLabeling.py", line 690, in main
    evaluateImgLists(predictionImgList, groundTruthImgList, args)
  File "evalPixelLevelSemanticLabeling.py", line 478, in evaluateImgLists
    nbPixels += evaluatePair(predictionImgFileName, groundTruthImgFileName, confMatrix, instStats, perImageStats, args)
  File "evalPixelLevelSemanticLabeling.py", line 605, in evaluatePair
    confMatrix[gt_id][pred_id] += c
IndexError: index 34 is out of bounds for axis 0 with size 34
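Not certain, but a likely cause is a label-ID mismatch: the script indexes a 34x34 confusion matrix by the original Cityscapes labelIds (0-33), so any image containing a value >= 34 triggers exactly this error. If your predictions are stored as trainIds, something like this conversion (a sketch using cityscapesscripts; the handling of the 255 ignore value is a simplification) is needed first:

import numpy as np
from cityscapesscripts.helpers.labels import trainId2label

def train_ids_to_label_ids(pred):
    # Map each trainId (0-18, plus 255 for ignore) back to its original labelId.
    out = np.zeros_like(pred)
    for train_id, label in trainId2label.items():
        out[pred == train_id] = label.id
    return out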
