
Any way to train? #5

Open
yijiezh opened this issue Aug 7, 2017 · 24 comments

yijiezh commented Aug 7, 2017

How do I train with a given dataset?

@wcy940418 (Collaborator)

@EvanZzZz Sorry, we currently have not figured out a good way to train this code. Basically, the training tricks are the key to PSPNet's good performance. Because of an NDA, the authors do not share their training code. You could try to use OpenMPI to train PSPNet on several GPUs to get good performance, but I am not sure how difficult that would be.


yijiezh commented Aug 7, 2017

In this repo, you converted the Caffe model to TensorFlow and used the trained parameters, and still didn't get similar results?

@wcy940418 (Collaborator)

@EvanZzZz Yes, this may be due to some difference between Caffe and TensorFlow, but I still don't know why.


yijiezh commented Aug 7, 2017

How can you verify that your layers_builder builds exactly the model described in the paper?


yijiezh commented Aug 7, 2017

Since you already have the network, why can't you train the model from scratch?

@wcy940418 (Collaborator)

@EvanZzZz If the explicit parameters (e.g. weights, biases) are not identical to the model in the paper, they cannot be assigned to our network, since all of our parameters are ported from the original work. As for the internal architecture of the network, we just use the common layer settings in Keras; if the Caffe code uses some "magical" layer or hidden settings, I at least have no idea how to reproduce them.

PSPNet depends heavily on some magic optimization tricks during training, and the batch size the authors used is impossible for single-card training with a normal training loop. A bigger batch size combined with batch normalization brings a very significant improvement, but it is hard to perform batch normalization across multiple GPUs; at least I haven't done that before.
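The multi-GPU batch-norm problem described above boils down to computing the normalization statistics over the *global* batch rather than per device. A minimal NumPy sketch of the idea (the function name and shard shapes are illustrative, not code from this repo or any framework):

```python
import numpy as np

def sync_batch_norm(shards, eps=1e-5):
    """Normalize per-device activation shards with *global* batch stats.

    Plain per-device BatchNorm would use each shard's own mean/var;
    synchronized BN instead all-reduces the cheap sufficient statistics
    (count, sum, sum of squares) so every device normalizes against the
    statistics of the full batch.
    """
    n = sum(s.shape[0] for s in shards)
    mean = sum(s.sum(axis=0) for s in shards) / n
    sq_mean = sum((s ** 2).sum(axis=0) for s in shards) / n
    var = sq_mean - mean ** 2            # E[x^2] - E[x]^2
    return [(s - mean) / np.sqrt(var + eps) for s in shards]

# Two simulated GPUs, each holding half of an 8-sample batch.
rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(8, 5))
merged = np.concatenate(sync_batch_norm([batch[:4], batch[4:]]))
```

Only the three reduced statistics cross device boundaries, which is why this is cheap to implement on top of an all-reduce library such as OpenMPI.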


yijiezh commented Aug 8, 2017

Interesting to know. Is there any framework that currently supports BN across multiple GPUs?

@wcy940418 (Collaborator)

@EvanZzZz I only know that OpenMPI can help do it.

mrlzla (Contributor) commented Aug 30, 2017

Is there some way to load ResNet weights pre-trained on ImageNet?
How much time does it take to train a custom model on a single 8 GB GPU?
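This repo doesn't ship an ImageNet-loading path, but the usual technique is name-based weight transfer: copy pretrained backbone weights into the new model layer by layer and leave unmatched layers (such as the new segmentation head) freshly initialized. A generic sketch; the layer names and dict-of-arrays representation are illustrative, not this repo's API:

```python
import numpy as np

def transfer_by_name(pretrained, target):
    """Copy pretrained weights into `target` wherever the layer name
    exists in `pretrained` with a matching shape; unmatched layers
    (e.g. a new segmentation head) keep their fresh initialization.

    Both arguments map layer name -> weight array.
    """
    loaded, skipped = [], []
    for name, weight in target.items():
        src = pretrained.get(name)
        if src is not None and src.shape == weight.shape:
            target[name] = src
            loaded.append(name)
        else:
            skipped.append(name)
    return loaded, skipped

# Toy stand-ins: an "ImageNet" checkpoint and a new segmentation model.
imagenet = {"conv1": np.ones((3, 3)), "fc1000": np.ones((1000,))}
model = {"conv1": np.zeros((3, 3)), "psp_head": np.zeros((5,))}
loaded, skipped = transfer_by_name(imagenet, model)
```

In Keras the same idea is what `load_weights(..., by_name=True)` does for named layers.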


xg1990 commented Dec 5, 2017

So in this case, fine-tuning the model is also not possible?

@nurhadiyatna

Hi guys, I tried to train the Keras code from scratch and got really good performance on the Cityscapes data (coarse and fine annotations): ~88% validation accuracy at the pixel level (not IoU) and ~85% on testing (not IoU). I use the val data of the Cityscapes dataset as my test set (500 images) and 2970 images for training and validation (80% training, 20% validation). However, I have an issue with the image dimensions. Is there any way to use the same aspect ratio as the Cityscapes data?

@jmtatsch (Contributor)

@nurhadiyatna can you please post your full results for the validation set? AFAIK the original paper only trains on square patches and stitches the square predictions together.

@nurhadiyatna

Aha, OK, I see. So that's why this repo provides multiscale and sliced prediction.

Sure, I will try to provide some images to show the results.

Btw, while using the layer builder I needed to reshape the output of the last layer from (None, 713, 713, num_classes) to (None, 508369, num_classes). Could someone explain to me why we need to do that? I am a newbie in this field. Thank you so much.
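On the reshape question: 508369 is just 713 × 713. The two spatial axes are flattened into one per-pixel axis because older Keras losses expect class vectors along a single sample-like axis when labels are given per pixel. A small NumPy illustration (num_classes = 19 is only an example, e.g. Cityscapes):

```python
import numpy as np

batch, h, w, num_classes = 2, 713, 713, 19

# A fake final feature map, shaped like the last layer's output.
logits = np.zeros((batch, h, w, num_classes))

# 713 * 713 = 508369: merging both spatial axes into one "pixel" axis
# yields (batch, pixels, classes), the shape older Keras losses such as
# categorical_crossentropy expect when labels are given per pixel.
flat = logits.reshape(batch, h * w, num_classes)
```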

jmtatsch (Contributor) commented Feb 1, 2018

@nurhadiyatna any update on the results of a self-trained PSPNet?


nurhadiyatna commented Feb 5, 2018

@jmtatsch I got overfitting during training, I guess; here are some logs of the training process. I use both coarse and fine annotations:

[loss plot: overfit]

Accuracy:

[accuracy plot: over_fitting_acc]

Btw, I need to split the data while loading the dataset, into 10 splits of the 2970 images, so this training process is actually 10 runs of 100 epochs each. Adam was used with lr = 0.001. I only have pixel accuracy; IoU is still in progress. However, compared with the SegNet I used earlier, this far outperforms it. The final test on the 500-image val set gives ~86% accuracy.

Here is my latest result with PSPNet using SGD, lr = 0.001:

[prediction example: pred]

Loss:

[loss plot: sgd_loss1]

Accuracy:

[accuracy plot: sgd_acc1]

I still have a problem doing sliced prediction, since I changed the last layer of PSPNet a bit. Can you help me figure out how to solve that, @jmtatsch?
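On sliced prediction generally: the standard scheme is to slide a tile-sized window over the full image, predict each square crop, and average the overlapping probability maps when stitching. A NumPy sketch of that scheme; the function name, tile/stride values, and `predict_fn` contract are assumptions for illustration, not this repo's utilities:

```python
import numpy as np

def sliced_predict(image, predict_fn, tile, stride):
    """Slide a square window over `image`, predict each crop, and
    average the overlapping class-probability maps when stitching.

    Assumes image is at least tile x tile; `predict_fn` maps a
    (tile, tile, C) crop to a (tile, tile, K) probability map.
    """
    h, w, _ = image.shape
    ys = list(range(0, h - tile + 1, stride))
    xs = list(range(0, w - tile + 1, stride))
    if ys[-1] != h - tile:
        ys.append(h - tile)              # cover the bottom border
    if xs[-1] != w - tile:
        xs.append(w - tile)              # cover the right border
    acc = cnt = None
    for y in ys:
        for x in xs:
            pred = predict_fn(image[y:y + tile, x:x + tile])
            if acc is None:
                acc = np.zeros((h, w, pred.shape[-1]))
                cnt = np.zeros((h, w, 1))
            acc[y:y + tile, x:x + tile] += pred
            cnt[y:y + tile, x:x + tile] += 1
    return acc / cnt                     # averaged where tiles overlap

# Toy check: a constant "model" over a 10x10 image with 4x4 tiles.
out = sliced_predict(np.zeros((10, 10, 3)),
                     lambda crop: np.ones(crop.shape[:2] + (2,)),
                     tile=4, stride=3)
```

Because the output channel count K is read from `predict_fn`'s output, a changed last layer only matters if its output is no longer a per-pixel map of the crop.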

@shipengai

Hi @nurhadiyatna, can you share your training code?
Thanks.

@shipengai

Hi @nurhadiyatna, did you train PSPNet on the ADE20K dataset?
Thanks.

@nurhadiyatna

Hi @shipeng-uestc... No, I didn't; only on Pascal VOC 2010 and Cityscapes. However, the results are still far below the original paper...

@horizonheart

I didn't find the loss function.
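For reference, segmentation networks of this family are normally trained with per-pixel categorical cross-entropy on the softmax output (in Keras terms, `categorical_crossentropy` over the flattened pixel axis); whether and where this repo wires that up is exactly what is unclear. A NumPy sketch of the quantity being minimized:

```python
import math
import numpy as np

def pixel_crossentropy(y_true, y_pred, eps=1e-7):
    """Categorical cross-entropy averaged over every pixel in the batch.

    y_true: one-hot labels, shape (batch, H, W, K)
    y_pred: softmax probabilities, same shape
    """
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=-1))

# One-hot ground truth on a tiny 2x2 "image" with K = 4 classes.
y_true = np.zeros((1, 2, 2, 4))
y_true[..., 0] = 1.0

uniform_loss = pixel_crossentropy(y_true, np.full((1, 2, 2, 4), 0.25))
perfect_loss = pixel_crossentropy(y_true, y_true)
```

A uniform prediction over K classes gives exactly log K, and a perfect prediction gives 0, which is a quick sanity check for any loss wiring.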

@zl535320706

Well, I'm confused about how to use train.py. I checked python_utils/preprocessing.py and can't tell which dataset the default training process uses. According to line 86 of train.py, `parser.add_argument('-m', '--model', type=str, default='pspnet50_ade20k', ...)`, I guess the default is ADE20K, so I added that dataset's path to train.py.

After that, I ran train.py and got the message "Pooling parameters for input shape (640, 480) are not defined.". I think the reason is line 97 of train.py, `train(args.datadir, args.logdir, (640, 480), args.classes, args.resnet_layers, ...)`: in layers_builder.py, around line 195, `input_shape == (640, 480)` is not handled, so it hits `print("Pooling parameters for input shape ", input_shape, " are not defined.")` followed by `exit(1)`.

Hmm... how do I train this code on common datasets (such as VOC 2012, Cityscapes, and ADE20K)? @nurhadiyatna @jmtatsch @wcy940418
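If it helps, the "pooling parameters" in that error are the kernel/stride sizes of the pyramid pooling module, which the repo appears to define only for certain square inputs (473 and 713 come up in this thread). Here is my reconstruction of the underlying arithmetic, not code from layers_builder.py:

```python
import math

def pyramid_pool_params(size, bins=(1, 2, 3, 6), downsample=8):
    """Kernel/stride sizes for PSPNet's pyramid pooling at a given input.

    The dilated ResNet backbone reduces a (size, size) input to a
    feature map of ceil(size / 8); each pyramid level b then
    average-pools that map into a b x b grid with kernel = stride =
    feat // b, which only works when feat is divisible by every bin.
    """
    feat = math.ceil(size / downsample)
    if any(feat % b for b in bins):
        raise ValueError(f"feature size {feat} not divisible by bins {bins}")
    return {b: feat // b for b in bins}
```

`pyramid_pool_params(713)` yields `{1: 90, 2: 45, 3: 30, 6: 15}` and `pyramid_pool_params(473)` yields `{1: 60, 2: 30, 3: 20, 6: 10}`, while 640 raises because ceil(640/8) = 80 is not divisible by 3, which is one way to see why (640, 480) has no defined pooling parameters.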

@world4jason

Hi @nurhadiyatna, can you share your training code for the Cityscapes dataset?
The training code in this project is quite confusing...
Thanks.

@sainttelant

@world4jason Indeed, there are many errors in this training code and it is quite confusing. I think anyone who intends to train should modify preprocessing.py and make sure train.py is compatible with both Python 2 and Python 3.

@sainttelant

I've already rewritten train.py and some related *.py files. Training now runs, but batch_size must be set to 1; otherwise the hardware runs out of resources.

@world4jason

@sainttelant can you kindly share your code or project? Thanks.

@Vladkryvoruchko added this to the Q&A milestone Sep 23, 2019