About #3

Open
Cc-Hy opened this issue Mar 3, 2022 · 41 comments

Comments

@Cc-Hy

Cc-Hy commented Mar 3, 2022

Hello, I tried to train the model, but after 120 epochs the performance is a lot worse than yours.
The change I made was to use a larger learning rate, 0.001, compared to your original 0.000225.
First, why did you choose such a small learning rate? (Networks like this are generally trained with a learning rate of around 0.001 to 0.003.) Did you use pre-training, so that this is a fine-tuning stage?
I would also like your thoughts on the results I got; I don't think the learning rate alone would cause such a large gap.
I will retrain later with your original learning rate.
Thanks a lot.

@rockywind

rockywind commented Mar 3, 2022

@Cc-Hy @Xianpeng919
Hi, I trained the model on the training set and tested it on the validation set. The Moderate 3D AP is 17.57, but the paper reports 19.03.

@Xianpeng919
Owner

@rockywind Did you use the provided config to train your model?

@rockywind

rockywind commented Mar 4, 2022

@Xianpeng919
I trained the model a second time. The result is below.

3D APR40: 23.7064, 17.7595, 14.9525
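For readers comparing these triples: the three values are the Easy, Moderate and Hard results, and "AP_R40" is average precision sampled at 40 recall positions (1/40, 2/40, ..., 1). A minimal sketch of that definition follows, assuming precision and recall have already been computed per score threshold for one class and one difficulty level. It is a simplified illustration only, not the KITTI devkit or mmdet3d evaluation code, which additionally handle IoU matching, difficulty filtering and ignored boxes.

```python
import numpy as np


def ap_r40(precision, recall):
    """AP|R40: mean interpolated precision at recall levels 1/40, ..., 40/40.

    `precision` and `recall` have one entry per score threshold
    (detections sorted by descending confidence).
    """
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)
    recall_levels = np.arange(1, 41) / 40.0  # recall 0 is deliberately excluded
    interpolated = []
    for r in recall_levels:
        # Interpolated precision: best precision at any recall >= r,
        # or 0 if the detector never reaches recall r.
        reached = recall >= r
        interpolated.append(precision[reached].max() if reached.any() else 0.0)
    return float(np.mean(interpolated))
```

Because recall 0 is excluded, a detector that never reaches a given recall level simply contributes 0 for that level, which is one reason AP_R40 values are lower than the older 11-point AP.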

@Xianpeng919
Owner

@rockywind I'll double check and get back to you asap.

@Cc-Hy
Author

Cc-Hy commented Mar 5, 2022

@Xianpeng919
Hello, how should I modify the config file if I want to train on the trainval split and get results on the test set?

@rockywind

@Xianpeng919
I loaded the pretrained model and trained it. The result is below.

3D APR40: 24.2891, 18.0508, 15.2171

@kaixinbear

Hello, I trained the model with the command CUDA_VISIBLE_DEVICES=0 python ./tools/train.py configs/monocon/monocon_dla34_200e_kitti.py
without any modification, but the performance is much lower. The performance peaks at epoch 120 and then gets lower and lower until it reaches 0.
The result at epoch 120:

Car AP@0.70, 0.70, 0.70:
3d   AP:16.5400, 12.2644, 10.5623

The result at epoch 200:

Car AP@0.70, 0.70, 0.70:
3d   AP:0.0000, 0.0000, 0.0000

The training log can be seen here.
What should I do to get a normal result? @Xianpeng919

@Xianpeng919
Owner

@rockywind We have tested our released checkpoints on multiple GPUs. The result is 26.33 | 19.03 | 16.00, the same as the result in the README. I'm not sure what the problem is here. You could send me your log so that I can help you check the details.

@Xianpeng919
Owner

@Cc-Hy You may replace the training split with the trainval split in the config.

@Xianpeng919
Owner

@kaixinbear Your dimension branch exploded during training. We did observe this during our experiments; the dimension-aware loss is a bit unstable. You can restart your training from a checkpoint saved before the explosion.
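One way to find the last checkpoint saved before the dimension branch exploded is to scan the per-epoch dimension loss for the first non-finite value or sudden spike. A minimal sketch, assuming one loss value per epoch has already been extracted from the training log; the extraction step, the helper name, the `factor` threshold and the window size are all hypothetical choices, not part of the released code.

```python
import math
import statistics


def last_stable_epoch(epoch_losses, factor=10.0, window=5):
    """Return the 1-based epoch just before the loss first explodes.

    epoch_losses[i] is the (e.g. mean) dimension loss of epoch i + 1.
    An epoch counts as exploded if its loss is NaN/inf or exceeds
    `factor` times the median of the preceding `window` epochs.
    Returns 0 if the first epoch already exploded, and the last
    epoch number if no explosion is found.
    """
    for i, loss in enumerate(epoch_losses):
        if not math.isfinite(loss):
            return i  # epoch i + 1 exploded, so epoch i is the last good one
        if i == 0:
            continue
        reference = statistics.median(epoch_losses[max(0, i - window):i])
        if loss > factor * reference:
            return i
    return len(epoch_losses)
```

With mmcv's default `epoch_N.pth` checkpoint naming, the returned N points at the checkpoint to resume from (for example via `tools/train.py --resume-from`), assuming that epoch was actually saved under the configured checkpoint interval.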

@kaixinbear

Thanks for your kind reply! I will try it later.

@ganyz

ganyz commented Mar 7, 2022

@kaixinbear Your dimension branch exploded during training. We did observe this during our experiments. The dimension-aware loss is a little bit unstable. You can restart your training from the un-exploded ckpts.

Hello author, I resumed my training from the un-exploded checkpoints, but it still explodes in the following epochs. Have you seen this phenomenon? Should I turn down my lr?
Thanks!

@rockywind

@Xianpeng919
I tested the released checkpoint, and the result is the same as in the README. When I retrained the model, the result was lower than in the README.
20220302_134704.log

@excitohe

excitohe commented Mar 7, 2022

@Xianpeng919 I tested the released checkpoint. The result is the same as the readme. When I retrained the model, the result was lower than the readme. 20220302_134704.log

Hi, have you tried multi-GPU training, or are you still using single-GPU training? I retrained with 4 GPUs and got lower results than the README.
https://paste.ubuntu.com/p/CtJH9Hk52F/
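On the single-GPU versus multi-GPU question: in mmdet-style training the total batch size is the number of GPUs times `samples_per_gpu`, so changing the GPU count also changes the effective batch size the optimizer sees. A common heuristic in that situation is the linear scaling rule (Goyal et al., 2017), which scales the learning rate in proportion to the total batch size. A minimal sketch; the reference batch size is a placeholder, and nothing in this thread establishes that the rule holds for MonoCon.

```python
def scaled_lr(base_lr, base_total_batch, num_gpus, samples_per_gpu):
    """Linear scaling rule: scale the learning rate with the total batch size.

    base_lr was tuned for a total batch of `base_total_batch` images;
    the new run processes num_gpus * samples_per_gpu images per iteration.
    """
    total_batch = num_gpus * samples_per_gpu
    return base_lr * total_batch / base_total_batch


# Hypothetical numbers: suppose 0.000225 had been tuned for a total batch of 8
# (placeholder, not taken from the MonoCon config) and a new run uses 4 GPUs
# with 8 images each, i.e. a total batch of 32.
new_lr = scaled_lr(0.000225, base_total_batch=8, num_gpus=4, samples_per_gpu=8)
```

The rule is only a starting point; with Adam-style optimizers and warmup schedules it is often applied loosely or not at all, so a rescaled learning rate still needs to be validated.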

@Xianpeng919
Owner

@ganyz You can restart the training from scratch.

@Xianpeng919
Owner

@rockywind I double-checked your log, and the config looks good to me. I'll double-check the code as well. You can also try training again with a different random seed to see how the performance changes.

@rockywind

@Xianpeng919
OK, thanks a lot!

@Cc-Hy
Author

Cc-Hy commented Mar 10, 2022

@rockywind @ganyz @kaixinbear @Xianpeng919
I find that during training there are several epochs whose performance is extremely low (close to 0), and the performance of the last epoch can differ by more than 10 points.
Have you run into this situation?

@Cc-Hy
Author

Cc-Hy commented Mar 10, 2022

Epoch 112:
[screenshot]
Epoch 115:
[screenshot]

@Cc-Hy
Author

Cc-Hy commented Mar 10, 2022

Tried another time, and the best performance is as follows:
[screenshot]

@djp1235a

I conducted 3 experiments with different seeds, and the best performance is 17.80. Besides, results are not reproducible even with the same seed and deterministic==True in the codebase.
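For context, "same seed and deterministic==True" usually amounts to seeding every random number generator in play and forcing deterministic cuDNN kernels. A minimal sketch of such a helper, hypothetical and modelled on the common `set_random_seed` pattern rather than copied from this codebase. Even with all of this, some CUDA kernels based on atomic additions remain nondeterministic. `torch.use_deterministic_algorithms(True)` (PyTorch 1.8 and later) makes PyTorch raise an error when one of its own nondeterministic ops is hit, which can help narrow down where run-to-run variation comes from; ops from custom CUDA extensions are generally not covered by that check.

```python
import os
import random


def set_random_seed(seed, deterministic=False):
    """Seed Python, NumPy and PyTorch RNGs; optionally force deterministic kernels.

    NumPy and PyTorch are imported lazily so the helper also runs without them.
    """
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)           # default generator(s)
    torch.cuda.manual_seed_all(seed)  # every visible GPU; applied lazily if CUDA is unused
    if deterministic:
        # cuBLAS needs this for deterministic results on CUDA >= 10.2;
        # it should be set before the first CUDA call.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        # Raises at runtime if a PyTorch op without a deterministic
        # implementation is used.
        torch.use_deterministic_algorithms(True)
```

Dataloader workers with random augmentation also need per-worker seeding (typically through a `worker_init_fn`), otherwise data order and augmentations can still differ between runs.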

@excitohe

I retrained twice and got 16.20 on a GTX 1080 Ti and 16.80 on a Titan V. It seems that no one in this issue has managed to get above 18.00 by retraining, which is frustrating.... =_=!

@excitohe

@rockywind @ganyz @kaixinbear @Xianpeng919 I find that during the training, there will be several epochs whose performance is extremely low(close to 0), and the performance from the last epoch may differ by more than 10 points. Did you meet this situation?

It's normal. Mono3D performance is always unstable. Just pay attention to the eval results of the last few checkpoints. 0.0
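Following that advice, one way to report a less noisy number is to summarise the last few evaluated epochs instead of quoting a single one, and to list near-zero evaluations separately. A minimal sketch, assuming the Moderate 3D AP of each evaluated epoch has already been parsed from the logs into a dictionary; the helper name, the 5-epoch window and the 1.0 AP collapse threshold are arbitrary illustrative choices.

```python
import statistics


def summarize_last_epochs(ap_by_epoch, last_k=5, collapse_threshold=1.0):
    """Summarise AP over the last `last_k` evaluated epochs.

    ap_by_epoch maps epoch number -> Moderate 3D AP (in %).
    Returns the epochs used, their median and best AP, and the epochs
    whose AP fell below `collapse_threshold` (treated as collapsed).
    """
    epochs = sorted(ap_by_epoch)[-last_k:]
    values = [ap_by_epoch[e] for e in epochs]
    return {
        "epochs": epochs,
        "median": statistics.median(values),  # robust to a single collapsed epoch
        "best": max(values),
        "collapsed": [e for e in epochs if ap_by_epoch[e] < collapse_threshold],
    }
```

Reporting the median alongside the best value makes it easier to tell a genuinely better run from one lucky evaluation.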

@djp1235a

@excitohe I know that Mono3D performance is always unstable, but results are reproducible with the same seed and deterministic==True in the Monodle codebase. I'm just wondering why nondeterministic behavior appears in the mmdet reimplementation.

@Xianpeng919
Owner

@excitohe @djp1235a @Cc-Hy I'm re-training the model based on the released code using different GPUs. I'll share with you the log in this thread once the result is out.

@Xianpeng919
Owner

@Cc-Hy You can refer to mmdet3d's visualization scripts. Their scripts are very helpful.

@Cc-Hy
Author

Cc-Hy commented Mar 12, 2022

@Xianpeng919
Hello, I tried adding the "--show" argument to test.py, and I also tried using mono_det_demo.py directly.
But neither of them works properly.
Can you tell me which script you use? And do I need to make any modifications?

@Xianpeng919
Owner

@Cc-Hy You can run inference with your model first and then revise the show_results function in mmdet3d.core.visualizer.
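Whatever visualizer ends up being used, drawing a monocular 3D detection on the image comes down to projecting the eight box corners with the camera projection matrix. A minimal NumPy sketch, assuming KITTI camera-frame conventions (x right, y down, z forward; `location` is the bottom-face centre; dimensions are height, width, length; `rotation_y` is the yaw about the camera y axis) and a 3x4 projection matrix such as KITTI's P2. The helper names are hypothetical; this is not the mmdet3d visualizer's code.

```python
import numpy as np


def box3d_corners(location, dims_hwl, rotation_y):
    """Eight corners (8, 3) of a KITTI-style 3D box in camera coordinates."""
    h, w, l = dims_hwl
    # Corners relative to the bottom-face centre, before rotation
    # (first four on the bottom face, last four on the top face).
    x = np.array([l, l, -l, -l, l, l, -l, -l], dtype=float) / 2.0
    y = np.array([0, 0, 0, 0, -h, -h, -h, -h], dtype=float)
    z = np.array([w, -w, -w, w, w, -w, -w, w], dtype=float) / 2.0
    c, s = np.cos(rotation_y), np.sin(rotation_y)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])  # yaw about y
    corners = rot @ np.vstack([x, y, z])  # (3, 8)
    return corners.T + np.asarray(location, dtype=float)


def project_to_image(points_3d, proj_mat):
    """Project (N, 3) camera-frame points with a 3x4 matrix; returns (N, 2) pixels."""
    points_3d = np.asarray(points_3d, dtype=float)
    homogeneous = np.hstack([points_3d, np.ones((points_3d.shape[0], 1))])
    uvw = homogeneous @ np.asarray(proj_mat, dtype=float).T  # (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]  # divide by depth
```

The eight pixel coordinates can then be joined with line segments (for example with OpenCV's `cv2.line`); boxes with corners at non-positive depth should be skipped or clipped before projection.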

@excitohe

@Xianpeng919 Have you finished retraining yet? Looking forward to your training log file. ^_^

@Xianpeng919
Owner

Xianpeng919 commented Mar 16, 2022

@excitohe Hi, I was traveling last weekend. Please check this log for more details. I have also attached the checkpoint here in case you need it. Please use the *_car.py config for inference.

@Cc-Hy
Author

Cc-Hy commented Mar 17, 2022

Tried again:
[screenshot]

@excitohe
Copy link

Hi, I migrated MonoCon into the latest mmdet3d using the plugin_dir mechanism and tried again with only_car, using your latest updated config on 4 GPUs.

Car AP40@0.70, 0.70, 0.70:
bbox AP40:96.3800, 90.3432, 80.7128
bev  AP40:29.0449, 22.2251, 19.4256
3d   AP40:21.4625, 16.1725, 14.3990
aos  AP40:95.73, 89.51, 79.49

The training log is attached:
https://paste.ubuntu.com/p/HyryFkZspc/

Can you see where the problem is? Thank you so much, and keep in touch. ^_^

I will set up your original environment again and test with a single GPU...

@excitohe

@Cc-Hy Hi, is this your recent result with the only_car config? It looks like we're getting about the same...

@Cc-Hy
Author

Cc-Hy commented Mar 17, 2022

@Cc-Hy Hi, Is this your recent result in only_car config? It looks like we're about the same...

No, these are 3 class results.
I'm training with Car only now.

@Cc-Hy
Author

Cc-Hy commented Mar 18, 2022

Car only:
[screenshot]

@kaixinbear

@Cc-Hy @Xianpeng919 @ganyz

Could you please tell me how to solve this model collapse problem? Should I turn down the lr or change the random seed?
I have tried many times, but the AP drops to 0 at around epoch 120.

@Cc-Hy
Author

Cc-Hy commented Apr 9, 2022

If you keep running into this problem, you can replace the dimension loss with a plain L1 loss, L = |gt - pred|.
Then the dimension loss will not explode.
@kaixinbear
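A minimal NumPy sketch of that plain L1 dimension loss, L = |gt - pred|, averaged over valid objects and the three dimensions. It only illustrates the formula: in the training code the same expression would be written with PyTorch tensors so it can be backpropagated, and how the head's dimension loss is wired up in the config is not covered here. The helper names are hypothetical. The gradient helper shows the property behind the suggestion: the gradient with respect to each prediction is bounded by 1 in magnitude, however large the error.

```python
import numpy as np


def dim_l1_loss(pred_dims, gt_dims, valid_mask=None):
    """Plain L1 dimension loss: mean of |gt - pred| over valid objects and (h, w, l)."""
    pred = np.asarray(pred_dims, dtype=float)  # (N, 3)
    gt = np.asarray(gt_dims, dtype=float)      # (N, 3)
    abs_err = np.abs(gt - pred)
    if valid_mask is not None:
        abs_err = abs_err[np.asarray(valid_mask, dtype=bool)]  # keep valid rows only
    return float(abs_err.mean()) if abs_err.size else 0.0


def dim_l1_grad(pred_dims, gt_dims):
    """Elementwise d|gt - pred| / d pred = sign(pred - gt), before averaging."""
    pred = np.asarray(pred_dims, dtype=float)
    gt = np.asarray(gt_dims, dtype=float)
    return np.sign(pred - gt)
```

The trade-off is that a plain L1 loss weights an error of 0.5 m the same for a pedestrian and a car, which is the imbalance a dimension-aware loss is meant to address.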

@gervaisi

@Xianpeng919
I want to use the model with mono_det_demo.py, but it asks me for an annotation file. Where can I find it? I should mention that I've already trained the model.

@FlyingAnt2018

@Cc-Hy @Xianpeng919 Hi, I trained the model on training set and tested on the validation. The moderate class 3D AP is 17.57, but the paper say it was 19.03.

Hi, I got an AP of 19.0217 by setting "cfg.SEED = 1903919922".


9 participants