About #3

Cc-Hy · 2022-03-03T03:38:45Z

Hello, I tried to train the model, but after 120 epochs the performance is a lot worse than yours.
The only modification is that I used a larger learning rate, 0.001, compared to your original 0.000225.
So first I want to ask why the learning rate you chose is so small (the networks I have worked with generally use a learning rate of around 0.003 to 0.001). Did you use pre-training, so that this is fine-tuning?
I would also like some ideas about the results I got; I don't think the learning rate alone would cause such a large gap.
I will retrain with your original learning rate later.
Thanks a lot.

rockywind · 2022-03-03T12:23:42Z

@Cc-Hy @Xianpeng919
Hi, I trained the model on the training set and tested on the validation set. The moderate-class 3D AP is 17.57, but the paper reports 19.03.

Xianpeng919 · 2022-03-03T23:22:00Z

@rockywind Did you use the provided config to train your model?

rockywind · 2022-03-04T14:36:32Z

@Xianpeng919 Yes, I used the default config.
https://github.com/Xianpeng919/MonoCon/blob/main/monocon/configs/monocon/monocon_dla34_200e_kitti.py

rockywind · 2022-03-04T15:15:38Z

@Xianpeng919
I trained the model a second time. The result is below.

3D APR40: 23.7064, 17.7595, 14.9525

Xianpeng919 · 2022-03-04T19:33:31Z

@rockywind I'll double check and get back to you asap.

Cc-Hy · 2022-03-05T02:01:45Z

@Xianpeng919
Hello, how should I modify the cfg file if I want to train on the trainval set and get test results?

rockywind · 2022-03-05T02:57:06Z

@Xianpeng919
I loaded the pretrained model and trained from it. The result is below.

3D APR40: 24.2891, 18.0508, 15.2171

kaixinbear · 2022-03-05T03:44:49Z

Hello, I trained the model with the command CUDA_VISIBLE_DEVICES=0 python ./tools/train.py configs/monocon/monocon_dla34_200e_kitti.py
without any modification, but the performance is much lower. The performance peaks at epoch 120 and then gets lower and lower until it reaches 0.
The result at epoch 120:

Car AP@0.70, 0.70, 0.70:
3d   AP:16.5400, 12.2644, 10.5623

The result at epoch 200:

Car AP@0.70, 0.70, 0.70:
3d   AP:0.0000, 0.0000, 0.0000

The training log can be seen here.
What should I do to get a normal result? @Xianpeng919

Xianpeng919 · 2022-03-05T13:51:42Z

@rockywind We have tested our released checkpoints on multiple GPUs. The result is 26.33 | 19.03 | 16.00, the same as the result in the readme. Not sure what the problem is here. You could share your log so that I can help you check the details.

Xianpeng919 · 2022-03-05T13:52:57Z

@Cc-Hy You may replace the training split with the trainval split in the config
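
As a rough illustration of that suggestion, here is a minimal sketch. The data.train.ann_file layout and the info file names are assumptions modeled on typical mmdet3d KITTI configs, not taken from the MonoCon config itself, and use_trainval_split is a hypothetical helper; check the actual keys and the info files generated in your setup.

```python
import copy
import os


def use_trainval_split(cfg):
    """Return a copy of cfg whose training annotations point at the trainval split."""
    new_cfg = copy.deepcopy(cfg)
    train_cfg = new_cfg["data"]["train"]
    folder, name = os.path.split(train_cfg["ann_file"])
    # Swap only the split token in the file name, and only if it is not trainval already.
    if "_trainval" not in name:
        name = name.replace("_train", "_trainval", 1)
    train_cfg["ann_file"] = os.path.join(folder, name)
    return new_cfg


# Hypothetical config fragment; the real MonoCon config may use different keys and files.
cfg = {"data": {"train": {"ann_file": "data/kitti/kitti_infos_train.pkl"}}}
trainval_cfg = use_trainval_split(cfg)
print(trainval_cfg["data"]["train"]["ann_file"])
```

Note that KITTI test-set labels are not public, so test metrics come from submitting the predictions to the KITTI evaluation server rather than from a local evaluation.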

Xianpeng919 · 2022-03-05T13:54:56Z

@kaixinbear Your dimension branch exploded during training. We did observe this during our experiments. The dimension-aware loss is a little bit unstable. You can restart your training from the un-exploded ckpts.
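
To make "un-exploded ckpts" concrete, a minimal sketch of picking the checkpoint to restart from, given per-epoch validation AP. The helper last_stable_epoch and the min_ap threshold are hypothetical and not part of the MonoCon code; the two AP values are the moderate 3D AP numbers reported above for epochs 120 and 200.

```python
def last_stable_epoch(ap_by_epoch, min_ap=1.0):
    """Return the latest evaluated epoch before the AP first collapses.

    Epochs are scanned in order. Low-AP epochs at the very start are ignored;
    once a healthy epoch has been seen, the first epoch below min_ap ends the scan.
    Returns None if no epoch reaches min_ap.
    """
    last_good = None
    for epoch in sorted(ap_by_epoch):
        if ap_by_epoch[epoch] >= min_ap:
            last_good = epoch
        elif last_good is not None:
            break  # collapse detected; keep the last healthy epoch
    return last_good


# Moderate 3D AP at the two epochs reported in this thread.
history = {120: 12.2644, 200: 0.0}
print(last_stable_epoch(history))
```

With evaluation results for every epoch, the same scan would point at the last checkpoint worth resuming from. It does not address the instability itself.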

kaixinbear · 2022-03-05T13:57:52Z

Thanks for your kind reply! I will try it later.

ganyz · 2022-03-07T09:03:50Z

> @kaixinbear Your dimension branch exploded during training. We did observe this during our experiments. The dimension-aware loss is a little bit unstable. You can restart your training from the un-exploded ckpts.

Hello author, I resumed my training from the un-exploded ckpts, but it still explodes in the following epochs. Have you run into this? Should I turn down my lr?
Thanks!

rockywind · 2022-03-07T12:46:48Z

@Xianpeng919
I tested the released checkpoint. The result is the same as the readme. When I retrained the model, the result was lower than the readme.
20220302_134704.log

excitohe · 2022-03-07T13:41:26Z

> @Xianpeng919 I tested the released checkpoint. The result is the same as the readme. When I retrained the model, the result was lower than the readme. 20220302_134704.log

Hi, have you tried multi-GPU training, or are you still using single-GPU training? I retrained with 4 GPUs and got lower results than the readme.
https://paste.ubuntu.com/p/CtJH9Hk52F/

Xianpeng919 · 2022-03-08T01:18:06Z

@ganyz You can restart the training from scratch.

Xianpeng919 · 2022-03-08T01:20:37Z

@rockywind I double-checked your log; the config looks good to me. I'll double-check the code. You can also try training again with another random seed to see the performance.

rockywind · 2022-03-08T09:04:43Z

@Xianpeng919
OK, thanks a lot!

Cc-Hy · 2022-03-10T08:39:30Z

@rockywind @ganyz @kaixinbear @Xianpeng919
I find that during training there are several epochs whose performance is extremely low (close to 0), and the performance may differ from that of the previous epoch by more than 10 points.
Have you run into this situation?

Cc-Hy · 2022-03-10T08:40:56Z

Epoch 112

Epoch 115

Cc-Hy · 2022-03-10T12:00:23Z

Tried another time, and the best performance is as follows:

djp1235a · 2022-03-10T12:32:10Z

I ran 3 experiments with different seeds, and the best performance is 17.80. Besides, results are not reproducible in this codebase even with the same seed and deterministic==True.

excitohe · 2022-03-11T01:39:07Z

I retrained twice and got 16.20 on the GTX1080Ti and 16.80 on the Titan V. It seems that no one in this issue can get above 18.00 by retraining, which makes me frustrated.... =_=!

excitohe · 2022-03-11T02:48:43Z

> @rockywind @ganyz @kaixinbear @Xianpeng919 I find that during the training, there will be several epochs whose performance is extremely low(close to 0), and the performance from the last epoch may differ by more than 10 points. Did you meet this situation?

It's normal. Mono3D performance is always unstable. Just pay attention to the eval results of the last few checkpoints. 0.0

djp1235a · 2022-03-11T03:34:41Z

@excitohe I know that Mono3D performance is always unstable. But results are reproducible with the same seed and deterministic==True in the Monodle codebase. I'm just wondering why nondeterministic behavior appears in the mmdet reimplementation.
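
For context on what a fixed seed with deterministic==True usually amounts to in a PyTorch project, here is a minimal sketch. The helper set_seed is hypothetical and is not the MonoCon or mmdet implementation. Even with these settings, some CUDA operations have no deterministic implementation, which is one possible (unconfirmed here) source of run-to-run differences.

```python
import random


def set_seed(seed, deterministic=True):
    """Seed the common random number generators; NumPy and PyTorch are optional."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # seed must fit in 32 bits
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call when CUDA is unavailable
    if deterministic:
        # Ask cuDNN for deterministic kernels and disable autotuning,
        # which can otherwise select different algorithms between runs.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


set_seed(0)
first = random.random()
set_seed(0)
second = random.random()
print(first == second)
```

Comparing the first few logged loss values of two runs started this way is a quick check of whether a training pipeline is actually deterministic.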

excitohe · 2022-03-11T04:46:55Z

@djp1235a
Unified reply from OpenMMLab

Xianpeng919 · 2022-03-11T06:03:35Z

@excitohe @djp1235a @Cc-Hy I'm re-training the model based on the released code using different GPUs. I'll share with you the log in this thread once the result is out.

Xianpeng919 · 2022-03-11T13:46:48Z

@Cc-Hy You can refer to mmdet3d's visualization scripts. Their scripts are very helpful.

Cc-Hy · 2022-03-12T14:52:28Z

@Xianpeng919
Hello, I tried adding the "--show" arg to test.py, and I also tried using mono_det_demo.py directly.
But neither of them works properly.
Can you tell me which script you use? And do I need to make some modifications?

Xianpeng919 · 2022-03-13T04:33:30Z

@Cc-Hy You can run inference with your model first and then revise the show_results function in the mmdet3d.core.visualizer

excitohe · 2022-03-14T11:20:50Z

@Xianpeng919 Have you finished retraining yet? Looking forward to your training log file. ^_^

Xianpeng919 · 2022-03-16T13:10:28Z

@excitohe Hi, I was travelling last weekend. Please check this log for more details. I also attach the ckpt here in case you need it. Please run *_car.py config for inference.

Cc-Hy · 2022-03-17T01:14:22Z

Tried again:

excitohe · 2022-03-17T01:59:44Z

Hi, I migrated MonoCon into the latest mmdet3d as a plugin (plugin_dir), and tried again with only_car using your latest updated config on 4 GPUs.

Car AP40@0.70, 0.70, 0.70:
bbox AP40:96.3800, 90.3432, 80.7128
bev  AP40:29.0449, 22.2251, 19.4256
3d   AP40:21.4625, 16.1725, 14.3990
aos  AP40:95.73, 89.51, 79.49

The training log is attached:
https://paste.ubuntu.com/p/HyryFkZspc/

Can you see where the problem is? Thank you so much and keep in touch. ^_^

I will set up your original environment and test again with a single GPU...

excitohe · 2022-03-17T02:01:48Z

@Cc-Hy Hi, is this your recent result with the only_car config? It looks like we're about the same...

Cc-Hy · 2022-03-17T07:05:32Z

> @Cc-Hy Hi, Is this your recent result in only_car config? It looks like we're about the same...

No, these are 3-class results.
I'm training with Car only now.

Cc-Hy · 2022-03-18T03:16:02Z

Car only

kaixinbear · 2022-03-29T02:00:35Z

@Cc-Hy @Xianpeng919 @ganyz

Could you please tell me how to solve this model collapse problem? By turning down the lr or changing the random seed?
I have tried many times, but the AP drops to 0 at about epoch 120.

Cc-Hy · 2022-04-09T12:29:05Z

If you always run into this problem, you can replace the dimension loss with a plain L1 loss, L = |gt - pred|.
Then the dimension loss will never explode.
@kaixinbear
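
A minimal sketch of the L = |gt - pred| idea on plain Python lists, assuming each box carries three dimension values. The function l1_dimension_loss and the sample numbers are hypothetical and only illustrate the formula; in the training code the same quantity would be computed on tensors, for example with torch.nn.functional.l1_loss.

```python
def l1_dimension_loss(pred, gt):
    """Mean absolute error between predicted and ground-truth box dimensions.

    pred and gt are equally long lists of dimension triples, one per box.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of boxes")
    total, count = 0.0, 0
    for pred_box, gt_box in zip(pred, gt):
        for p, g in zip(pred_box, gt_box):
            total += abs(g - p)  # L = |gt - pred| per dimension
            count += 1
    return total / count if count else 0.0


# Hypothetical values for one box, in meters.
pred = [[1.5, 1.6, 4.0]]
gt = [[1.5, 1.8, 3.5]]
print(l1_dimension_loss(pred, gt))
```

Because each term is a plain absolute difference, the per-dimension gradient magnitude does not depend on the size of the ground-truth value.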

gervaisi · 2022-06-16T08:49:28Z

@Xianpeng919
I want to use the model with mono_det_demo.py, but it asks me for an annotation file. Where can I find it? To be clear, I have already trained the model.

FlyingAnt2018 · 2023-09-25T07:35:13Z

> @Cc-Hy @Xianpeng919 Hi, I trained the model on training set and tested on the validation. The moderate class 3D AP is 17.57, but the paper say it was 19.03.

Hi, I got an AP of 19.0217 by setting "cfg.SEED = 1903919922".
