
Worse accuracy while continuing training due to a possible mistake in initializing setup #9

Kumoi0728 opened this issue Jan 28, 2022 · 3 comments


Kumoi0728 commented Jan 28, 2022

I trained an MVTN model for 100 epochs with the following command, and stopped training after 57 epochs.

python run_mvtn.py --data_dir data/ModelNet40/ --run_mode train --mvnetwork mvcnn --epochs 100 --nb_views 1 --views_config learned_circular

The output of the 57th epoch looks like this:

Epoch: [57/100]
Iter [50/492] Loss: 0.7633
Iter [100/492] Loss: 0.7892
Iter [150/492] Loss: 0.3939
Iter [200/492] Loss: 0.1820
Iter [250/492] Loss: 0.2282
Iter [300/492] Loss: 0.6939
Iter [350/492] Loss: 0.4468
Iter [400/492] Loss: 0.2383
Iter [450/492] Loss: 0.5454
Evaluation:
train acc: 82.03 - train Loss: 0.6457
Val Acc: 71.31 - val Loss: 1.0960
Current best val acc: 72.61

When I loaded the trained model to continue training, it correctly resumed from the 58th epoch, but the accuracy dropped:

Epoch: [58/100]
Iter [50/492] Loss: 1.2060
Iter [100/492] Loss: 0.6699
Iter [150/492] Loss: 0.5014
Iter [200/492] Loss: 0.4189
Iter [250/492] Loss: 0.2721
Iter [300/492] Loss: 0.3099
Iter [350/492] Loss: 1.0518
Iter [400/492] Loss: 1.1512
Iter [450/492] Loss: 0.2506
Evaluation:
train acc: 55.48 - train Loss: 1.6519
Val Acc: 60.13 - val Loss: 1.4470
Current best val acc: 72.61

I found that in ops.py, lines 260-264, the trained MVTN model is loaded only when is_learning_views = True:

if setup["is_learning_views"]:
        models_bag["mvtn"].load_state_dict(
            checkpoint['mvtn'])
        models_bag["mvtn_optimizer"].load_state_dict(
            checkpoint['mvtn_optimizer'])

and in lines 55-56, is_learning_views in setup is initialized like this:

setup["is_learning_views"] = setup["views_config"] in ["learned_offset",
                                                       "learned_direct", "learned_spherical", "learned_random", "learned_transfer"]

Should the learned_offset in line 55 be replaced by learned_circular? Because the valid choices for a learned views_config are learned_circular, learned_spherical, learned_direct, learned_random, or learned_transfer.
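
For example, one possible fix (only a sketch of what I mean; I have not verified it against the rest of ops.py) would be either of the following:

# Possible fix in ops.py (unverified sketch): replace learned_offset with the
# missing learned_circular ...
setup["is_learning_views"] = setup["views_config"] in [
    "learned_circular", "learned_direct", "learned_spherical",
    "learned_random", "learned_transfer"]
# ... or simply treat every "learned_*" configuration as learned views:
setup["is_learning_views"] = setup["views_config"].startswith("learned")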

I am sorry if this is not the actual cause. I would appreciate it if you could tell me the correct way. :) @ajhamdi

Kumoi0728 (Author) commented:

I checked the results of the continued training. With epoch 57 as the cut-off point, the camera positions also changed a lot.
At epoch 57, camera 0 looked like this:
[Image: MV_cameras_57]
However, at epoch 60, camera 0 looked like this:
[Image: MV_cameras_60]

Judging from the other epochs, the camera position should not have changed this much. I think this is because the trained MVTN model was not loaded correctly when training resumed.
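
A quick way to check this (a rough sketch only; the checkpoint path is a placeholder and the list is copied from the current ops.py lines 55-56) is to load the checkpoint and re-evaluate the same condition:

import torch

# Placeholder path; the keys follow the ops.py snippet quoted above.
checkpoint = torch.load("results/checkpoint.pt", map_location="cpu")

learned_configs = ["learned_offset", "learned_direct", "learned_spherical",
                   "learned_random", "learned_transfer"]  # current list in ops.py
views_config = "learned_circular"  # the config used in my run

print("checkpoint contains MVTN weights:", "mvtn" in checkpoint)
print("is_learning_views would be:", views_config in learned_configs)
# If the first print shows True and the second shows False, the saved MVTN
# weights are silently skipped on resume and the cameras restart from their
# initial positions, which matches the jump between epoch 57 and epoch 60.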

ajhamdi (Owner) commented Feb 10, 2022

Yes @Kumoi0728, you are right. I think this is a bug in the code. I will look into it.


auniquesun commented Nov 16, 2022

(Quoted @Kumoi0728's original report above.)

Recently, I have experimented with the code in this repo.
I agree with you that in line 55 of ops.py, learned_offset should be replaced by learned_circular.

I found that even when training from scratch, the results are not satisfactory, as shown in the following figure.
[Image: training results screenshot (1668602288742)]

In my case, I set views_config=learned_spherical and tested on ScanObjectNN. According to the code, the model adjusts the scene parameters to choose better positions for rendering the point clouds into images, and then classifies the images with MVCNN.
However, after 21 epochs I only get 18.7% accuracy. I think this score is too low and the process is abnormal.
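
My understanding of the pipeline, written as a rough sketch (the names below are only illustrative, not the actual classes or functions in this repo), is:

# Conceptual sketch of the pipeline as I understand it (illustrative names only):
def forward_pass(point_cloud, mvtn, renderer, mvcnn):
    # 1. MVTN predicts the scene/camera parameters for each view
    azim, elev, dist = mvtn(point_cloud)
    # 2. the point cloud is rendered into nb_views images from those cameras
    images = renderer(point_cloud, azim, elev, dist)
    # 3. MVCNN classifies the multi-view images
    logits = mvcnn(images)
    return logits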

I have read the code and understand how it works.
I did not find any bug during training and evaluation, but I am not sure whether I used the proper settings.
The running command is shown in the following figure.
[Image: running command screenshot (1668602698259)]

Do you have any insight or advice on the poor performance? Thanks.
@ajhamdi @Kumoi0728
