Changing loss coefficient #103

Open
sogenyi opened this issue Sep 25, 2024 · 1 comment

Comments

sogenyi commented Sep 25, 2024

Dear All,

I am training a model on a SiO2 zeolite system with the following loss coefficients:

loss_coeffs:
  forces: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss

After a reasonable number of training steps I am getting the following errors:

             f_mse     f_rmse   e_mse     e_rmse
training     0.00942   0.013    0.0291    0.0414
validation   0.0122    0.0191   0.00997   0.01
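
For readers unfamiliar with how coefficients like those in loss_coeffs enter the objective, here is a minimal pure-Python sketch of a weighted sum of a force MSE and a per-atom energy MSE. It is not nequip's implementation: the function names are hypothetical, the toy numbers are made up, and the per-atom normalization (dividing the energy error by the atom count before squaring) is an assumption about what PerAtomMSELoss is meant to do.

```python
def per_atom_mse(e_pred, e_ref, n_atoms):
    """Mean over structures of ((E_pred - E_ref) / N_atoms)^2 (assumed form)."""
    terms = [((p - r) / n) ** 2 for p, r, n in zip(e_pred, e_ref, n_atoms)]
    return sum(terms) / len(terms)


def mse(pred, ref):
    """Plain mean squared error over flat lists of components."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)


def weighted_loss(coeffs, terms):
    """Sum of the individual loss terms, each scaled by its coefficient."""
    return sum(coeffs[name] * terms[name] for name in coeffs)


# Toy data, made up for illustration: two structures with 2 and 4 atoms.
e_pred, e_ref, n_atoms = [10.0, 20.0], [8.0, 24.0], [2, 4]
f_pred, f_ref = [0.1, -0.2, 0.3], [0.0, -0.2, 0.5]

terms = {
    "forces": mse(f_pred, f_ref),
    "total_energy": per_atom_mse(e_pred, e_ref, n_atoms),
}
coeffs = {"forces": 1.0, "total_energy": 1.0}  # mirrors the config above
loss = weighted_loss(coeffs, terms)
```

Raising the total_energy coefficient in this picture only changes how strongly the energy term pulls on the optimizer relative to the force term; the reported error metrics themselves are computed separately.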

Questions:

  1. I want to reduce the error of the total_energy component. Is it possible to change its coefficient from 1.0 to, say, 3.0-10.0 after the current step or epoch? If yes, how do I do this so as to avoid conflicting parameters in the config file and the trainer.pth file?
  2. Are the default metrics reported per atom, or do I need to define the metrics explicitly, as in the full.yaml example in nequip?
Collaborator

cw-tan commented Nov 22, 2024

Hi @sogenyi

Apologies for the delayed response. The nequip framework and the allegro model (which runs in the nequip infrastructure) are undergoing a major overhaul. We are close to the end of the revamp, and things look very different from what you see on main. I would advise migrating to the new infrastructure if you're new to the code and don't mind retraining the models you have. Otherwise, here are the answers to your questions:

  1. For the current public code, this can be achieved with the following callback: https://github.com/mir-group/nequip/blob/1e150cdc8614e640116d11e085d8e5e45b21e94d/configs/full.yaml#L255. If you want to use the new code (you would need the develop branches of both nequip and allegro), the equivalent callback is https://nequip.readthedocs.io/en/develop/api/callbacks.html#nequip.train.callbacks.LossCoefficientScheduler

  2. The entire config system is nequip's (allegro is just a model; the surrounding training infrastructure, including config file parsing, is nequip's job). Hence what you see in the full.yaml example in nequip applies to allegro as well, except for details concerning model hyperparameters.
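
The idea behind a loss-coefficient schedule can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the nequip callback linked above: the function name, the schedule format (a list of start-epoch and coefficient pairs), and the epoch numbers 100 and 200 are all hypothetical, so consult the linked callback for the actual configuration syntax.

```python
from bisect import bisect_right


def coeffs_at_epoch(schedule, initial, epoch):
    """Return the loss coefficients in effect at a given 1-based epoch.

    `schedule` is a list of (start_epoch, coeffs) pairs; each entry takes
    effect at its start epoch and stays active until the next entry begins.
    """
    ordered = sorted(schedule, key=lambda entry: entry[0])
    starts = [start for start, _ in ordered]
    idx = bisect_right(starts, epoch)  # number of entries already started
    return dict(initial) if idx == 0 else dict(ordered[idx - 1][1])


# Hypothetical values: start from equal weights, then raise the energy weight.
initial = {"forces": 1.0, "total_energy": 1.0}
schedule = [
    (100, {"forces": 1.0, "total_energy": 3.0}),
    (200, {"forces": 1.0, "total_energy": 10.0}),
]

for epoch in (1, 100, 250):
    print(epoch, coeffs_at_epoch(schedule, initial, epoch))
```

Keeping the whole schedule in one place, rather than editing the coefficient by hand before a restart, is one way to avoid the config file and the saved trainer state disagreeing about the current value.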

Chuin Wei
