Starting from a previous model #42
-
Is pretraining expected to work for Allegro? I have some data on which I train an Allegro model; I then add some more data. Adapting what worked for straight NequIP, I try setting `initialize_from_state` in the yaml, along with the accompanying options. When I try to start this, the first epoch gives me output that doesn't look like a continuation of the previous training. Am I doing something obviously wrong?
-
Hi @terryfrankcombe,

Are the initial validation epochs consistent with the last epochs of the previous training you are starting from? (One way to check this concretely is sketched at the end of this reply.) This looks technically correct from your post.

Can you elaborate on "what worked for NequIP" and what you mean by "worked" there? From a machine learning perspective, it is not always possible to improve a model with finetuning like this, especially under a more significant data shift. To the best of my knowledge this is largely unexplored in the MLIP literature, and most efforts I am aware of retrain from scratch when they add new data. For example, I believe @svandenhaute's Psiflow (https://www.nature.com/articles/s41524-023-00969-x, https://svandenhaute.github.io/psiflow/) retrains from scratch.

Thanks!
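A minimal sketch of what I mean by checking consistency, assuming both rundirs contain a `metrics_epoch.csv`; the paths and the filename here are placeholders, so adjust them to whatever your nequip/allegro version actually writes:

```python
import pandas as pd

# Paths are examples only -- point these at your actual rundirs.
prev = pd.read_csv("results/mysystem/run-1/metrics_epoch.csv")
fine = pd.read_csv("results/mysystem/run-2-finetune/metrics_epoch.csv")

print("Last epochs of the previous run:")
print(prev.tail(3))
print("First epochs of the fine-tuning run:")
print(fine.head(3))

# If the previous model state was actually loaded, the first validation
# metrics of the new run should be close to the last ones of the previous
# run (up to the shift introduced by the newly added data).
```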
-
In psiflow, we do in fact support both training from scratch as well as starting from a pretrained model. I personally use this extensively with NequIP, and I did notice that total training time is significantly reduced when starting from a pretrained model. This should also work with Allegro, though I have less experience with it. You can take a look at the modified train script to see how we do it; besides initializing with the pretrained weights, you also need to reset the validation metrics in the trainer (a rough sketch follows below).

EDIT: sorry, I didn't notice this was a discussion; this should be part of @Linux-cpp-lisp's reply thread.
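A rough sketch of that idea, as an illustration only and not psiflow's actual train script; the `Trainer` attribute names and the checkpoint format are assumptions that differ between nequip versions:

```python
import torch

def start_from_pretrained(trainer, state_path):
    # Load the previous run's weights into the fresh trainer's model.
    # Assumes state_path holds a plain state_dict; some versions save the
    # whole model object instead, in which case adapt accordingly.
    state_dict = torch.load(state_path, map_location="cpu")
    trainer.model.load_state_dict(state_dict)

    # Reset the "best validation metric so far" bookkeeping, so the new run
    # does not compare its checkpoints against the old run's stale metrics.
    # `best_metrics` is a hypothetical attribute name -- check your Trainer.
    trainer.best_metrics = float("inf")
    return trainer
```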
Hi Terry,
Interesting; I'm glad (and interested) to hear that worked well for you in NequIP. Are you doing the same system and data shift with Allegro?
It is certainly possible that there is a bug here; one thing I can think of: did you remember to clear the rundirs for your finetuning? If you accidentally had `append: True` and still had a rundir, `initialize_from_state` would be a no-op, since the run would be a restart instead of a new model...
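For reference, a config sketch of the fresh-rundir setup I mean; this is a sketch only, where `root` and `run_name` are standard nequip-style options, but check the exact keys and values against your version's example configs:

```yaml
root: results/mysystem
run_name: run-2-finetune     # fresh run name => fresh rundir, so this is a new model
append: false                # if this is true and the rundir already exists, the run
                             # becomes a restart and initialize_from_state is a no-op
# initialize_from_state: ... # keep whatever you already set for fine-tuning here
```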