Fix file name of swa checkpoints #648

Merged: 2 commits, Oct 24, 2024

Conversation

beckobert
Contributor

This should fix the file name given to the checkpoint after the very first swa epoch.

Currently, swa is started in train.py when epoch == swa_start, but the checkpoint file name only adds "swa" to the file name when epoch > swa_start.

This causes a problem when the first swa epoch has the lowest loss of all validated swa epochs. Its checkpoint is saved under the wrong (non-swa) name and never overwritten. At the end of training, that checkpoint is then saved as the best non-swa checkpoint (if its loss is lower than that of any stage-one checkpoint), whereas the swa checkpoint is not found and an error is thrown.
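To illustrate the mismatch, here is a minimal sketch. It is not the actual train.py or checkpoint code: `checkpoint_name`, `swa_is_active`, the `inclusive` flag and the file name pattern are all hypothetical, and the assumption is that the fix amounts to making the naming comparison inclusive so that it agrees with the point where swa starts.

```python
def swa_is_active(epoch: int, swa_start: int) -> bool:
    # Per the description above, swa begins when epoch == swa_start,
    # so the first swa epoch is swa_start itself.
    return epoch >= swa_start


def checkpoint_name(tag: str, epoch: int, swa_start: int, inclusive: bool) -> str:
    # Hypothetical naming helper; the real file name pattern may differ.
    # inclusive=False mimics the old check (epoch > swa_start),
    # inclusive=True mimics the assumed fix (epoch >= swa_start).
    in_swa = epoch >= swa_start if inclusive else epoch > swa_start
    suffix = "_swa" if in_swa else ""
    return f"{tag}_epoch-{epoch}{suffix}.pt"


if __name__ == "__main__":
    swa_start = 5
    for epoch in (4, 5, 6):
        old = checkpoint_name("model", epoch, swa_start, inclusive=False)
        new = checkpoint_name("model", epoch, swa_start, inclusive=True)
        print(epoch, swa_is_active(epoch, swa_start), old, new)
```

With the old comparison, the epoch equal to `swa_start` is trained with swa but named like a stage-one checkpoint; with the inclusive comparison the name and the training stage agree.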

How did I manage a typo when adding a single character?
@ilyes319
Contributor

Good spotting, thank you!

@ilyes319 merged commit defcb19 into ACEsuit:develop on Oct 24, 2024
2 checks passed
@beckobert deleted the swa_fix branch on October 25, 2024, 13:56