ENG-654: Neptune reports "step must be strictly increasing" error if lightning logs in training and validation step #1702
Comments
Hey @NiklasKappel 👋 Usage-wise, this should not have any impact, as the error is basically due to Neptune trying to log the epoch number for the same step in both the training and validation loop. Since this has already been logged once, ignoring the second logging call does not impact the charts. However, I agree that the error logs you see are a nuisance, especially on longer training jobs where they can quickly flood the console output.
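To make the explanation above concrete, here is a minimal toy model of a series that only accepts strictly increasing steps. It is a sketch, not Neptune's actual implementation: the `StrictSeries` class and the `simulate` function are hypothetical names introduced for illustration. It assumes the training and validation loops each log the epoch number at the same step.

```python
class StrictSeries:
    """Toy series: a point is accepted only if its step exceeds the last accepted step."""

    def __init__(self):
        self.points = []    # accepted (step, value) pairs
        self.rejected = []  # pairs refused because the step did not increase

    def append(self, step, value):
        if self.points and step <= self.points[-1][0]:
            self.rejected.append((step, value))
            return False
        self.points.append((step, value))
        return True


def simulate(num_steps=3):
    """Log the epoch twice per step, once per loop, as described in the comment above."""
    epoch_series = StrictSeries()
    epoch = 0
    for step in range(num_steps):
        epoch_series.append(step, epoch)  # call coming from the training loop
        epoch_series.append(step, epoch)  # call from the validation loop, same step
    return epoch_series
```

In this model every second call is refused, yet the accepted points are exactly what a single logging call per step would have produced, which is why the charts are unaffected.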
Any fix?
Hey @rkinas, this issue is currently in the engineering backlog. However, as mentioned earlier, this error typically does not have any negative impact on logging, so it is not considered a priority at the moment. Please let me know if this issue is more serious for you.
Yes, I see... but it's a little annoying how such messages appear in the console output :)
If this is crowding your console output, you can silence Neptune messages using:

    import logging
    logging.getLogger("neptune").setLevel(logging.CRITICAL)

Note that this will silence ALL Neptune messages, including Run initialization and sync status messages.
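The trade-off of raising the logger level can be checked with the standard library alone. The sketch below does not import Neptune. It writes made-up messages to a logger named "neptune" and records which ones reach a handler. `ListHandler`, `emitted_at`, and the message texts are hypothetical stand-ins.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages in a list instead of printing them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def emitted_at(level):
    """Return the messages that pass when the 'neptune' logger is set to `level`."""
    logger = logging.getLogger("neptune")
    handler = ListHandler()
    old_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        logger.info("run initialized")                     # made-up status message
        logger.error("step must be strictly increasing")   # made-up error message
        logger.critical("fatal problem")                   # made-up critical message
    finally:
        # Restore the logger so the demo leaves no side effects behind.
        logger.removeHandler(handler)
        logger.setLevel(old_level)
    return handler.messages
```

With `logging.CRITICAL`, only the critical message gets through, so the informational status message disappears together with the error.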
Hey @rkinas, while there is no permanent fix to this problem yet, we have a workaround that lets you filter out such error messages from the console without having to silence all logging. You just need to add the snippet below to your scripts:

    import logging

    class _FilterCallback(logging.Filter):
        # Drop only the "step must be strictly increasing" errors emitted by the "neptune" logger.
        def filter(self, record: logging.LogRecord) -> bool:
            return not (
                record.name == "neptune"
                and record.getMessage().startswith(
                    "Error occurred during asynchronous operation processing: X-coordinates (step) must be strictly increasing for series attribute"
                )
            )

    logging.getLogger("neptune").addFilter(_FilterCallback())

Please let me know if this would work for you 🙏
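The filter workaround can be tried out without a Neptune run. The following sketch uses only the standard `logging` module and assumes, as the snippet above does, that the messages are emitted on a logger named exactly "neptune". `StepErrorFilter`, `ListHandler`, `demo`, and the message suffixes are hypothetical and only serve the demonstration.

```python
import logging

PREFIX = (
    "Error occurred during asynchronous operation processing: "
    "X-coordinates (step) must be strictly increasing for series attribute"
)


class StepErrorFilter(logging.Filter):
    """Reject only the step-ordering errors; let every other record through."""

    def filter(self, record):
        return not (
            record.name == "neptune" and record.getMessage().startswith(PREFIX)
        )


class ListHandler(logging.Handler):
    """Collects messages so the effect of the filter can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def demo():
    logger = logging.getLogger("neptune")
    handler, flt = ListHandler(), StepErrorFilter()
    old_level = logger.level
    logger.addHandler(handler)
    logger.addFilter(flt)
    logger.setLevel(logging.INFO)
    try:
        logger.error(PREFIX + " (details elided)")  # suffix is a placeholder
        logger.info("sync status: ok")               # made-up status message
    finally:
        # Undo the changes so the demo has no lasting side effects.
        logger.removeFilter(flt)
        logger.removeHandler(handler)
        logger.setLevel(old_level)
    return handler.messages
```

Calling `demo()` returns only the status message. One caveat from Python's logging semantics: a filter attached to a logger applies only to records logged directly on that logger, so records from a child logger (for example a hypothetical "neptune.internal") would not pass through it.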
Describe the bug
Using the Neptune logger in lightning, I get multiple of the following errors:
The number at the end is different for every error. The errors only appear when there is a log call in BOTH of Lightning's training_step and validation_step methods.
Reproduction
Run the following MWE after setting project and api_key in main. Comment out one of the two self.log calls to see the errors go away.
Expected behavior
No errors when logging both train and val metrics.
Environment
The output of pip list:
The operating system you're using: Linux (Fedora 39)
The output of python --version: Python 3.11.8