Dag and task metrics should be initialized to zero at startup #68
Comments
A workaround here:
reference: prometheus/prometheus#1673
A caveat with the workaround is that the exporter provides a total count of past failures, so when you first start the exporter (or after a sufficiently long interruption in metrics), everything that failed in the past will show up as new failures. Zero initialization would therefore be superior.
Agreed, this is not a good long-term solution. The issue is still present in Airflow 2.8.x and later. Does anyone have a hint where the metrics and their values are produced here in statsd? Setting the counter to zero makes total sense. I am having a similar issue with the first failure of any DAG. @WakeupTsai does have a good solution, but it is a workaround in the face of Prometheus' architecture.
Airflow metrics don't get reset after a restart; however, the metrics are not initialized either. This leads to unexpected PromQL results when querying series with missing data.
For example, the counter for a task in state 'failed' is set to '1' at the task's first failure, but before that failure no data exists for the task with state 'failed'. A PromQL query that uses the 'increase' function to check whether the task executed at least once over a time period, based on an increase in either the 'success' or the 'failed' state count, responds as if neither state changed during that period. This is because 'increase' only works from the samples that exist in the time window: a series that first appears with the value '1' has no earlier sample to compare against, so its first increment is not counted.
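The effect can be reproduced without Prometheus. The sketch below is a deliberately simplified stand-in for 'increase' (last sample minus first sample in the window, with no extrapolation and no counter-reset handling). The function name `simple_increase`, the 15-second scrape interval, and the sample data are all assumptions made up for illustration.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_increase(samples: List[Sample], start: float, end: float) -> Optional[float]:
    """Simplified stand-in for PromQL increase(): last minus first sample in the window.

    Ignores extrapolation and counter resets. Returns None when fewer than
    two samples fall inside [start, end], since no change can be computed.
    """
    in_window = [value for ts, value in samples if start <= ts <= end]
    if len(in_window) < 2:
        return None
    return in_window[-1] - in_window[0]


# Hypothetical scrapes every 15 s; the task fails once at t=30.
# Without initialization, the 'failed' series only starts to exist at t=30.
uninitialized = [(30.0, 1.0), (45.0, 1.0), (60.0, 1.0)]
# With zero initialization, the series exists from startup with value 0.
zero_initialized = [(0.0, 0.0), (15.0, 0.0), (30.0, 1.0), (45.0, 1.0), (60.0, 1.0)]

missed = simple_increase(uninitialized, 0.0, 60.0)
seen = simple_increase(zero_initialized, 0.0, 60.0)
print(missed, seen)
```

In the uninitialized case every sample in the window has the same value, so the computed increase is zero even though a failure happened. With the zero sample present, the same failure shows up as an increase of one.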
The Prometheus documentation discusses this issue:
A potential fix for this issue is to initialize all DAG and task metrics to zero at startup.
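As a rough illustration of that idea, the sketch below pre-populates a counter for every (DAG, task, state) combination before any event is recorded. It does not reflect how this exporter is actually implemented: the names `init_task_counters`, `record`, and `STATES`, and the plain dictionary used as a metric store, are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (dag_id, task_id, state)

# Hypothetical set of states to expose; the real exporter may track others.
STATES = ("success", "failed")


def init_task_counters(
    dag_tasks: Dict[str, Iterable[str]],
    states: Iterable[str] = STATES,
) -> Dict[Key, int]:
    """Create a zero-valued counter for every known DAG/task/state combination."""
    states = tuple(states)
    counters: Dict[Key, int] = {}
    for dag_id, task_ids in dag_tasks.items():
        for task_id, state in product(task_ids, states):
            counters[(dag_id, task_id, state)] = 0
    return counters


def record(counters: Dict[Key, int], dag_id: str, task_id: str, state: str) -> None:
    """Increment a counter; unknown combinations start from zero as well."""
    key = (dag_id, task_id, state)
    counters[key] = counters.get(key, 0) + 1


# At startup: every series exists with value 0 before anything has run.
counters = init_task_counters({"etl": ["extract", "load"]})
# Later: the first failure moves the counter from 0 to 1, a visible increase.
record(counters, "etl", "load", "failed")
```

Because every series is exported from the first scrape onward, the first success or failure is a change between two existing samples rather than the appearance of a new series.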