[FR] Notification backoff - avoid flooding notifications during prolonged outage #317
Comments
Good feature design 👍 I'd be happy to upstream this feature if someone has worked or plans to work on it!
Looking at the code responsible for this, it seems to me the keep-alive threshold should still define the check interval but then the modification would be to:
Doesn't seem too difficult to implement. If the above plan sounds OK, I should be able to eventually knock something out. My immediate issue is gone, though, thanks to automatic service restarts based on monitoring TCP 8444. :-)
Yes, sounds good. I like the idea of still logging all the events, with information on why the notification was skipped. I think it's OK to handle this entirely in the That is unless we want to have a more general notification throttling that works across all event types. But as I understand it, this is currently the most offensive notification type.
Alright, so I'll try to implement a really targeted fix for the keep-alive, but keep in mind that it could be refactored later into a more generic solution.
At first notifications will arrive roughly at the same speed as before but the delay starts increasing gradually over time. Keep-alive is still tested with the same frequency but failures in between notifications are only logged with the next notification threshold included. Fixes martomi#317
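The behavior described in the commit message could be sketched roughly as follows. This is only an illustration, not the project's actual code: the class name `KeepAliveThrottle`, its fields, the 300-second interval, and the rate of 1.5 are all assumptions made for the example. The check itself runs at a fixed frequency; only the decision to notify is throttled, and suppressed failures report when the next notification is due.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeepAliveThrottle:
    """Hypothetical helper: fixed check frequency, exponentially spaced notifications."""

    interval: float = 300.0          # assumed keep-alive threshold in seconds
    exponential_rate: float = 1.5    # assumed growth factor
    retry_number: int = 0
    next_notification_at: Optional[float] = None

    def check(self, now: float, last_activity: float) -> str:
        # Healthy again: reset the backoff state.
        if now - last_activity < self.interval:
            self.retry_number = 0
            self.next_notification_at = None
            return "healthy"

        # Still failing, but inside the backoff window: log only,
        # including the time of the next notification.
        if self.next_notification_at is not None and now < self.next_notification_at:
            return f"suppressed (next notification at t={self.next_notification_at:.0f}s)"

        # Failing and the window has passed: notify and widen the window.
        self.retry_number += 1
        delay = self.interval * self.exponential_rate ** self.retry_number
        self.next_notification_at = now + delay
        return "notify"


if __name__ == "__main__":
    throttle = KeepAliveThrottle()
    # Simulated outage: last activity at t=0, keep-alive checked every 300 s.
    for t in range(300, 1501, 300):
        print(t, throttle.check(float(t), 0.0))
```

Passing `now` explicitly instead of reading the clock keeps the sketch deterministic and easy to test; a real monitor would presumably use wall-clock time.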
TL;DR: a prolonged harvester outage, say 8 hours overnight, will produce a notification every 5 minutes per harvester. This is annoying to wake up to and may also consume limited resources or credits of the notification service. Instead, an exponential backoff should be used, which can easily cut consumption by 90% while still conveying much the same level of urgency to an operator.
Steps to produce suboptimal behavior:
Better behavior:
Solution summary:
T() = interval * exponential_rate ^ retryNumber
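To get a feel for the formula above, here is a small sketch that evaluates it and counts how many notifications an outage would generate. It assumes T is the delay before each successive notification, which is one possible reading of the formula; the function names and the rate of 1.5 are made up for the example. A rate of 1.0 reproduces the current fixed 5-minute behavior.

```python
def backoff_delay(interval: float, exponential_rate: float, retry_number: int) -> float:
    """T = interval * exponential_rate ^ retryNumber, in seconds."""
    return interval * exponential_rate ** retry_number


def count_notifications(outage_seconds: float, interval: float, exponential_rate: float) -> int:
    """Count notifications sent during an outage, assuming each one
    is delayed by T(retry_number) after the previous one."""
    elapsed = 0.0
    count = 0
    while True:
        elapsed += backoff_delay(interval, exponential_rate, count)
        if elapsed > outage_seconds:
            return count
        count += 1


if __name__ == "__main__":
    outage = 8 * 3600  # the 8-hour outage from the TL;DR
    print(count_notifications(outage, 300, 1.0))  # fixed 5-minute interval
    print(count_notifications(outage, 300, 1.5))  # assumed backoff rate of 1.5
```

Under these assumptions the fixed interval yields 96 notifications over 8 hours, while a rate of 1.5 yields 9, which is in line with the roughly 90% reduction mentioned in the TL;DR.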