Add an exponential backoff to harvester keep-alive notifications #326

Closed
jinnatar wants to merge 15 commits

Conversation

@jinnatar (Contributor) commented Jan 14, 2022

At first, notifications arrive at roughly the same rate as before, but the delay between them grows gradually over time. Keep-alive is still checked at the same frequency; failures that fall inside the backoff window are only logged, with the next notification threshold included in the log message.

Fixes #317

  • Also includes merging Docker changes from main -> dev
  • A Docker image has been built for testing: ghcr.io/artanicus/chiadog:keepalive_antispam-f73494
  • Example log output with extra debug logging and a shortened 60-second threshold, to make the behaviour easier to follow:
[2022-01-14 23:14:09] [    INFO] --- Connected HDD? The total plot count increased from 0 to 153. (non_decreasing_plots.py:33)
[2022-01-14 23:15:50] [ WARNING] --- Your harvester appears to be offline! No events for the past 101 seconds. (Iteration 0) (keep_alive_monitor.py:111)
[2022-01-14 23:16:51] [ WARNING] --- Your harvester appears to be offline! No events for the past 161 seconds. (Iteration 1) To avoid flooding you, the next notification won't be sent before 2022-01-14 23:17:20.696072 (keep_alive_monitor.py:111)
[2022-01-14 23:17:51] [ WARNING] --- Your harvester appears to be offline! No events for the past 221 seconds. (Iteration 1) (keep_alive_monitor.py:111)
[2022-01-14 23:18:52] [ WARNING] --- Your harvester appears to be offline! No events for the past 282 seconds. (Iteration 2) (keep_alive_monitor.py:111)
[2022-01-14 23:19:52] [ WARNING] --- Your harvester appears to be offline! No events for the past 342 seconds. (Iteration 3) (keep_alive_monitor.py:111)
[2022-01-14 23:20:52] [ WARNING] --- Your harvester appears to be offline! No events for the past 403 seconds. (Iteration 4) To avoid flooding you, the next notification won't be sent before 2022-01-14 23:20:54.446072 (keep_alive_monitor.py:111)
[2022-01-14 23:21:52] [ WARNING] --- Your harvester appears to be offline! No events for the past 463 seconds. (Iteration 4) (keep_alive_monitor.py:111)
[2022-01-14 23:22:53] [ WARNING] --- Your harvester appears to be offline! No events for the past 524 seconds. (Iteration 5) To avoid flooding you, the next notification won't be sent before 2022-01-14 23:23:26.321072 (keep_alive_monitor.py:111)
[2022-01-14 23:23:53] [ WARNING] --- Your harvester appears to be offline! No events for the past 584 seconds. (Iteration 5) (keep_alive_monitor.py:111)
[2022-01-14 23:24:53] [ WARNING] --- Your harvester appears to be offline! No events for the past 644 seconds. (Iteration 6) To avoid flooding you, the next notification won't be sent before 2022-01-14 23:27:14.133572 (keep_alive_monitor.py:111)
[2022-01-14 23:25:53] [ WARNING] --- Your harvester appears to be offline! No events for the past 704 seconds. (Iteration 6) To avoid flooding you, the next notification won't be sent before 2022-01-14 23:27:14.133572 (keep_alive_monitor.py:111)
[2022-01-14 23:26:54] [ WARNING] --- Your harvester appears to be offline! No events for the past 764 seconds. (Iteration 6) To avoid flooding you, the next notification won't be sent before 2022-01-14 23:27:14.133572 (keep_alive_monitor.py:111)
[2022-01-14 23:27:54] [ WARNING] --- Your harvester appears to be offline! No events for the past 824 seconds. (Iteration 6) (keep_alive_monitor.py:111)
[2022-01-14 23:28:54] [ WARNING] --- Your harvester appears to be offline! No events for the past 884 seconds. (Iteration 7) To avoid flooding you, the next notification won't be sent before 2022-01-14 23:32:55.852322 (keep_alive_monitor.py:111)
[2022-01-14 23:29:54] [ WARNING] --- Your harvester appears to be offline! No events for the past 945 seconds. (Iteration 7) To avoid flooding you, the next notification won't be sent before 2022-01-14 23:32:55.852322 (keep_alive_monitor.py:111)
[2022-01-14 23:30:54] [ WARNING] --- Your harvester appears to be offline! No events for the past 1005 seconds. (Iteration 7) To avoid flooding you, the next notification won't be sent before 2022-01-14 23:32:55.852322 (keep_alive_monitor.py:111)
[2022-01-14 23:31:54] [ WARNING] --- Your harvester appears to be offline! No events for the past 1065 seconds. (Iteration 7) To avoid flooding you, the next notification won't be sent before 2022-01-14 23:32:55.852322 (keep_alive_monitor.py:111)
[2022-01-14 23:32:54] [ WARNING] --- Your harvester appears to be offline! No events for the past 1125 seconds. (Iteration 7) To avoid flooding you, the next notification won't be sent before 2022-01-14 23:32:55.852322 (keep_alive_monitor.py:111)
[2022-01-14 23:32:58] [ WARNING] --- Experiencing networking issues? Harvester did not participate in any challenge for 1128 seconds. It's now working again. (time_since_last_farm_event.py:40)
[2022-01-14 23:33:55] [    INFO] --- incident for EventService.HARVESTER is over (keep_alive_monitor.py:115)
[2022-01-14 23:35:55] [ WARNING] --- Your harvester appears to be offline! No events for the past 95 seconds. (Iteration 0) (keep_alive_monitor.py:111)
[2022-01-14 23:36:56] [ WARNING] --- Your harvester appears to be offline! No events for the past 156 seconds. (Iteration 1) To avoid flooding you, the next notification won't be sent before 2022-01-14 23:37:25.442751 (keep_alive_monitor.py:111)
[2022-01-14 23:37:56] [ WARNING] --- Your harvester appears to be offline! No events for the past 216 seconds. (Iteration 1) (keep_alive_monitor.py:111)
[2022-01-14 23:38:56] [ WARNING] --- Your harvester appears to be offline! No events for the past 276 seconds. (Iteration 2) (keep_alive_monitor.py:111)
[2022-01-14 23:39:02] [ WARNING] --- Experiencing networking issues? Harvester did not participate in any challenge for 282 seconds. It's now working again. (time_since_last_farm_event.py:40)
[2022-01-14 23:39:56] [    INFO] --- incident for EventService.HARVESTER is over (keep_alive_monitor.py:115)
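
For illustration, here is a minimal, self-contained sketch of the backoff behaviour described above. The class name BackoffNotifier and the exact doubling rule are assumptions made for this example; the PR's real implementation lives in keep_alive_monitor.py and may differ in detail.

from datetime import datetime, timedelta

class BackoffNotifier:
    """Send at most one notification per backoff window, doubling the
    window after each notification while the incident persists."""

    def __init__(self, base_interval_seconds: int):
        self.base_interval = timedelta(seconds=base_interval_seconds)
        self.iteration = 0
        self.next_allowed = None  # no active incident yet

    def should_notify(self) -> bool:
        now = datetime.now()
        if self.next_allowed is None:
            # First failure of a new incident: notify immediately.
            self.iteration = 1
            self.next_allowed = now + self.base_interval
            return True
        if now < self.next_allowed:
            # Still inside the backoff window: callers only log here.
            return False
        # Window expired: notify again and double the wait.
        self.iteration += 1
        self.next_allowed = now + self.base_interval * 2 ** (self.iteration - 1)
        return True

    def reset(self):
        # Incident over: the next failure starts a fresh sequence.
        self.iteration = 0
        self.next_allowed = None

The "Iteration N" counter and the "next notification won't be sent before ..." timestamps in the log above correspond roughly to iteration and next_allowed in this sketch.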

kanasite and others added 9 commits August 15, 2021 11:14
…i#307)

This commit adds a new notification for plot count increases (e.g. connecting an HDD) and makes the notifications for increases/decreases configurable via the config for every integration.
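
As a rough illustration of that kind of check (the class and method names below are invented for the example, not the project's actual code in non_decreasing_plots.py):

from typing import Optional

class PlotCountChecker:
    """Compare the current plot count against the last seen value and
    report increases (e.g. a reconnected HDD) as well as decreases."""

    def __init__(self):
        self.last_count: Optional[int] = None

    def check(self, current_count: int) -> Optional[str]:
        previous, self.last_count = self.last_count, current_count
        if previous is None or current_count == previous:
            return None
        if current_count > previous:
            return f"Connected HDD? The total plot count increased from {previous} to {current_count}."
        return f"Disconnected HDD? The total plot count decreased from {previous} to {current_count}."
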
* Add a scripts/linux/chiadog.service systemd example that attempts to run chiadog in a more isolated environment: it creates a new limited user on each run (making much of the filesystem read-only) and sets the `.chia/mainnet` folders other than `log` to inaccessible.
* Move the offset file: previously, the `debug.log.offset` file was kept in the chiadog directory, but when running on a read-only filesystem we can't write there. Instead, create a temporary directory at runtime and store the offset file there. This also means we no longer need to delete the offset file on startup.
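
A minimal sketch of the temp-directory approach described in the second bullet, assuming nothing about chiadog's actual module layout (tempfile is standard library):

import os
import tempfile

# Create a per-run working directory instead of writing next to the code,
# so chiadog can run from a read-only filesystem. The directory (and the
# offset file inside it) is unique to this run, so there is no stale
# offset file to delete on startup.
work_dir = tempfile.mkdtemp(prefix="chiadog-")
offset_file = os.path.join(work_dir, "debug.log.offset")
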
…mi#314)

Sometimes it is useful to disable SMTP authentication when sending emails, for example when using a local postfix server, where the default install doesn't require or enable auth.

When the new config option enable_smtp_auth is set to false, the username_smtp and password_smtp config options are ignored.
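
A hedged sketch of what honouring that flag could look like in a notifier built on the standard-library smtplib; the function and parameter names echo the config keys above, but the surrounding code is an assumption, not chiadog's actual SMTP notifier:

import smtplib
from email.message import EmailMessage

def send_mail(msg: EmailMessage, host: str, port: int,
              enable_smtp_auth: bool,
              username_smtp: str = "", password_smtp: str = "") -> None:
    with smtplib.SMTP(host, port) as smtp:
        # Only log in when auth is enabled; a local postfix relay
        # typically accepts mail from localhost without credentials.
        if enable_smtp_auth:
            smtp.login(username_smtp, password_smtp)
        smtp.send_message(msg)
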
@jinnatar jinnatar marked this pull request as ready for review January 14, 2022 21:54
@martomi (Owner) left a comment

Thanks for following up with this! I have some feedback that needs to be addressed before I can upstream this. Let me know what you think.

src/notifier/__init__.py: two review comments (outdated, resolved)
if self._keep_alive_iteration[service] > 0:
    logging.info(f"incident for {service} is over")
    self._keep_alive_iteration[service] = 0
    self._keep_alive_incident_time[service] = None
@martomi (Owner) commented:

Now that I'm looking at this, it's actually making this class more complex than I'd like (and, really, a pain to maintain). It would be justified to refactor this into its own class. That would also let us cover the functionality with unit tests. I'd be hesitant to merge before we have some simple unit-test coverage.

As I see it, it should keep state for a single service, and then we can create instances for every service here. I'd propose the following interface (the name could be better):

from datetime import datetime

class EventThrottleHelper:
    def __init__(self, interval_seconds: int):
        self.event_counter = 0
        self.first_event_time = None
        self.interval_seconds = interval_seconds

    def should_trigger_event(self):
        time_now = datetime.now()

        # Auto-detect when to reset event_counter and
        # first_event_time based on current time,
        # interval_seconds and first_event_time

where you initialize the class with self._last_keep_alive_threshold_seconds[service], and everything else can be encapsulated.

The change in KeepAliveMonitor would then be very minimal: just extend the if-statement on line 80:

if seconds_since_last > self._last_keep_alive_threshold_seconds[service] and \
    throttle_helper[service].should_trigger_event():
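
For reference, a minimal, testable sketch of what the proposed helper could look like once fleshed out. The reset rule and the doubling backoff below are assumptions about the intent, not the reviewer's specification:

from datetime import datetime, timedelta

class EventThrottleHelper:
    def __init__(self, interval_seconds: int):
        self.event_counter = 0
        self.first_event_time = None
        self.interval_seconds = interval_seconds

    def should_trigger_event(self) -> bool:
        time_now = datetime.now()
        if self.first_event_time is None:
            # First event of a new incident: record it and trigger.
            self.first_event_time = time_now
            self.event_counter = 1
            return True
        # Exponential backoff: allow the next event only after
        # interval_seconds * 2^(n-1) has passed since the first event.
        next_allowed = self.first_event_time + timedelta(
            seconds=self.interval_seconds * 2 ** (self.event_counter - 1)
        )
        if time_now < next_allowed:
            return False
        self.event_counter += 1
        return True

    def reset(self):
        # Call when the incident is over so the next one starts fresh.
        self.event_counter = 0
        self.first_event_time = None

Because the class holds its own state, it can be unit-tested in isolation, for example by constructing it with a one-second interval and asserting on should_trigger_event() around short sleeps or a mocked clock.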

.github/workflows/publish-image.yml: review comment (outdated, resolved)
This wasn't caught earlier since the data from the Event isn't used yet. It is purely future-proofing so that the iteration count is available, even if it is not included in the message.
This had no practical application yet.