coordinator: expose heartbeat control to task handlers #142

schmichael · 2015-08-26T00:13:32Z

Right now the m_etcd Coordinator handles maintaining the task claim in etcd by heartbeating the claim node before the TTL expires, and calling Handler#Stop() if it's unable to maintain the claim.

Users only need to make sure their Stop() methods cause Handler#Run() to exit in a timely manner to preserve the execute-task-exactly-once guarantee.

However, this does nothing to guarantee the correctness of the user's Handler#Run() method. The method may have deadlocked while the claim persists. The work is then effectively running zero times in the cluster, not exactly once, and nothing indicates to an operator that anything is amiss.

Solution: Allow Handlers to handle heartbeating

Coordinators should be able to provide a Task#Heartbeat() error method for handlers to call manually to keep the claim alive. Handlers would be expected to exit if an error was returned.
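A rough sketch of what handler-driven heartbeating could look like. Only the `Heartbeat() error` signature comes from the proposal above; the `Heartbeater` interface name, the `fakeTask` stand-in, and `processItems` are hypothetical and exist purely for illustration, not as part of any existing API.

```go
package main

import (
	"errors"
	"fmt"
)

// Heartbeater is a hypothetical interface carrying the proposed
// Heartbeat() error method.
type Heartbeater interface {
	Heartbeat() error
}

// errClaimLost is an illustrative error for a claim that could not be renewed.
var errClaimLost = errors.New("claim lost")

// fakeTask stands in for a coordinator-backed task. It accepts a fixed
// number of heartbeats and then reports the claim as lost.
type fakeTask struct {
	remaining int
}

func (t *fakeTask) Heartbeat() error {
	if t.remaining <= 0 {
		return errClaimLost
	}
	t.remaining--
	return nil
}

// processItems heartbeats before each unit of work and exits as soon as a
// heartbeat fails. If the loop ever deadlocks, heartbeats stop and the claim
// expires on its own. It returns the number of items processed.
func processItems(t Heartbeater, items []string) (int, error) {
	done := 0
	for range items {
		if err := t.Heartbeat(); err != nil {
			return done, err
		}
		// ... real work for the item would go here ...
		done++
	}
	return done, nil
}

func main() {
	n, err := processItems(&fakeTask{remaining: 2}, []string{"a", "b", "c"})
	fmt.Println(n, err)
}
```

Because the heartbeat is tied to forward progress in the work loop, a stuck handler stops renewing the claim and the TTL releases the task for another node.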

An adapter or similar helper would be provided to keep the original behavior, in which the Coordinator fully controls the heartbeat.


schmichael added enhancement etcd RFC labels Aug 26, 2015
