Right now the m_etcd Coordinator maintains the task claim in etcd by heartbeating the claim node before the TTL expires, and calls Handler#Stop() if it's unable to maintain the claim.
Users only have to make sure their Stop() methods cause Handler#Run() to exit in a timely manner to preserve the execute-task-exactly-once guarantee.
However, this does nothing to guarantee the correctness of the user's Handler#Run() method. The method may have deadlocked, but the claim persists because the Coordinator keeps heartbeating on its behalf. This means the work is effectively running zero times, not exactly once, in the cluster, and there's no indicator to an operator that anything is amiss.
Solution: Allow Handlers to handle heartbeating
Coordinators should be able to provide a Task#Heartbeat() error method for handlers to call manually to keep the claim alive. Handlers would be expected to exit if an error is returned.
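A rough sketch of what manual heartbeating could look like from the handler's side. Only the `Heartbeat() error` method comes from the proposal above; the `Task` interface shape, `fakeTask`, `errClaimLost`, and `runUnits` are hypothetical names used for illustration and are not part of the existing API. The point is that a handler that stops making progress also stops heartbeating, so the claim would expire with the TTL instead of persisting.

```go
package main

import (
	"errors"
	"fmt"
)

// Task is a hypothetical view of a claimed task. Only Heartbeat() error
// is taken from the proposal; ID() is an assumption for illustration.
type Task interface {
	ID() string
	Heartbeat() error
}

// errClaimLost is a hypothetical error a Coordinator might return when
// the claim can no longer be refreshed.
var errClaimLost = errors.New("claim lost")

// fakeTask stands in for a Coordinator-backed task: it accepts a fixed
// number of heartbeats and then reports the claim as lost.
type fakeTask struct {
	id        string
	remaining int
}

func (t *fakeTask) ID() string { return t.id }

func (t *fakeTask) Heartbeat() error {
	if t.remaining <= 0 {
		return errClaimLost
	}
	t.remaining--
	return nil
}

// runUnits processes up to units pieces of work, heartbeating before each
// one. It returns how many units completed and the heartbeat error, if any.
// If the work itself hung, no further heartbeats would be sent.
func runUnits(t Task, units int) (int, error) {
	done := 0
	for done < units {
		if err := t.Heartbeat(); err != nil {
			return done, err // exit promptly: the claim is gone
		}
		// ... one unit of real work would go here ...
		done++
	}
	return done, nil
}

func main() {
	done, err := runUnits(&fakeTask{id: "task-1", remaining: 3}, 5)
	fmt.Println(done, err)
}
```

In this sketch the handler decides where heartbeats happen, so a heartbeat doubles as a liveness signal for the work loop.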
An adapter or similar helper would be provided to keep the original behavior, in which heartbeating is fully controlled by the Coordinator.
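One possible shape for such an adapter, as a sketch only: a background goroutine that heartbeats on a fixed interval and invokes a callback (which could call Handler#Stop()) when a heartbeat fails. `Heartbeater`, `AutoHeartbeat`, `countingBeat`, and `runDemo` are hypothetical names, and the interval is an arbitrary value rather than anything derived from the real TTL.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Heartbeater is the hypothetical subset of a task that the adapter needs.
type Heartbeater interface {
	Heartbeat() error
}

// AutoHeartbeat calls hb.Heartbeat every interval until the returned cancel
// function is called or a heartbeat fails. On failure it calls onLost once
// and stops. onLost is where a caller could invoke the handler's Stop().
func AutoHeartbeat(hb Heartbeater, interval time.Duration, onLost func(error)) (cancel func()) {
	done := make(chan struct{})
	var once sync.Once
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-done:
				return
			case <-ticker.C:
				if err := hb.Heartbeat(); err != nil {
					onLost(err)
					return
				}
			}
		}
	}()
	// cancel is safe to call more than once.
	return func() { once.Do(func() { close(done) }) }
}

// countingBeat is a stand-in Heartbeater that fails on its failAt-th call.
type countingBeat struct {
	mu     sync.Mutex
	calls  int
	failAt int
}

func (c *countingBeat) Heartbeat() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.calls++
	if c.calls >= c.failAt {
		return errors.New("claim lost")
	}
	return nil
}

// runDemo starts the adapter and waits for the simulated claim loss.
func runDemo() error {
	lost := make(chan error, 1)
	cancel := AutoHeartbeat(&countingBeat{failAt: 3}, 5*time.Millisecond,
		func(err error) { lost <- err })
	defer cancel()
	return <-lost
}

func main() {
	fmt.Println(runDemo())
}
```

Note that an adapter like this restores the original trade-off: the claim stays alive even if Run() has deadlocked, so it would suit handlers that cannot easily heartbeat from their own work loop.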