coordinator: expose heartbeat control to task handlers #142

Open
schmichael opened this issue Aug 26, 2015 · 0 comments

Right now the m_etcd Coordinator maintains the task claim in etcd by heartbeating the claim node before its TTL expires, and calls Handler#Stop() if it is unable to maintain the claim.

To preserve the execute-task-exactly-once guarantee, users only have to ensure that their Stop() method causes Handler#Run() to exit in a timely manner.

However, this does nothing to guarantee the correctness of the user's Handler#Run() method. The method may have deadlocked, yet the claim persists because the Coordinator keeps heartbeating it. This means the work is effectively running zero times in the cluster, not exactly once, and nothing indicates to an operator that anything is amiss.

Solution: Allow Handlers to handle heartbeating

Coordinators should be able to provide a Task#Heartbeat() method, returning an error, for handlers to call manually to keep the claim alive. Handlers would be expected to exit if it returns an error.
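A rough sketch of what handler-driven heartbeating could look like. Everything here is hypothetical: the `Task` interface, `ErrClaimLost`, `fakeTask`, and `runHandler` are illustrative names, not part of the existing API. The fake task stands in for a coordinator-backed claim so the example does not need etcd.

```go
package main

import "errors"

// Task is a hypothetical view of a claimed task. Heartbeat is the method
// proposed in this issue; it is an assumption, not an existing API.
type Task interface {
	ID() string
	Heartbeat() error
}

// ErrClaimLost is an assumed sentinel error for a claim that can no longer
// be refreshed.
var ErrClaimLost = errors.New("claim lost")

// fakeTask is an in-memory stand-in for a coordinator-backed task. It
// accepts a fixed number of heartbeats and then reports the claim as lost.
type fakeTask struct {
	id        string
	remaining int
}

func (t *fakeTask) ID() string { return t.id }

func (t *fakeTask) Heartbeat() error {
	if t.remaining == 0 {
		return ErrClaimLost
	}
	t.remaining--
	return nil
}

// runHandler refreshes the claim before each unit of work and exits as soon
// as a heartbeat fails. It returns how many units completed. If a unit of
// work deadlocks, no further heartbeats are sent and the claim can expire.
func runHandler(t Task, work []func()) (int, error) {
	done := 0
	for _, step := range work {
		if err := t.Heartbeat(); err != nil {
			return done, err
		}
		step()
		done++
	}
	return done, nil
}
```

Because the heartbeat only happens when the handler makes progress, a stuck Run() would let the TTL lapse instead of holding the claim indefinitely.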

An adapter or similar helper would be provided to keep the original behavior, in which heartbeating is fully controlled by the Coordinator.
