Handling handleNodeFenceError #28

Open
bronhaim opened this issue Jan 8, 2018 · 3 comments
@bronhaim
Collaborator

bronhaim commented Jan 8, 2018

After reaching the giveup retries limit, we need to retry triggering all of the step's jobs. The giveup retries counter is raised on each job polling pass as long as the jobs have still not finished successfully.
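The behaviour described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual `handleNodeFenceError` code: `StepState`, `JobStatus`, `PollStep`, and the `status` and `trigger` callbacks are hypothetical names, and resetting the counter after a re-trigger is an assumed design choice.

```go
package main

import "fmt"

// JobStatus is a hypothetical result of polling one step job.
type JobStatus int

const (
	JobRunning JobStatus = iota
	JobSucceeded
	JobFailed
)

// StepState is hypothetical bookkeeping for the current fence step.
type StepState struct {
	Jobs    []string // names of the jobs that make up the step
	Retries int      // polling passes that found the step unfinished
}

// PollStep runs one polling pass over the step's jobs.
// It reports true once every job has succeeded. Otherwise it raises the
// retry counter, and when the counter reaches giveupRetries it re-triggers
// all jobs of the step and starts counting again (an assumption).
func PollStep(
	s *StepState,
	giveupRetries int,
	status func(job string) JobStatus,
	trigger func(job string) error,
) (bool, error) {
	allDone := true
	for _, job := range s.Jobs {
		if status(job) != JobSucceeded {
			allDone = false
			break
		}
	}
	if allDone {
		s.Retries = 0
		return true, nil
	}

	s.Retries++
	if s.Retries < giveupRetries {
		return false, nil // keep polling
	}

	// Give-up threshold reached: re-trigger every job in the step.
	s.Retries = 0
	for _, job := range s.Jobs {
		if err := trigger(job); err != nil {
			return false, fmt.Errorf("re-triggering job %q: %w", job, err)
		}
	}
	return false, nil
}
```

A caller would invoke `PollStep` once per polling interval, passing callbacks that query and create the real jobs. Whether the whole step or only the unfinished jobs should be re-triggered is left open in the comment above, so the sketch simply re-triggers all of them.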

@bronhaim bronhaim added the new label Jan 8, 2018
@rgolangh
Contributor

rgolangh commented Jan 9, 2018

btw, how do you keep jobs from colliding? i.e. a job to fence a host and a job to un-fence it? (or maybe cordon and uncordon are better terms)

@bronhaim
Collaborator Author

bronhaim commented Jan 9, 2018

The fence and un-fence scripts are different and not related to each other. Any script can perform the opposite action, and I can't prevent that. The admin sets the methods to run in each step, so a declaration that causes a collision is unlikely.

@bronhaim
Collaborator Author

bronhaim commented Jan 9, 2018

Once a job has failed, we need to set anti-affinity to avoid running the job on the same node again.

  • Check whether jobs already have such an option (on failure, retry on a different node).
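One way the anti-affinity idea could look is a node affinity rule with the `NotIn` operator on the `kubernetes.io/hostname` label, which Kubernetes documents as the way to get node anti-affinity behaviour. The sketch below is an assumption about how this might be done, not the project's implementation: the struct types are small local mirrors of the Kubernetes `affinity` fields (so the example has no external dependencies), and `AvoidNodes` and `AffinityJSON` are hypothetical helpers.

```go
package main

import "encoding/json"

// Local mirrors of the Kubernetes pod affinity fields (illustrative only;
// real code would use the types from k8s.io/api/core/v1).
type NodeSelectorRequirement struct {
	Key      string   `json:"key"`
	Operator string   `json:"operator"`
	Values   []string `json:"values"`
}

type NodeSelectorTerm struct {
	MatchExpressions []NodeSelectorRequirement `json:"matchExpressions"`
}

type NodeSelector struct {
	NodeSelectorTerms []NodeSelectorTerm `json:"nodeSelectorTerms"`
}

type NodeAffinity struct {
	Required NodeSelector `json:"requiredDuringSchedulingIgnoredDuringExecution"`
}

type Affinity struct {
	NodeAffinity NodeAffinity `json:"nodeAffinity"`
}

// AvoidNodes (hypothetical helper) builds an affinity that keeps a pod off
// the given nodes, e.g. the nodes where the job has already failed.
func AvoidNodes(failedNodes ...string) Affinity {
	return Affinity{
		NodeAffinity: NodeAffinity{
			Required: NodeSelector{
				NodeSelectorTerms: []NodeSelectorTerm{{
					MatchExpressions: []NodeSelectorRequirement{{
						Key:      "kubernetes.io/hostname",
						Operator: "NotIn",
						Values:   failedNodes,
					}},
				}},
			},
		},
	}
}

// AffinityJSON (hypothetical helper) renders the affinity as it would
// appear under spec.template.spec.affinity of the retried job.
func AffinityJSON(a Affinity) (string, error) {
	b, err := json.Marshal(a)
	return string(b), err
}
```

For example, `AvoidNodes("node-1")` yields a rule that prevents the retried job's pod from being scheduled on `node-1` (a made-up node name). This assumes the node's hostname label matches its node name, which may not hold on every cluster.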
