Handle tombstones that were written before kubexit started. #11
base: master
Conversation
@RemcodM can you please verify that this fixes your problem?
@karlkfi Sadly no, as this change introduces a new issue on side A (in the context of #8). When there is a timeout waiting for birth dependencies, no tombstone is written, because kubexit exits before the child process is ever started.
I am not entirely sure what would be a good fix. You could just always write a tombstone, even if the child process never started. However, writing a tombstone for a process that never ran is a bit odd, since there is no birth or exit code to record. Even though I am not completely sure about a good fix, I would propose to always write a tombstone anyway.
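As a rough sketch of the "always write a tombstone" idea — the file layout, field names, and graveyard path below are placeholders for illustration, not necessarily kubexit's actual tombstone format:

```go
// Sketch: write a "never born, already dead" tombstone when waiting for
// birth dependencies times out, so that death dependencies still fire.
// The graveyard path, JSON layout, and field names are assumptions.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

type tombstone struct {
	Born     *time.Time `json:"born,omitempty"`
	Died     *time.Time `json:"died,omitempty"`
	ExitCode *int       `json:"exitCode,omitempty"`
}

// writeTombstone marshals the tombstone and writes it via a temp file plus
// rename, so dependents never observe a partially written file.
func writeTombstone(graveyard, name string, ts tombstone) error {
	data, err := json.Marshal(ts)
	if err != nil {
		return err
	}
	tmp := filepath.Join(graveyard, name+".tmp")
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, filepath.Join(graveyard, name))
}

func main() {
	// Birth-dependency wait timed out: the child was never started,
	// but we still record a death so dependents can react.
	now := time.Now()
	code := -1 // sentinel for "process never ran" (an assumption, not kubexit's convention)
	ts := tombstone{Died: &now, ExitCode: &code}
	if err := writeTombstone("/graveyard", "service-a", ts); err != nil {
		fmt.Fprintln(os.Stderr, "failed to write tombstone:", err)
		os.Exit(1)
	}
}
```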
I checked the code paths once more, and it seems there is only one case (at the moment at least) in which this can happen: if recording the birth fails, there might be a race condition with a child that has also immediately finished after startup. In that case the death may never be recorded either.
Doesn't that mean the original race condition is solved, because it can't happen any more? If the process never started, there shouldn't need to be a tombstone: the process neither started nor died. Is there actually a case that might need one, other than a transitive dependency?
The writes probably need retry loops, if Go doesn't do that already within the std lib.
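A minimal sketch of what such a retry wrapper could look like — purely illustrative, not kubexit's actual write path:

```go
// Sketch: retry a tombstone write a few times with a simple linear backoff
// before giving up. Illustrative only.
package retry

import (
	"fmt"
	"os"
	"time"
)

// writeWithRetry attempts the write up to attempts times, sleeping a bit
// longer after each failure, and returns the last error if all attempts fail.
func writeWithRetry(path string, data []byte, attempts int) error {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = os.WriteFile(path, data, 0o644); err == nil {
			return nil
		}
		time.Sleep(time.Duration(i) * 100 * time.Millisecond)
	}
	return fmt.Errorf("write %s failed after %d attempts: %w", path, attempts, err)
}
```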
Wouldn't that defeat the purpose of kubexit in the first place? Say service A has a birth dependency on B, and B fails to start within the timeout. A would in that case stop and never write a tombstone (because the process never started). B then starts after the timeout, cannot detect that A has stopped, and will continue to run. Kubexit was created with Kubernetes Jobs with sidecar containers in mind. If B is the 'sidecar' container here, B will never stop if it has not started in time. The Kubernetes Job will run forever, which in my humble opinion defeats the purpose of this project in the first place.
you're right. thanks for spelling it out. we need to write a tombstone even if the process didn't start so that others can still have a death dependency on it. i think we can also simulate this in a test with a sleep before running kubexit.
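A rough idea of such a test, assuming kubexit is on PATH and configured via environment variables; the binary invocation, variable names, and tombstone format here are assumptions for illustration, not the project's actual test harness:

```go
// Sketch of an end-to-end style test: pre-write a death tombstone for a
// dependency, wait a moment (the "sleep before running kubexit"), then start
// kubexit and expect it to exit because the death dependency is already dead.
package e2e

import (
	"os"
	"os/exec"
	"path/filepath"
	"testing"
	"time"
)

func TestExitsOnPreexistingTombstone(t *testing.T) {
	graveyard := t.TempDir()

	// Simulate a dependency that died before kubexit started.
	pre := []byte(`{"died":"2020-01-01T00:00:00Z","exitCode":1}`)
	if err := os.WriteFile(filepath.Join(graveyard, "dep"), pre, 0o644); err != nil {
		t.Fatal(err)
	}

	// The sleep before running kubexit.
	time.Sleep(2 * time.Second)

	cmd := exec.Command("kubexit", "sleep", "60") // assumes kubexit is on PATH
	cmd.Env = append(os.Environ(),
		"KUBEXIT_NAME=main",
		"KUBEXIT_GRAVEYARD="+graveyard,
		"KUBEXIT_DEATH_DEPS=dep",
	)
	if err := cmd.Start(); err != nil {
		t.Fatal(err)
	}

	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	select {
	case <-done:
		// kubexit noticed the pre-existing tombstone and exited: expected.
	case <-time.After(10 * time.Second):
		_ = cmd.Process.Kill()
		t.Fatal("kubexit did not exit despite an already-dead death dependency")
	}
}
```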
- Switch to JSON log formatting
- Add log timestamps
- Add container_name to all log messages
- Add support for Stackdriver severity field
- Pass the logger (with "global" fields) through the context
Any progress on this PR?
Sorry, man. I got a test added and failing for the reasons you specified. I need to either adopt your change or figure out an alternative. But I'm going on vacation till next week, when I'm starting a new job. So you probably are gonna need to run your own patch for a while. Hopefully I can get back to it in a week or so. Thanks so much for your feedback and PRs!
Just wanted to say that we've faced the same issue. How it looks to me:
Should we make some kind of wait-for loop for dependent processes to ensure the primary tombstone is there?
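Something like that wait-for loop could be as simple as polling the graveyard until the primary's tombstone file appears; this is only a sketch with placeholder names and paths:

```go
// Sketch: poll the graveyard until the primary's tombstone file exists,
// or give up after a deadline. Illustrative only.
package waitfor

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// waitForTombstone returns nil once <graveyard>/<name> exists, or an error
// if it has not appeared before the timeout elapses.
func waitForTombstone(graveyard, name string, timeout time.Duration) error {
	path := filepath.Join(graveyard, name)
	deadline := time.Now().Add(timeout)
	for {
		if _, err := os.Stat(path); err == nil {
			return nil
		} else if !os.IsNotExist(err) {
			return err
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out waiting for tombstone %s", path)
		}
		time.Sleep(250 * time.Millisecond)
	}
}
```

A filesystem watcher would react faster than polling, but a polling loop is the simplest way to cover tombstones written before the watcher starts.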
This avoids a race condition where already dead death dependencies weren't causing kubexit to exit.
Fixes #8
Replaces #10
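For context, the fix amounts to checking the graveyard once at startup, so a death dependency that died before kubexit started is treated the same as one observed later. A hedged sketch of that startup check; the function and variable names are illustrative, not kubexit's actual code:

```go
// Sketch: at startup, check whether any death dependency already has a
// tombstone in the graveyard; if so, trigger shutdown immediately instead
// of only reacting to future filesystem events.
package graveyard

import (
	"os"
	"path/filepath"
)

// anyAlreadyDead reports whether a tombstone file already exists for any
// of the named death dependencies.
func anyAlreadyDead(graveyard string, deathDeps []string) (bool, error) {
	for _, dep := range deathDeps {
		if _, err := os.Stat(filepath.Join(graveyard, dep)); err == nil {
			return true, nil
		} else if !os.IsNotExist(err) {
			return false, err
		}
	}
	return false, nil
}
```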