Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix red herring in the readinessProbe #1504

Merged
merged 2 commits into from
Mar 26, 2024
Merged

fix red herring in the readinessProbe #1504

merged 2 commits into from
Mar 26, 2024

Conversation

nammn
Copy link
Collaborator

@nammn nammn commented Mar 7, 2024

All Submissions:

  • Have you opened an Issue before filing this PR?
  • Have you signed our CLA?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Put closes #XXXX in your comment to auto-close the issue that your PR fixes (if such).

Summary

  • fixes the red herring that the agent-health-status file is missing
  • continue of fix: remove panic of missing health status.json #1224, we missed a few places. This PR instead suggests to test for the file as early as possible in the code
  • it can be missing, since the agent requires some time until it populates that file
  • while it is still populating the probe should just return unready and it will eventually work

@nammn nammn changed the title fix red herring fix red herring in the readinessProbe Mar 7, 2024
// In that case, we don't want to panic to show the message
// in the kubernetes description. That would be a red herring, since that will solve itself with enough time.
if err != nil {
os.Exit(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we log something here to indicate what happened?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the same, but then I would need to move the logger initialization out to fit the logger:

logger := &lumberjack.Logger{
Filename: readinessProbeLogFilePath(),
MaxBackups: readIntOrDefault(readinessProbeLoggerBackups, 5),
MaxSize: readInt(readinessProbeLoggerMaxSize),
MaxAge: readInt(readinessProbeLoggerMaxAge),
}

I could do this 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about panicing with an error? This way it should be visible in the kubectl get events and kubectl describe pod.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh so the root cause of most red herrings are that exact error/panic.
Note the comment:

// In that case, we don't want to panic to show the message
// in the kubernetes description. That would be a red herring, since that will solve itself with enough time

That error almost always happens because we are calling the probe before the agent was able to create the health status file.

Regarding adding the log, I am not sure whether this effort is worth to log it 🤔
@lsierant @slaskawi

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @nammn ! I believe it's still worth logging an info why this happened. Without it, investigating corner cases might be simply impossible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@nammn nammn requested a review from lsierant March 26, 2024 11:42
@nammn nammn merged commit a07365e into master Mar 26, 2024
44 checks passed
@nammn nammn deleted the fix-confusing-error-2 branch March 26, 2024 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants