Incorrect Handling of percentUnrepairedThreshold in K8ssandra Operator when set to 0 #1113

Open
dnugmanov opened this issue Nov 13, 2023 · 5 comments
Labels
assess (Issues in the state 'assess'), bug (Something isn't working)

Comments

@dnugmanov
Contributor

dnugmanov commented Nov 13, 2023

What happened?

When Reaper autoScheduling is configured with percentUnrepairedThreshold: 0, the K8ssandra Operator does not honor the value and silently replaces it with the default of 10. This appears to happen because the field is declared as int rather than *int in the struct, so an explicit 0 cannot be distinguished from an unset field.
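The int versus *int distinction described above can be illustrated with a minimal Go sketch. The struct and function names below (`valueSpec`, `pointerSpec`, `thresholdFromValue`, `thresholdFromPointer`) are hypothetical and are not taken from the operator's source; the sketch uses `encoding/json` in place of the operator's real decoding and defaulting path, and only assumes the default of 10 reported in this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// defaultThreshold mirrors the default of 10 mentioned in this issue.
const defaultThreshold = 10

// valueSpec is a hypothetical stand-in for a spec that stores the field as a plain int.
type valueSpec struct {
	PercentUnrepairedThreshold int `json:"percentUnrepairedThreshold,omitempty"`
}

// pointerSpec is the hypothetical alternative that stores the field as *int.
type pointerSpec struct {
	PercentUnrepairedThreshold *int `json:"percentUnrepairedThreshold,omitempty"`
}

// thresholdFromValue applies the default whenever the field equals the zero value.
// An explicit 0 and a missing field both decode to 0, so both end up as the default.
func thresholdFromValue(raw string) int {
	var s valueSpec
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		panic(err)
	}
	if s.PercentUnrepairedThreshold == 0 {
		return defaultThreshold
	}
	return s.PercentUnrepairedThreshold
}

// thresholdFromPointer applies the default only when the field is absent (nil),
// so an explicit 0 survives.
func thresholdFromPointer(raw string) int {
	var s pointerSpec
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		panic(err)
	}
	if s.PercentUnrepairedThreshold == nil {
		return defaultThreshold
	}
	return *s.PercentUnrepairedThreshold
}

func main() {
	for _, raw := range []string{`{}`, `{"percentUnrepairedThreshold": 0}`} {
		fmt.Printf("%s int: %d *int: %d\n", raw, thresholdFromValue(raw), thresholdFromPointer(raw))
	}
}
```

With the plain int field, both inputs resolve to 10; with the pointer field, the missing field resolves to 10 and the explicit 0 stays 0.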

Did you expect to see something different?

The percentUnrepairedThreshold should be respected and set to 0, as configured.

How to reproduce it (as minimally and precisely as possible):

reaper:
  autoScheduling:
    percentUnrepairedThreshold: 0

Environment

  • K8ssandra Operator version: v1.10.2

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: K8OP-63

@dnugmanov dnugmanov added the bug Something isn't working label Nov 13, 2023
@adejanovski
Contributor

Hi @dnugmanov, may I ask which behavior you're looking for by setting this to 0?
Basically it would mean that incremental repair will run again as soon as it finishes, which could be accomplished with a standard schedule by setting the interval to 0.
I think a value of 5 or 10 makes more sense for the percentUnrepairedThreshold, and while it may vary depending on the requirements, 0 doesn't seem like a proper value to use.

@adejanovski adejanovski moved this to Assess/Investigate in K8ssandra Nov 13, 2023
@adejanovski adejanovski self-assigned this Nov 13, 2023
@adejanovski adejanovski added the assess Issues in the state 'assess' label Nov 13, 2023
@dnugmanov
Contributor Author

@adejanovski
Hi, I would like to disable the percentUnrepairedThreshold parameter and run all repairs based only on the specified interval in days. The reason is that percentUnrepairedThreshold can push the next scheduled run as far out as the following year, causing unexpected delays.

From the screenshot: the next scheduled run is set to "in 7 months" for image_mri and "in a year" for reaper_db.

image

@adejanovski
Contributor

What's weird here is that your schedule triggers on either 7 days or 10% unrepaired, whichever comes first. So the next run should be at most 7 days after the previous run 🤔
Let me try to reproduce this.

@dnugmanov
Contributor Author

Yes, we have identified two issues:

  • A bug in Reaper that leads to incorrect scheduling when percentUnrepairedThreshold is used (unfortunately, I don't know how to reproduce it).
  • A bug in K8ssandra Operator that prevents percentUnrepairedThreshold from being disabled, which we need in order to mitigate the Reaper bug.

@dnugmanov
Contributor Author

@adejanovski Hi, have you been able to reproduce the Reaper bug? I have one cluster experiencing the issue, and I can help you collect diagnostic information.
What should we do next with the current ticket? Should we create a Merge Request (MR) to fix the percentUnrepairedThreshold handling, or should we close this ticket and open a new one for Reaper?
