Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convey failed maintenance by requestor #12

Merged
merged 1 commit into from
Aug 27, 2024

Conversation

adrianchiris
Copy link
Collaborator

@adrianchiris adrianchiris commented Aug 21, 2024

This commit adds support for requestors to convey failed
node maintenance.

  • Add RequestorFailed condition type, to be set by requestors
    with reason MaintenanceFailed if maintenance operation failed

  • Add new condition reason to Ready condition : MaintenanceFailed.
    this reason is set by controller if Ready condition is with Ready
    reason and RequestorFailed condition is preset with value True.

Failed condition type conveys failure in performing node
maintenance by the requestor.

ConditionTypeReady string = "Ready"
// ConditionTypeFailed is the Failed Condition type for NodeMaintenance.
// It is used to convey failure during node maintenance operation by the requestor.
ConditionTypeFailed string = "Failed"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use a more specific condition like "MaintenanceFailed". open to suggestions.
ATM chose the minimalistic approach.

later on we can consider adding the "Degraded" condition type to convey possible errors during Maintenance prep by the operator.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name is fine, let's discuss the reason why we need a separate condition

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline.

it is desirable to have a separate condition so we dont have a mix of entities updating the same condition.
its is undesirable that one condition remains ready: true and the other Failed:true having two contradicting conditions at the same time.

thus we will

  1. keep the added condition which reflects the maintenance outcome by the requestor
  2. introduce a new condition reason to Ready condition : "RequestorFailedMaintenance". Ready condition will be false if this reason is set.
  3. controller will set above reason to Ready condition if "Failed" condition is detected.
  4. object will not be garbage collected if "RequestorFailedMaintenance" is set as Ready condition reason

@coveralls
Copy link

coveralls commented Aug 21, 2024

Pull Request Test Coverage Report for Build 10580978064

Details

  • 28 of 43 (65.12%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.4%) to 65.644%

Changes Missing Coverage Covered Lines Changed/Added Lines %
internal/controller/nodemaintenance_controller.go 28 43 65.12%
Totals Coverage Status
Change from base Build 10579943950: -0.4%
Covered Lines: 1091
Relevant Lines: 1662

💛 - Coveralls

@adrianchiris adrianchiris changed the title Add fatal condition Convey failed maintenance by requestor Aug 25, 2024
@adrianchiris
Copy link
Collaborator Author

@ykulazhenkov PTAL.

we now have:

  1. new condition of type "RequestorFailed" with a single reason: "MaintenanceFailed"
  2. new condition reason for "Ready" type : "RequestorFailed"

the presence of 1. with state "True" will update "Ready" condition to Reason "RequestorFailed" if previous reason was "Ready"
if 1 is not present or its state is "False" we return to "Ready" condition to Reason "Ready"

newRCR := k8sutils.GetReadyConditionReason(newO)
condChanged := false

oldConds := make(map[string]metav1.Condition)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why can't we sort the condition slice and then simply compare with reflect.DeepEqual?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great idea :) will do it ofc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ykulazhenkov PTAL, LMK if thats what you had in mind.

@ykulazhenkov
Copy link
Collaborator

@adrianchiris the PR looks good. I added one nit comment that you can ignore

@adrianchiris adrianchiris force-pushed the add-fatal-condition branch 2 times, most recently from 16b5693 to e55534c Compare August 27, 2024 13:58
Copy link
Collaborator

@ykulazhenkov ykulazhenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small comment and we can merge after rebase

internal/controller/nodemaintenance_controller.go Outdated Show resolved Hide resolved
This commit adds support for requestors to convey failed
node maintenance.

- Add RequestorFailed condition type, to be set by requestors
  with reason MaintenanceFailed if maintenance operation failed

- Add new condition reason to Ready condition : MaintenanceFailed.
  this reason is set by controller if Ready condition is with Ready
  reason and RequestorFailed condition is preset with value True.

Failed condition type conveys failure in performing node
maintenance by the requestor.

Signed-off-by: adrianc <[email protected]>
@adrianchiris adrianchiris merged commit 21756f9 into Mellanox:main Aug 27, 2024
8 checks passed
@adrianchiris adrianchiris deleted the add-fatal-condition branch January 9, 2025 11:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants