Ensure infeasible PVC modifications are retried at slower pace #453

Open
wants to merge 2 commits into base: master

AndrewSirenko
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR ensures that infeasible PVC modifications (e.g. those failing with InvalidArgument) are retried at a slower pace than normal failures: they are retried at the sidecar's retry-interval-max.

This avoids spamming the affected PVC with events when future ModifyVolume RPCs are likely to fail, for example when a user's VolumeAttributesClass has incorrect parameters.

See related resizer controller PR #418.

Which issue(s) this PR fixes:

Fixes #407

Special notes for your reviewer:

This PR is stacked on top of the 2 commits of my unit test PR, #447. That should probably be reviewed first.

This PR was tested alongside the AWS EBS CSI Driver, confirming that only one ModifyVolume RPC was triggered per retry-interval-max when the RPC failed with an infeasible error code. Ideally we would add CSI mock e2e tests in k/k, but those do not exist yet for ModifyVolume.

Does this PR introduce a user-facing change?:

Infeasible PVC modifications will be retried at a slower pace than normal failures.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 13, 2024
@AndrewSirenko AndrewSirenko changed the title Modify infeasible slow set Ensure infeasible PVC modifications are retried at slower pace Nov 13, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 13, 2024
Comment on lines -50 to -52
if *curVacName == targetVacName {
// if somehow both curVacName and targetVacName is same, what does this mean??
// I am not sure about this.
Contributor Author

Why was this branch included in the first place? I will bring it up at the next SIG Implementation meeting, since this week's meeting was skipped.

Contributor

If curVacName == targetVacName is true, then does the volume need modification at all? cc @sunnylovestiramisu

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 3, 2024
@gnufied
Contributor

gnufied commented Jan 7, 2025

@sunnylovestiramisu can you also review this PR? I was hoping we could merge this PR before we release the next minor version of external-resizer.

@sunnylovestiramisu sunnylovestiramisu self-assigned this Jan 7, 2025
@AndrewSirenko AndrewSirenko force-pushed the modifyInfeasibleSlowSet branch from da332d6 to e3e2148 Compare January 7, 2025 21:52
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AndrewSirenko
Once this PR has been reviewed and has the lgtm label, please assign msau42 for approval. For more information see the Code Review Process.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 7, 2025
func (ctrl *modifyController) delayModificationIfRecentlyInfeasible(pvc *v1.PersistentVolumeClaim, pvcKey string) error {
// Do not delay modification if PVC updated with new VAC
s := pvc.Status.ModifyVolumeStatus
if s == nil || pvc.Spec.VolumeAttributesClassName == nil || s.TargetVolumeAttributesClassName != *pvc.Spec.VolumeAttributesClassName {
Contributor Author

@gnufied @sunnylovestiramisu I should also ensure the slowset key is deleted if targetVac != pvc.spec.vac, right?

Contributor

Correct, we retry at the slow interval for Infeasible after the ControllerModifyVolume call has failed. On the other hand, is there any possibility of ending up in the state of targetVac != pvc.spec.vac and Infeasible?

Contributor Author

@AndrewSirenko AndrewSirenko Jan 7, 2025

is there any possibility to end up in the state of targetVac != pvc.spec.vac and Infeasible?

I see a world where the following could happen:

  1. Patch PVC with invalid VAC
    • pvc.Status.TargetVacName is set and ModifyVolumeStatus is marked InProgress
    • ModifyVolume RPC is called and returns an Infeasible error code
    • ModifyController marks the PVC's ModifyVolumeStatus as Infeasible
    • Future ModifyVolume RPCs are called at the slow interval
  2. Patch PVC with valid VAC
    • The PVC's Infeasible ModifyVolumeStatus has not been cleared yet (markControllerModifyVolumeStatus not yet called)
    • pvc.Status.TargetVacName still refers to invalidVac
    • PVCKey is still in the slowset

Therefore targetVac != pvc.spec.vac and Infeasible.

Without this check, the ModifyVolume RPC would not be triggered until the slowset entry expires. The if-statement here prevents that delay.

@AndrewSirenko AndrewSirenko force-pushed the modifyInfeasibleSlowSet branch from e3e2148 to 810bc51 Compare January 7, 2025 22:15
@AndrewSirenko
Contributor Author

AndrewSirenko commented Jan 7, 2025

In addition to unit tests, tested the following with EBS CSI Driver:

  • Create PVC without VAC, modify to invalid VAC:
    • resizer retries every retry-interval-max seconds instead of at the normal backoff
    • If you update the PVC with a new VAC, we retry immediately
  • Create PVC with valid VAC, modify to invalid VAC:
    • Same cases as above

These tests should probably be added to k/k via csimock driver, once we have mock csi driver VAC tests.

@AndrewSirenko AndrewSirenko force-pushed the modifyInfeasibleSlowSet branch from 810bc51 to be155d3 Compare January 7, 2025 23:13
Development

Successfully merging this pull request may close these issues.

Make sure infeasible PVC modifications are retried at slower pace than normal failures.
5 participants