Testing multiple resources linked with Usages leads to test failure even though deletion succeeds #33

Open
kaessert opened this issue Oct 10, 2024 · 0 comments
Labels
bug Something isn't working

Contributor

kaessert commented Oct 10, 2024

What happened?

When testing combinations of resources that depend on each other via Usages, the delete step fails even though the cleanup eventually succeeds. The test contained the following resources:

  • XAWSLBController
  • XEKS
  • XNetwork

XAWSLBController contains a Helm chart and a Usage declaring that XEKS is used by a Helm Release.

Running uptest against such a configuration leads to the following situation:

  • We delete XAWSLBController first, which succeeds immediately.
  • Afterwards we delete XEKS and get an error, because the background cleanup of XAWSLBController hasn't finished yet and the Usage still exists.

This is the error we're seeing:

"nousages.apiextensions.crossplane.io" denied the request: This resource is in-use by 1 Usage(s), including the Usage "configuration-aws-lb-controller-7d62z" by resource Release/configuration-aws-lb-controller-xmlng.

Today this can be worked around with a pre-delete or post-delete hook, but we don't think that's a great approach: it adds hurdles for people trying uptest, and it leaks orchestration details that Crossplane core already handles.

Things we tried and discussed:

  • Omit "wait: true" from the delete statement. This has the same effect.
  • Run delete steps with || true. Cleanup will succeed, but this swallows all errors, whether or not they are related to Usages.
  • Run the delete in a loop, capture the exit code and stderr, and on failure compare stderr with "nousages.apiextensions.crossplane.io" denied the request. This can work, but it causes trouble with additional pre-delete and similar hooks: they would then need to be idempotent, or we would need to catch additional errors.
  • Find all Usages connected via owner references to the XR being deleted, and issue kubectl wait until those Usages are cleaned up before proceeding. This can also work, but it reaches deep into internals.
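A minimal sketch of the retry loop from the third item, written in Python purely for illustration. Neither uptest nor chainsaw provides this; the helper name `delete_with_usage_retry`, its parameters, and the default attempt count and delay are hypothetical. It retries only while stderr contains the Usage admission denial quoted above and surfaces any other failure immediately, so unrelated errors are not swallowed the way `|| true` swallows them.

```python
import subprocess
import time
from typing import Callable, Sequence

# Substring of the admission error quoted in this issue; assumed stable enough to match on.
USAGE_DENIED = 'nousages.apiextensions.crossplane.io" denied the request'


def delete_with_usage_retry(
    cmd: Sequence[str],
    attempts: int = 10,
    delay_s: float = 10.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> subprocess.CompletedProcess:
    """Hypothetical helper: run a delete command, retrying only on Usage denials."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        # Capture exit code and stderr instead of letting the failure abort the step.
        result = run(list(cmd), capture_output=True, text=True)
        if result.returncode == 0:
            return result
        blocked_by_usage = USAGE_DENIED in (result.stderr or "")
        # Any other error, or running out of attempts, is surfaced unchanged.
        if not blocked_by_usage or attempt == attempts:
            raise subprocess.CalledProcessError(
                result.returncode, list(cmd), result.stdout, result.stderr
            )
        # The Usage is presumably still being cleaned up in the background; wait and retry.
        sleep(delay_s)
    raise AssertionError("unreachable")
```

For example, `delete_with_usage_retry(["kubectl", "delete", "-f", "xeks.yaml"])` (the manifest name is a placeholder) would keep retrying the XEKS delete while the XAWSLBController Usage is still around. As noted in the list, this still does not solve the problem of hooks that are not idempotent.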

Another thought: we might be working around behavior that chainsaw deliberately doesn't support, which would explain why the proposed solutions look rather ugly. It may be worth opening a PR on chainsaw that introduces something like a retry parameter for a script step. At the very least we'd get feedback from the maintainers on how they envision such a flow, because deleting an object that is temporarily protected by an admission webhook may not be a standard use case, but I can imagine other setups hitting the same wall without being specific to Crossplane.

Side note: this only occurs when the objects in question are NOT part of the same composition. When they are, deletion errors are not visible to the outside and Crossplane handles the cleanup flawlessly.

How can we reproduce it?

Run uptest on this changeset without the post-delete hook: upbound/configuration-aws-lb-controller#1

What environment did it happen in?

  • Uptest Version: v1.1.2
@kaessert kaessert added the bug Something isn't working label Oct 10, 2024