
rbd: do not return an error when deleting a non-existing image #4885

Closed
wants to merge 1 commit

Conversation

nixpanic
Member

@nixpanic nixpanic commented Oct 3, 2024

Deleting an image that has already been removed should not be reported as a failure. This improves the idempotency of the `rbdImage.Delete()` function, making it easier for callers to detect real errors.
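For illustration, a minimal sketch of the pattern this change aims for, assuming the go-ceph librbd bindings (`github.com/ceph/go-ceph/rbd`) that ceph-csi uses; this is not the actual `rbdImage.Delete()` code:

```go
package rbdutil

import (
	"errors"
	"fmt"

	librbd "github.com/ceph/go-ceph/rbd"
)

// removeImage sketches the idempotent-delete idea: an "image not found"
// error from librbd is treated as success, so deleting an image that is
// already gone is not reported as a failure. Illustrative only; the real
// rbdImage.Delete() in ceph-csi does considerably more work.
func removeImage(img *librbd.Image) error {
	err := img.Remove()
	if err != nil && !errors.Is(err, librbd.ErrNotFound) {
		return fmt.Errorf("failed to remove rbd image: %w", err)
	}
	// Either the removal succeeded or the image was already gone;
	// both count as "deleted" for the caller.
	return nil
}
```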

Related issues

Found while working on #4502.


Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

Deleting an image that has already been removed should not be reported
as a failure. This improves the idempotency of the `rbdImage.Delete()`
function, making it easier for callers to detect real errors.

Signed-off-by: Niels de Vos <[email protected]>
@mergify mergify bot added the component/rbd Issues related to RBD label Oct 3, 2024
@nixpanic nixpanic requested a review from a team October 3, 2024 11:22
@nixpanic nixpanic requested a review from a team October 3, 2024 13:34
@Rakshith-R
Contributor

We need to not have this check in order to trigger

func (ri *rbdImage) ensureImageCleanup(ctx context.Context) error {

@nixpanic
Member Author

nixpanic commented Oct 3, 2024

> We need to not have this check in order to trigger
>
> func (ri *rbdImage) ensureImageCleanup(ctx context.Context) error {

Sorry, I am not sure what you mean by that... ensureImageCleanup can cause a panic without those error checks?

@@ -694,7 +694,7 @@ func (ri *rbdImage) trashRemoveImage(ctx context.Context) error {
 	_, err = ta.AddTrashRemove(admin.NewImageSpec(ri.Pool, ri.RadosNamespace, ri.ImageID))
 
 	rbdCephMgrSupported := isCephMgrSupported(ctx, ri.ClusterID, err)
-	if rbdCephMgrSupported && err != nil {
+	if rbdCephMgrSupported && err != nil && !errors.Is(err, librbd.ErrNotFound) {
Collaborator

@nixpanic does the AddTrashRemove mgr API also support this error message?

Member Author

yes, it seemed so. I have been testing with the Ceph main branch, maybe things changed there?

@Rakshith-R
Contributor

> We need to not have this check in order to trigger
>
> func (ri *rbdImage) ensureImageCleanup(ctx context.Context) error {
>
> Sorry, I am not sure what you mean by that... ensureImageCleanup can cause a panic without those error checks?

If the image does not exist, then the flow should not have reached that point at all.
Image not being found would've led to ensureImageCleanup in the actual delete procedure.
I think there are multiple processes acting on the same image, and that's causing this error/panic in your case?
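To make the argument concrete, the sketch below shows the ordering being described, using stand-in method names (the real ceph-csi helpers and signatures differ): a missing image is detected before the trash/remove calls, so trashRemoveImage should only see ErrNotFound if some other actor removed the image in between.

```go
package rbdsketch

import (
	"context"
	"errors"

	librbd "github.com/ceph/go-ceph/rbd"
)

// imageOps is a stand-in for the few rbdImage methods referenced in this
// thread; the names and signatures are only approximations.
type imageOps interface {
	getImageInfo() error                          // stat the image
	ensureImageCleanup(ctx context.Context) error // clean up trash/leftovers
	deleteImage(ctx context.Context) error        // Trash() + trashRemoveImage()
}

// deleteFlow: if the image is already gone, branch to ensureImageCleanup;
// only an existing image reaches the trash/remove step.
func deleteFlow(ctx context.Context, img imageOps) error {
	err := img.getImageInfo()
	if errors.Is(err, librbd.ErrNotFound) {
		return img.ensureImageCleanup(ctx)
	}
	if err != nil {
		return err
	}
	return img.deleteImage(ctx)
}
```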

@nixpanic nixpanic requested a review from Madhu-1 October 4, 2024 11:43
@nixpanic
Member Author

nixpanic commented Oct 4, 2024

> We need to not have this check in order to trigger
>
> func (ri *rbdImage) ensureImageCleanup(ctx context.Context) error {
>
> Sorry, I am not sure what you mean by that... ensureImageCleanup can cause a panic without those error checks?
>
> If the image does not exist, then the flow should not have reached that point at all. Image not being found would've led to ensureImageCleanup in the actual delete procedure. I think there are multiple processes acting on the same image, and that's causing this error/panic in your case?

I don't think multiple processes are trying to delete the same volume at the same time. However, a 2nd DeleteVolume seems to come immediately after the 1st completed (kubernetes-csi/external-provisioner#1235 might be related?).

@Madhu-1
Collaborator

Madhu-1 commented Oct 7, 2024

> We need to not have this check in order to trigger
>
> func (ri *rbdImage) ensureImageCleanup(ctx context.Context) error {
>
> Sorry, I am not sure what you mean by that... ensureImageCleanup can cause a panic without those error checks?
>
> If the image does not exist, then the flow should not have reached that point at all. Image not being found would've led to ensureImageCleanup in the actual delete procedure. I think there are multiple processes acting on the same image, and that's causing this error/panic in your case?
>
> I don't think multiple processes are trying to delete the same volume at the same time. However, a 2nd DeleteVolume seems to come immediately after the 1st completed (kubernetes-csi/external-provisioner#1235 might be related?).

We do have an in-memory lock to ensure that this does not happen.

@Rakshith-R any further comments on this one?
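For readers unfamiliar with the lock being referred to, below is a rough sketch of a per-volume in-memory lock; ceph-csi has its own implementation, and the names here are purely illustrative. Note that a lock like this only serializes concurrent operations on the same volume ID; a second DeleteVolume arriving after the first one has completed, as described above, would not be rejected by it.

```go
package locks

import "sync"

// volumeLocks is an illustrative per-volume in-memory lock: an operation on
// a volume ID is rejected while another operation on the same ID is running.
type volumeLocks struct {
	mu     sync.Mutex
	locked map[string]struct{}
}

func newVolumeLocks() *volumeLocks {
	return &volumeLocks{locked: make(map[string]struct{})}
}

// tryAcquire reports whether the lock for volumeID could be taken.
func (vl *volumeLocks) tryAcquire(volumeID string) bool {
	vl.mu.Lock()
	defer vl.mu.Unlock()
	if _, busy := vl.locked[volumeID]; busy {
		return false
	}
	vl.locked[volumeID] = struct{}{}
	return true
}

// release frees the lock for volumeID.
func (vl *volumeLocks) release(volumeID string) {
	vl.mu.Lock()
	defer vl.mu.Unlock()
	delete(vl.locked, volumeID)
}
```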

@nixpanic nixpanic requested a review from Rakshith-R October 10, 2024 15:19
@Rakshith-R
Contributor

> We need to not have this check in order to trigger
>
> func (ri *rbdImage) ensureImageCleanup(ctx context.Context) error {
>
> Sorry, I am not sure what you mean by that... ensureImageCleanup can cause a panic without those error checks?
>
> If the image does not exist, then the flow should not have reached that point at all. Image not being found would've led to ensureImageCleanup in the actual delete procedure. I think there are multiple processes acting on the same image, and that's causing this error/panic in your case?
>
> I don't think multiple processes are trying to delete the same volume at the same time. However, a 2nd DeleteVolume seems to come immediately after the 1st completed (kubernetes-csi/external-provisioner#1235 might be related?).
>
> We do have an in-memory lock to ensure that this does not happen.
>
> @Rakshith-R any further comments on this one?

Yeah, I think we'll encounter this error only when another process puts the image into the trash while this process reaches the rbd.Trash(0) or ta.AddTrashRemove call. The only reason a process gets to that stage is that an rbd image was found.

We should treat such occurrences as a race condition happening somewhere in the code and fix that, rather than adding the current error check to avoid it.

(sorry, missed this comment earlier)
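For context, the two steps that race window sits between look roughly like this with the go-ceph API shown in the diff above (a sketch only; error handling and the non-mgr fallback are omitted). If another actor trashes or purges the image between opening it and these calls, either step can surface librbd.ErrNotFound, which is the situation discussed here.

```go
package rbdsketch

import (
	librbd "github.com/ceph/go-ceph/rbd"
	"github.com/ceph/go-ceph/rbd/admin"
)

// trashAndScheduleRemove moves an image to the RBD trash and then asks the
// ceph-mgr task queue to remove it from the trash asynchronously.
func trashAndScheduleRemove(
	img *librbd.Image, ta *admin.TaskAdmin,
	pool, namespace, imageID string,
) error {
	// Step 1: move the image to the trash with no delay (rbd.Trash(0)).
	if err := img.Trash(0); err != nil {
		return err
	}
	// Step 2: schedule asynchronous removal from the trash via ceph-mgr.
	_, err := ta.AddTrashRemove(admin.NewImageSpec(pool, namespace, imageID))
	return err
}
```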

@Rakshith-R
Contributor

@nixpanic I think we can close this PR?

@nixpanic
Member Author

I have not seen this issue anymore, closing.

@nixpanic nixpanic closed this Nov 25, 2024
Labels
component/rbd Issues related to RBD
4 participants