## Issue

Currently, recursive delete operations such as org, space, and app delete, which implicitly delete service bindings and/or service instances, fail if one of the service-related deletions is handled asynchronously by the service broker. This is not optimal, as users have to trigger the deletion of the parent resource again. In addition, users currently get the following error message, which does not really reveal what is going on:

> An operation for the service binding between app myapp and service instance myinstance is in progress.
## Context

[provide more detailed introduction]

## Steps to Reproduce
For example, delete an app that is bound to a service instance. The broker should answer unbinding requests with a 202 (Accepted).
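For local experimentation, a minimal stand-in broker that answers every unbind (DELETE) request with a 202 might look like the sketch below. The endpoint path follows the Open Service Broker API convention; everything else (handler name, operation id) is illustrative, not part of any real broker.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeBrokerHandler(BaseHTTPRequestHandler):
    """Answers every unbind/deprovision DELETE with 202 Accepted."""

    def do_DELETE(self):
        # OSB brokers signal an asynchronous operation with 202 plus an
        # operation identifier the platform can poll on.
        body = json.dumps({"operation": "task-1"}).encode()
        self.send_response(202)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def start_fake_broker(port=0):
    """Start the fake broker on a free port and return the server."""
    server = HTTPServer(("127.0.0.1", port), FakeBrokerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Pointing a dev Cloud Controller (or plain curl) at such a stub is enough to drive every unbind into the asynchronous code path described below.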
## Current result

The following tables describe the resulting behaviour for different resources and broker responses. In general, all recursive deletions fail if a sub-resource is deleted asynchronously. The behaviour is the same for service instances and service bindings.
### Delete service instance when service binding present

| Broker response to unbind | Result |
| --- | --- |
| 202 | Starts polling the service binding last operation and sets the service instance delete job and the service instance last operation to failed with the message "delete could not be completed: An operation for the service binding between app myapp and service instance myinstance is in progress." |
| 500 | Both the service instance and service binding delete operations fail. |
| 200 | The binding is gone immediately; then a delete service instance request is sent to the broker and either the instance is gone too or polling starts. |
### Delete app when bound to service

| Broker response to unbind | Result |
| --- | --- |
| 202 | Starts polling the service binding last operation and sets the app delete job to failed with the message "Job (3d1d051f-6c94-47ab-85e2-dec27e0db75a) failed: An operation for the service binding between app myapp and service instance myinstance is in progress." |
| 500 | The app delete job fails and the service binding operation state is set to failed. Error message: "Job (17662f61-4111-4712-91da-d3ab11629ba7) failed: Service broker failed to delete service binding for instance myinstance: The service broker returned an invalid response. Status Code: 500 Internal Server Error, Body: {"state":"in progress"}" |
| 200 | The service binding gets deleted, the app gets deleted, and the app delete job is set to COMPLETE. |
### Delete space which contains a service binding

| Broker response to unbind | Result |
| --- | --- |
| 202 | Starts polling the service binding last operation and sets the space delete job to failed, as well as the service instance last operation, with the message "Job (5b91523b-fe11-4cdb-bbc9-063b65fa8dee) failed: Deletion of space myspace failed because one or more resources within could not be deleted. An operation for the service binding between app myapp and service instance myinstance is in progress." |
| 500 | The space delete job fails and the service binding and service instance last operation states are set to failed. Error message: "Job (ebce0407-9560-44fb-9b39-7b06f35edb4f) failed: Deletion of space d071102 failed because one or more resources within could not be deleted. Service broker failed to delete service binding for instance myinstance: The service broker returned an invalid response. Status Code: 500 Internal Server Error, Body: {"state":"in progress"}" |
| 200 | The service binding, service instance, app, and space get deleted, and the space delete job is set to COMPLETE. |
### Delete org, which contains a space which contains a service binding

| Broker response to unbind | Result |
| --- | --- |
| 202 | Starts polling the service binding last operation and sets the org delete job to failed, as well as the service instance last operation, with the message "Job (0daca787-a8b5-4433-967c-3b0c8d2e1798) failed: Deletion of organization d071102 failed because one or more resources within could not be deleted. Deletion of space d071102 failed because one or more resources within could not be deleted. An operation for the service binding between app myapp and service instance myinstance is in progress." |
| 500 | The org delete job fails and the service binding and service instance last operation states are set to failed. Error message: "Job (5c7ffdc1-7bb3-4a95-96da-33d19c5d4e79) failed: Deletion of organization d071102 failed because one or more resources within could not be deleted. Deletion of space d071102 failed because one or more resources within could not be deleted. Service broker failed to delete service binding for instance myinstance: The service broker returned an invalid response. Status Code: 500 Internal Server Error, Body: {"state":"in progress"}" |
| 200 | Everything gets deleted and the organization delete job is set to COMPLETE. |
## Further findings

All recursive deletions trigger the deletion of all sub-resources (unless they depend on each other). For example, an app delete triggers the deletion of all service bindings of that app. If one binding fails to delete or is being deleted asynchronously, the job still continues to trigger the deletion of all other bindings. Service instances that still have bindings in deletion won't be deleted.
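The behaviour above can be sketched roughly as follows. This is a simplification, not Cloud Controller code: the `Resource` class and error strings are illustrative stand-ins for apps, bindings, spaces, and so on.

```python
class Resource:
    """Illustrative stand-in for an app, binding, space, etc."""

    def __init__(self, name, children=(), deletion_in_progress=False):
        self.name = name
        self.children = list(children)
        self.deletion_in_progress = deletion_in_progress

    def delete(self):
        if self.deletion_in_progress:
            raise RuntimeError(f"{self.name}: deletion is in progress")


def delete_recursively(resource):
    """Trigger deletion of every sub-resource; keep going on failures."""
    errors = []
    for child in resource.children:
        errors.extend(delete_recursively(child))
    if errors:
        # A parent is never deleted while a sub-resource failed
        # or is still being deleted asynchronously.
        errors.append(f"could not delete {resource.name}")
        return errors
    try:
        resource.delete()
    except RuntimeError as exc:
        errors.append(str(exc))
    return errors
```

Note how one stuck binding does not stop the other bindings from being deleted, but it does keep the app (and transitively the space and org) from being deleted.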
## Expected result

Ideally, the recursive delete operations would be able to handle asynchronously deleted sub-resources. See the next section for some ideas on how to achieve this.
## Possible Fix

### Re-enqueue recursive jobs instead of setting them to failed
The deletion jobs could be re-enqueued, similar to the polling mechanism for service-related operations. On each run, the job would check whether the sub-resources have been deleted successfully and, if so, delete the parent resource.
Some thoughts on this:

- We would probably need a "locking mechanism" to prevent new resources from being created in an org, space, etc. that is being deleted.
- If an asynchronous deletion fails, the job should remember that it has already tried to delete this resource; otherwise this might become an endless loop.
- A parameter that allows configuring a maximum timeout for such jobs would be good.
- When the job fails because sub-resources could not be deleted, it would be good to show the original error message explaining why the deletion failed.
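A rough sketch of the re-enqueue idea, including the retry limit and the "already tried" bookkeeping from the list above. All names (`DeleteJob`, `MAX_ATTEMPTS`, the queue shape) are hypothetical, not Cloud Controller internals:

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 10  # stand-in for a configurable maximum timeout


@dataclass
class SubResource:
    deleted: bool = False
    delete_failed: bool = False
    last_error: str = ""


@dataclass
class Parent:
    deleted: bool = False


@dataclass
class DeleteJob:
    parent: Parent
    sub_resources: list
    attempts: int = 0
    state: str = "processing"
    error: str = ""


def run_delete_job(job, queue):
    """One execution of a recursive delete job that re-enqueues itself."""
    pending = [r for r in job.sub_resources if not r.deleted]

    # Remember sub-resources whose deletion already failed so the job does
    # not loop forever, and give up after too many attempts.
    failed = [r for r in pending if r.delete_failed]
    if failed or (pending and job.attempts >= MAX_ATTEMPTS):
        job.state = "failed"
        # Surface the original error messages instead of a generic one.
        job.error = "; ".join(r.last_error or "timed out" for r in pending)
        return

    if pending:
        job.attempts += 1
        queue.append(job)  # re-enqueue and check again on the next run
        return

    job.parent.deleted = True
    job.state = "complete"
```

The job only touches the parent once every sub-resource is confirmed gone, which mirrors how the polling of service-related operations already defers work to a later run.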
### Delete parent resource immediately and continue asynchronous deletion of sub-resources in the background
If a service broker responds with a 202 to an unbind or deprovision request, we can assume that the broker will take care of the deletion, and at least "delete" the resource from the user's perspective. The CC could then continue polling the last operation state from the broker. If the deletion fails, orphan mitigation could take over.
Some thoughts on this:

- In the worst case, the deletion of the service binding times out after the max_poll_intervall. How to proceed with the service instance then?
- What if a user wants to create resources with the same names again after the CC stated they have been deleted, but in reality the deletion is still going on in the background?
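The second proposal can be sketched like this. The broker and binding classes are illustrative fakes, and the background queue stands in for whatever worker mechanism would actually run the polling; none of this is real Cloud Controller code.

```python
def handle_unbind_response(binding, broker, background_queue):
    """Sketch: a 202 removes the binding from the user's perspective while
    the real deletion continues at the broker in the background."""
    status = broker.unbind(binding)
    if status in (200, 202):
        binding.destroy()  # gone from the user's perspective either way
    if status == 202:
        # Keep polling the broker's last operation in the background.
        background_queue.append(lambda: poll_last_operation(binding, broker))


def poll_last_operation(binding, broker):
    state = broker.last_operation(binding)
    if state == "failed":
        broker.orphan_mitigation(binding)  # let orphan mitigation take over
    return state


class Binding:
    def __init__(self):
        self.exists = True

    def destroy(self):
        self.exists = False


class FakeAsyncBroker:
    """Illustrative broker that unbinds asynchronously, then fails."""

    def __init__(self):
        self.mitigated = False

    def unbind(self, binding):
        return 202

    def last_operation(self, binding):
        return "failed"

    def orphan_mitigation(self, binding):
        self.mitigated = True
```

This makes the name-reuse question from the list above concrete: once `binding.destroy()` has run, nothing stops a user from creating a same-named binding while the broker is still deleting the old one.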
## Related issues

- `/v3/app` delete does not wait until service binding are unbound #3333 - describes the behaviour for app delete already