[FEATURE] Add option to provision synchronously #967

dbwiddis · 2024-11-19T18:39:55Z

Is your feature request related to a problem?

Presently, when provisioning a workflow (via either the provision API or the create API with the provision parameter), the REST call returns immediately with a 200 (OK) response, but the caller must then poll the Workflow Status API to monitor the status of provisioning.

This asynchronous execution of provisioning was intentional to provide the ability for a front end to obtain status throughout provisioning, possibly including a progress bar or similar, and because some provisioning processes take longer than the expected time for a REST response.

However, there are some use cases where the user may be willing to wait for a completed response, and not have to poll. This would be particularly useful in cases similar to the ML Commons Remote Model deployment which provides such a synchronous API.
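For reference, the polling that callers currently have to write looks roughly like the sketch below. It is only an illustration: `get_state` is a hypothetical callable wrapping a Workflow Status API request, and the terminal state names are assumptions, not something confirmed in this issue.

```python
import time
from typing import Callable

# Assumed terminal state names for this sketch (not confirmed by this issue).
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def poll_until_done(
    get_state: Callable[[], str],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_state() repeatedly until it reports a terminal state.

    get_state is a hypothetical wrapper around a Workflow Status API call.
    Raises TimeoutError if no terminal state is seen within timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"workflow still {state} after {timeout_s}s")
        sleep(interval_s)
```

Every automation tool has to carry some version of this loop, which is the boilerplate a synchronous option would remove.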

What solution would you like?

Add optional parameters to the create and provision workflow APIs to wait for the request to complete, with a timeout. Other OpenSearch APIs use wait_for_completion and wait_for_completion_timeout, so I'd suggest these names.

Alter the Provision Workflow Transport action, when this parameter is present, to wait to return until provisioning is complete (or the timeout).
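A minimal sketch of the intended behavior, written in Python for brevity and not taken from the plugin's code: the workflow is submitted as before, and the handler only blocks on the result when a timeout is supplied. The function names, the single `wait_for_completion_timeout` argument, and the response shape are all assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional


def slow_workflow(delay_s: float) -> str:
    """Stand-in for a provisioning run that takes delay_s seconds."""
    time.sleep(delay_s)
    return "COMPLETED"


def provision(
    run_workflow: Callable[[], str],
    executor: ThreadPoolExecutor,
    wait_for_completion_timeout: Optional[float] = None,
) -> dict:
    """Start provisioning; optionally wait up to the timeout for the result."""
    future = executor.submit(run_workflow)
    if wait_for_completion_timeout is None:
        # Existing asynchronous behavior: return immediately, caller polls.
        return {"state": "PROVISIONING", "timed_out": False}
    try:
        state = future.result(timeout=wait_for_completion_timeout)
        return {"state": state, "timed_out": False}
    except FutureTimeout:
        # Timeout reached; what happens to the running workflow is a
        # separate policy decision and is not modeled here.
        return {"state": "PROVISIONING", "timed_out": True}
    except Exception as exc:
        # The workflow itself failed before the timeout.
        return {"state": "FAILED", "timed_out": False, "error": str(exc)}


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(provision(lambda: slow_workflow(0.01), pool, 1.0))
        print(provision(lambda: slow_workflow(0.5), pool, 0.05))
```

Keeping the no-timeout path unchanged means existing callers that poll are unaffected.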

What alternatives have you considered?

A separate wrapper API that does the retries internally.

Do you have any additional context?

This would be a much simpler approach for automation tools, which would no longer need to code all the polling themselves.


arjunkumargiri · 2024-11-20T22:45:09Z

Thanks @dbwiddis, this approach will help simplify provisioning/automation of OpenSearch resources with minimal client-side code. A few follow-up questions:

What is the default timeout config? Will workflow be terminated if the timeout is breached?
Will resources be in partial provisioned status in case of a timeout/failure?
Will the list of provisioned resources be included as part of provision API in case of wait_for_completion?

dbwiddis · 2024-11-20T22:59:24Z

> What is the default timeout config? Will workflow be terminated if the timeout is breached?

Probably the standard OpenSearch default timeout for REST requests.

We can handle the timeout any way we want: cancelling the futures of a workflow in progress will probably suffice. Note that some workflow steps already in progress may continue to run even after a cancellation, but the overall workflow would stop executing.

> Will resources be in partial provisioned status in case of a timeout/failure?

Yes.

> Will the list of provisioned resources be included as part of provision API in case of wait_for_completion?

It sounds reasonable to provide the same return value as the Workflow Status API.

arjunkumargiri · 2024-11-20T23:09:30Z

Can we rollback partially provisioned resources in case of failure?

dbwiddis · 2024-11-20T23:36:06Z

> Can we rollback partially provisioned resources in case of failure?

The deprovision API will do that.

We have not yet added an auto-rollback capability, which would be equally appropriate for a failed async provision.

Also, regarding cancellation, if we tried an immediate rollback it may not catch all the in-progress resources. For example, say we are registering and deploying a local model and then creating an agent. Assume registering completes successfully but the deploy step times out because it's a very large model. Registration would create the model resource. Upon failure (the timeout), all the futures would be cancelled, meaning the agent step would never run. However, the model deployment would probably complete eventually. If we tried to deprovision immediately we'd only see the registered model. (I'm not sure what happens if we try to delete a model which is in the process of deploying?) If we wait for the step to complete we might have it deployed. In that case you'd have both the register and deploy "resources" and you could successfully deprovision with an undeploy/delete.

This is just one simple example; it can get more complex, which is why we haven't gotten to it yet.
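The register/deploy/agent scenario above can be simulated with a small sketch. This is a Python illustration, not the plugin's implementation: the step names and the `resources` list are hypothetical stand-ins, and a single-worker pool is used only to force the three steps to run in order. It relies on the fact that cancelling a future that is already running has no effect, while a queued one is cancelled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_scenario() -> dict:
    resources = []                      # stand-in for recorded created resources
    deploy_started = threading.Event()
    finish_deploy = threading.Event()

    def register():
        resources.append("register_model")

    def deploy():
        deploy_started.set()
        finish_deploy.wait(timeout=5)   # simulates a very large model deploying
        resources.append("deploy_model")

    def create_agent():
        resources.append("create_agent")

    # One worker thread, so steps run strictly in submission order.
    with ThreadPoolExecutor(max_workers=1) as pool:
        f_register = pool.submit(register)
        f_deploy = pool.submit(deploy)
        f_agent = pool.submit(create_agent)

        f_register.result()
        deploy_started.wait(timeout=5)  # the timeout hits while deploy is running

        deploy_cancelled = f_deploy.cancel()  # already running: not cancelled
        agent_cancelled = f_agent.cancel()    # still queued: cancelled
        seen_immediately = list(resources)    # what an immediate rollback sees

        finish_deploy.set()                   # the deploy finishes later anyway
        f_deploy.result()
        seen_after_wait = list(resources)     # what a delayed rollback sees

    return {
        "deploy_cancelled": deploy_cancelled,
        "agent_cancelled": agent_cancelled,
        "seen_immediately": seen_immediately,
        "seen_after_wait": seen_after_wait,
    }


if __name__ == "__main__":
    print(run_scenario())
```

In this simulation an immediate rollback would see only the registered model, while waiting for the in-flight step also exposes the deployment, matching the ordering problem described above.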

dbwiddis added the enhancement and untriaged labels on Nov 19, 2024
