Enhancing Workflow Execution Models in ELSA 3 - Transactional vs. Distributed Modes #5225

sfmskywalker (Maintainer) · 2024-04-15T12:06:53Z

We're exploring potential enhancements to Elsa Workflows and would love your input! As Elsa continues to evolve, one of our goals is to balance robust, scalable workflow execution with the flexibility needed to handle various use cases, including those requiring immediate HTTP context interactions.

Background

Elsa supports workflows that start or resume in response to various triggers (HTTP requests, timers, etc.). Today these workflows run in a transactional model: they execute inside the context that triggered them, which is necessary for context-specific actions such as sending an HTTP response directly. To improve scalability and fault tolerance, we are integrating the Actor Model pattern using Proto.Actor and moving towards a more distributed, actor-based execution model.

The Challenge

Integrating an actor model raises a question about context-specific executions, particularly those that need to interact directly with the HTTP context. The actor model inherently abstracts execution away from specific contexts, posing a challenge for workflows that need to respond directly within their initiating context.

Proposed Solution – Two Execution Modes

Transactional Mode:
- For workflows requiring immediate context interaction (e.g., HTTP responses).
- Executed directly within the initiating context (like an HTTP request).
Distributed Mode:
- For workflows that benefit from being managed by an actor model, suitable for long-running and complex workflows.
- Utilizes the Actor Model to enhance scalability and manage state efficiently.
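
To make the split between the two modes concrete, here is a minimal Python sketch of how a dispatcher might pick one per request. It is illustrative only: `ExecutionMode`, `WorkflowRequest`, `needs_initiating_context`, and `choose_mode` are hypothetical names, not part of Elsa's or Proto.Actor's API.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionMode(Enum):
    TRANSACTIONAL = "transactional"  # run inside the initiating context
    DISTRIBUTED = "distributed"      # hand over to the actor-based runtime


@dataclass
class WorkflowRequest:
    # Hypothetical fields used only for this illustration.
    workflow_id: str
    needs_initiating_context: bool  # e.g. the workflow writes the HTTP response itself


def choose_mode(request: WorkflowRequest) -> ExecutionMode:
    """Use transactional mode only when the initiating context is required."""
    if request.needs_initiating_context:
        return ExecutionMode.TRANSACTIONAL
    return ExecutionMode.DISTRIBUTED
```

In this sketch the decision is a single flag on the request; a real implementation could equally derive it from the workflow definition or from the activities it contains.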

Mixed Mode Scenarios

Some workflows might start in a transactional mode and then need to transition to a distributed mode. We’re considering mechanisms to dynamically switch execution models within a single workflow's lifecycle.
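
One possible shape for such a switch, sketched in Python under stated assumptions: steps run inline while any remaining step still needs the initiating context, and the run (its state plus its position) is then handed to a queue that stands in for the distributed runtime. `Step`, `WorkflowRun`, `run_in_context`, and `drain` are hypothetical names, and the in-memory `deque` is a placeholder for whatever the actor side would actually use.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[dict], None]
    needs_context: bool = False  # True if the step must touch the initiating context


@dataclass
class WorkflowRun:
    steps: list
    state: dict = field(default_factory=dict)
    position: int = 0  # index of the next step to execute


def run_in_context(run: WorkflowRun, handoff: deque) -> None:
    """Run steps inline until no remaining step needs the context, then hand off."""
    while run.position < len(run.steps):
        remaining = run.steps[run.position:]
        if not any(step.needs_context for step in remaining):
            # The state and position travel with the run, acting as a checkpoint.
            handoff.append(run)
            return
        run.steps[run.position].run(run.state)
        run.position += 1


def drain(handoff: deque) -> None:
    """Stand-in for the distributed side: finish every handed-off run."""
    while handoff:
        run = handoff.popleft()
        while run.position < len(run.steps):
            run.steps[run.position].run(run.state)
            run.position += 1
```

The sketch only switches in one direction (transactional to distributed) and ignores failures during the handoff, which is exactly where persistence and rollback questions would come in.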

We Need Your Feedback

- Your Experiences: How do you currently handle similar scenarios in your workflows?
- Proposed Models: Thoughts on the transactional and distributed modes proposed above?
- Transitioning Between Modes: Ideas or concerns about workflows that switch between modes?
- Other Considerations: Any potential issues or alternative solutions you foresee with this approach?

Your insights are invaluable to us. Please share your thoughts and experiences, or any additional considerations that could help us refine these models.

glime-ai[bot] (bot) · 2024-04-15T12:07:32Z

To address the challenge of integrating an actor model while maintaining the ability to interact with specific contexts like HTTP, the proposed solution of having two distinct execution modes seems like a practical approach. Here are some thoughts and considerations based on the information provided:

Transactional Mode:
- This mode is essential for workflows that need to perform immediate actions within the same context they were triggered, such as sending an HTTP response right after an HTTP request.
- Ensuring that this mode can operate with low latency and high reliability will be crucial since it directly affects the user experience.
Distributed Mode:
- The distributed mode is ideal for workflows that are more complex and do not require immediate context interaction. The use of an actor model like Proto.Actor can significantly improve scalability and fault tolerance.
- It's important to consider how state is managed and persisted across actors, especially in failure scenarios.
Mixed Mode Scenarios:
- The ability to start a workflow in transactional mode and then transition to distributed mode is a powerful feature. It allows for immediate response to events while still benefiting from the scalability of the actor model for the remainder of the workflow.
- One challenge here is ensuring a seamless transition between modes without losing context or state. This might require a well-defined state management and checkpointing system.
Feedback and Experiences:
- It would be beneficial to gather feedback from users who have workflows that require immediate HTTP context interactions to understand their needs and pain points better.
- Real-world use cases and scenarios can provide valuable insights into how these two modes can be optimized.
Transitioning Between Modes:
- Consideration should be given to how the system will handle transactions and rollbacks, especially when a workflow transitions from transactional to distributed mode.
- There should be clear documentation and guidelines on how to design workflows that might switch between modes to avoid confusion and potential errors.
Other Considerations:
- It's important to consider the security implications of both modes, especially when dealing with sensitive data that might be passed between different contexts or actors.
- Monitoring and logging will be crucial, especially in a distributed environment, to track the health and performance of workflows and to troubleshoot issues.

Overall, the proposed solution seems to be on the right track. However, it will be important to consider the complexities of state management, transitions between modes, and the need for robust monitoring and security measures. Additional feedback from the community and further testing will be essential to refine these models and ensure they meet the diverse needs of users.

This comment was generated by Glime.


RenatoCapelo · 2024-09-13T16:34:13Z

In our case, we’re employing a workaround that might align with your proposal. For workflows triggered by HTTP requests, particularly those involving slower activities, we handle the scenario by returning an HTTP 202 response once inputs are validated. This response includes an ID, which the client can use to track the workflow’s progress. To ensure the HTTP response is properly sent before further processing, we force a "flush" with a short delay (typically one second).

This approach enables the workflow to decouple from the HTTP context quickly, allowing us to manage longer-running activities asynchronously. The client can periodically check the workflow status using the provided ID through a separate retrieval activity, which monitors whether the required data or process is complete.
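
For readers unfamiliar with the pattern, the accept-then-poll flow described above can be sketched roughly as follows. This is a generic Python illustration, not the actual setup described in this comment and not Elsa code: `handle_start`, `handle_status`, `process`, and the in-memory `runs` table are hypothetical, and a thread pool stands in for the workflow runtime.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
runs = {}  # tracking ID -> Future for the background work (in-memory, illustration only)


def process(payload: dict) -> dict:
    # Placeholder for the slower activities of the workflow.
    return {"echo": payload["input"]}


def handle_start(payload: dict) -> tuple:
    """Validate the input, start the work in the background, reply with 202 and an ID."""
    if "input" not in payload:
        return 400, {"error": "missing 'input'"}
    run_id = str(uuid.uuid4())
    runs[run_id] = executor.submit(process, payload)
    return 202, {"id": run_id}


def handle_status(run_id: str) -> tuple:
    """Let the client poll for progress using the ID from the 202 response."""
    future = runs.get(run_id)
    if future is None:
        return 404, {"error": "unknown id"}
    if not future.done():
        return 200, {"status": "running"}
    return 200, {"status": "completed", "result": future.result()}
```

Because the slow work runs on a separate thread here, the 202 can be returned without waiting for it; whether an explicit flush or delay is still needed depends on how the hosting framework ties the response to the workflow's execution.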

We’re currently using this in a single-instance setup, but we're exploring the possibility of transitioning to a distributed runtime. One key feature for us is the ability to bind specific activities to specific machines or environments. For example, we have on-premise machines handling file processing tasks due to their proximity to data, while most of our instances run in the cloud. This functionality is crucial, as it allows us to take advantage of specialized hardware and meet compliance requirements while leveraging the scalability of cloud-based infrastructure.
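
As a rough illustration of what binding activities to machines could look like, here is a small Python sketch based on label matching: each worker advertises labels, each activity lists the labels it requires, and only workers carrying all of them are eligible. `Worker`, `Activity`, `eligible_workers`, and the label names are assumptions made for this example, not an existing Elsa feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    name: str
    labels: frozenset  # capabilities or locations the machine advertises


@dataclass(frozen=True)
class Activity:
    name: str
    required_labels: frozenset = frozenset()  # empty means "run anywhere"


def eligible_workers(activity: Activity, workers: list) -> list:
    """Return the workers whose labels cover everything the activity requires."""
    return [w for w in workers if activity.required_labels <= w.labels]


workers = [
    Worker("onprem-1", frozenset({"on-prem", "file-access"})),
    Worker("cloud-1", frozenset({"cloud"})),
]
file_processing = Activity("ProcessFiles", frozenset({"on-prem", "file-access"}))
send_email = Activity("SendEmail")

print([w.name for w in eligible_workers(file_processing, workers)])  # ['onprem-1']
print([w.name for w in eligible_workers(send_email, workers)])       # ['onprem-1', 'cloud-1']
```

A scheduler would still need a policy for choosing among several eligible workers and for the case where none is available.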
