Am I using ContinueWhilePossible / maxRetries correctly? #7163
-
I have multiple tasks running under one workflow. These tasks use scatter to run many jobs, which creates many shards. My question: if one shard fails, will the other shards and tasks keep running as long as they don't need inputs from the failed shard? For instance, if shard-0 of a task fails while 50 shards are running concurrently, will shard-51 still start, or does the workflow only let the currently running shards finish? I assume shard-0 won't have any downstream tasks run, but I want the successful shards to still have their downstream tasks run.

I'm using maxRetries to re-run failed tasks in the hope of getting more complete results, together with ContinueWhilePossible so that even if a shard still fails after the maximum number of retries, the remaining shards keep running and the workflow gets as far toward completion as possible.

Edit: I am testing this with various inputs, but it isn't easy to reproduce the failures I'm looking for, since they tend to be random.
-
What you are describing is the expected behavior of ContinueWhilePossible.
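To make the difference concrete, here is a toy Python model of a single scattered task. It is not Cromwell's scheduler, just a sketch under these assumptions: the default `workflow_failure_mode`, `NoNewCalls`, lets jobs that are already running finish but starts nothing new after the first failure; `ContinueWhilePossible` keeps starting any call whose inputs are available; a shard only counts as failed once it has used all of its 1 + maxRetries attempts; and a call placed after the scatter that consumes the gathered array needs every shard to succeed. The names `run_scatter` and `Outcome`, and the fixed "waves" of concurrent jobs, are hypothetical simplifications.

```python
"""Toy model (hypothetical, NOT Cromwell's code) of a scattered task
under the two workflow_failure_mode settings.

Simplifying assumptions:
- shards launch in fixed waves of `concurrency` jobs;
- a shard's own downstream call runs right after that shard succeeds;
- a call outside the scatter that takes the gathered array needs
  every shard to succeed.
"""

from dataclasses import dataclass, field


@dataclass
class Outcome:
    succeeded: list = field(default_factory=list)      # upstream shards that succeeded
    failed: list = field(default_factory=list)         # failed after all retries
    never_started: list = field(default_factory=list)  # shards never launched
    downstream: list = field(default_factory=list)     # per-shard downstream calls that ran
    gather_ran: bool = False                           # did the post-scatter call run?


def run_scatter(n_shards, concurrency, attempts_needed, max_retries, mode):
    """Simulate one scattered task.

    attempts_needed maps shard index -> attempts until success
    (missing = succeeds on the first try, None = never succeeds).
    Each shard gets 1 + max_retries attempts in total.
    mode is "NoNewCalls" or "ContinueWhilePossible".
    """
    out = Outcome()
    stop_new_calls = False
    for start in range(0, n_shards, concurrency):
        wave = range(start, min(start + concurrency, n_shards))
        if stop_new_calls:
            # NoNewCalls after a failure: these shards are never launched.
            out.never_started.extend(wave)
            continue
        for shard in wave:
            needed = attempts_needed.get(shard, 1)
            if needed is not None and needed <= 1 + max_retries:
                out.succeeded.append(shard)
                out.downstream.append(shard)  # this shard's downstream call can run
            else:
                out.failed.append(shard)      # retries exhausted
        # Running jobs finish either way; only NoNewCalls stops new launches.
        if out.failed and mode == "NoNewCalls":
            stop_new_calls = True
    # The gathered array is complete only if every shard succeeded.
    out.gather_ran = not out.failed and not out.never_started
    return out


if __name__ == "__main__":
    # 60 shards, 50 at a time; shard 0 always fails, shard 7 needs 3 attempts.
    for mode in ("NoNewCalls", "ContinueWhilePossible"):
        r = run_scatter(60, 50, {0: None, 7: 3}, max_retries=2, mode=mode)
        print(mode, len(r.succeeded), r.failed, len(r.never_started), r.gather_ran)
```

In this model, with `ContinueWhilePossible` shard 51 still runs and gets its own downstream call even though shard 0 failed, while with `NoNewCalls` shards 50 to 59 are never started. In both modes a call that consumes the whole scattered array cannot run, so if you need downstream work for the successful shards, it has to live inside the scatter (per shard) rather than after it.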