Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trigger now if workflow paused #6499

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

hjoliver
Copy link
Member

@hjoliver hjoliver commented Nov 28, 2024

Close #5963
Supesede #6192 (it was getting messy)


New behaviour for triggering when the workflow is paused

DEFAULT: run triggered tasks NOW even if the workflow is paused

  • supports the common use case "I want to pause the workflow and just trigger individual tasks"
  • (NIWA priority raised by both ops and research users in early Cylc 8 migration meetings)

OPTION: --on-resume - run triggered tasks only when the paused workflow is resumed

  • this supports the broadcast-to-past-tasks workaround until broadcast is fixed (prob. 8.5.0)
  • the option can be removed in the future, once broadcast is fixed, if there are no other use cases

Implementation:

  • separate start of job submission from queue release, in the scheduler
  • manual trigger adds tasks to tasks_to_trigger_(now|on_resume) lists, after state reset and queue wrangling
  • if these lists are populated, the scheduler will release tasks from them as appropriate
  • (queues are not processed if the scheduler is paused)

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at Update docs for trigger-when-paused. cylc-doc#778
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@hjoliver
Copy link
Member Author

hjoliver commented Nov 28, 2024

Status of this PR:

  • I can easily remove the --on-resume option, if we have a viable alternative for the broadcast problem at 8.4.0
  • if we keep the option, I just need to extend the tests to cover it

@hjoliver hjoliver changed the title Trigger --now for workflow pause and queues. Trigger now if workflow paused Nov 28, 2024
@hjoliver hjoliver force-pushed the trigger-when-paused_2 branch 2 times, most recently from f3587d1 to 55c71a9 Compare November 29, 2024 01:05
@dwsutherland
Copy link
Member

dwsutherland commented Dec 2, 2024

(queues are not processed if the scheduler is paused)

Does this mean if I trigger when paused, and a queue limit is reached, that it will enter a queue but not be released when the queue "empties" of running tasks?

Will test.

@hjoliver
Copy link
Member Author

hjoliver commented Dec 2, 2024

(queues are not processed if the scheduler is paused)

Does this mean if I trigger when pause, and a queue limit is reached, that it will enter a queue but not be released when the queue "empties" of running tasks?

Yes. Pausing the scheduler halts all automatic job submission, so that means queued tasks won't automatically run. But if you want the queued task to run before resuming the workflow you can just trigger it again before resuming.

@hjoliver
Copy link
Member Author

hjoliver commented Dec 2, 2024

(I've just tweaked the pause and trigger command help in an attempt to make that clearer @dwsutherland )

Copy link
Member

@dwsutherland dwsutherland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, reacts as expect with queued/unqueued 👍

    [[queues]]
        [[[THIS]]]
            limit = 1
            members = baz, zip, toz
    [[graph]]
        R1 = "prep => foo"
        PT12H = "foo[-PT12H] => foo => baz & zip & toz => bar"

image

cylc/flow/network/schema.py Outdated Show resolved Hide resolved
cylc/flow/network/schema.py Outdated Show resolved Hide resolved
cylc/flow/network/schema.py Outdated Show resolved Hide resolved
changes.d/6499.feat.md Outdated Show resolved Hide resolved
cylc/flow/scripts/pause.py Outdated Show resolved Hide resolved
cylc/flow/scripts/pause.py Outdated Show resolved Hide resolved
cylc/flow/scripts/trigger.py Outdated Show resolved Hide resolved
cylc/flow/scripts/trigger.py Outdated Show resolved Hide resolved
cylc/flow/scripts/trigger.py Outdated Show resolved Hide resolved
cylc/flow/task_queues/independent.py Outdated Show resolved Hide resolved
cylc/flow/task_queues/independent.py Outdated Show resolved Hide resolved
@oliver-sanders
Copy link
Member

@hjoliver, by bypassing queue processing for manual trigger, this PR may close #6398

cylc/flow/scripts/pause.py Outdated Show resolved Hide resolved
cylc/flow/scripts/trigger.py Outdated Show resolved Hide resolved
tests/integration/conftest.py Show resolved Hide resolved
@MetRonnie MetRonnie added the schema change Change to the Cylc GraphQL schema label Dec 3, 2024
Comment on lines 167 to +173

# set --pre=all should not cause a task to submit in a paused workflow
assert len(submitted_tasks) == 0
# but triggering it should
schd.pool.force_trigger_tasks(['1/a'], [])
schd.release_tasks_to_run()
assert len(submitted_tasks) == 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please could you put this new bit in a separate test and revert the splitting of trigger and set from this test; a key feature of this test was that is checks equity between trigger and set for the same flow nums

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I split that into separate tests is that this PR changes that equivalence.

Trigger and set (in a paused workflow) are more similar on master, because a triggered task does not become "active" until the workflow is resumed, so it can be triggered multiple times to absorb multiple flow numbers. On this branch, a triggered task becomes active immediately and so subsequent triggers get ignored.

In light of that, do you still want me to revert that? I suppose we could have a combined test for what is common, and separate tests for the behavior that differs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In light of that, do you still want me to revert that? I suppose we could have a combined test for what is common, and separate tests for the behavior that differs.

👆 That

@hjoliver
Copy link
Member Author

hjoliver commented Dec 3, 2024

@oliver-sanders -

@hjoliver, by bypassing queue processing for manual trigger, this PR may close #6398

That issue is confusing. I think the "bug" initially reported by Tom is just the expected double-trigger requirement to get past a queue. In his description, despite the tasks being held, they did run on the second trigger attempt - but he had expected that manual trigger would skip queues to run immediately after just one attempt.

So I presume you're referring to the presumably-actual-bug that you commented on part way through the discussion: #6398 (comment)

... I've commented there, as I'm not sure this PR is relevant: #6398 (comment)

@oliver-sanders
Copy link
Member

oliver-sanders commented Dec 4, 2024

I think the "bug" initially reported by Tom is just the expected double-trigger requirement to get past a queue.

No, there is definitely a bug lurking here.

I have added an example to the issue in which task doesn't run when triggered (no queue limiting in effect), however, on this branch the task does run (entering an infinite kill/trigger loop).

So it looks like this closes #6398 (congrats on fixing a bug you didn't think existed!).

@MetRonnie
Copy link
Member

Test added on master failing - need to merge master into this branch

@@ -1703,7 +1740,7 @@ async def _main_loop(self) -> None:
self.broadcast_mgr.check_ext_triggers(
itask, self.ext_trigger_queue)

if itask.is_ready_to_run():
if itask.is_ready_to_run() and not itask.is_manual_submit:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for this change?

Copy link
Member

@oliver-sanders oliver-sanders Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because manual submits may now bypass the queue, they get pushed through the submission pipeline (lines 1440-1450 above).

@hjoliver
Copy link
Member Author

hjoliver commented Dec 5, 2024

(I ran out of time today, but I just need to tweak the tests as requested by @MetRonnie ... )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
schema change Change to the Cylc GraphQL schema
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow trigger-now in a paused workflow?
4 participants