Automated test jobs for scalems started failing recently for the devel branch of RP. See, for instance, https://github.com/SCALE-MS/scale-ms/actions/runs/5663049971/job/15569404817
It seems that the Raptor Master task scalems-rp-raptor.846fc04c-2b48-11ee-b1b6-8daf5ac26a8e should have received a message telling it to call self.stop() on itself from within a result_cb(). The Task carrying that message was marked DONE, but the Master task kept running for at least 20 seconds in state AGENT_EXECUTING. Later, it was successfully canceled with Task.cancel(). https://github.com/SCALE-MS/scale-ms/suites/14565990020/artifacts/824790031
It looks like master.stop() was called without an error, and there is a log of the term getting set. The log statement on the line after master.stop() in the callback then emits its message. But the script never records the log message from the line after the Master.join() (_raptor.join()).
I'm going to leave that branch undisturbed for a while to give you a chance to look at it. I'm making some adjustments to the Master script in a different branch to move the Worker management out of the main script body. I'll let you know if I encounter something similar with a different script structure, but I'll also be interested to hear whatever you deduce.