Problem stopping Raptor master in 1.36 #3001

Open
eirrgang opened this issue Aug 2, 2023 · 2 comments
@eirrgang
Contributor

eirrgang commented Aug 2, 2023

Automated test jobs for scalems started failing recently for the devel branch of RP.
See, for instance,
https://github.com/SCALE-MS/scale-ms/actions/runs/5663049971/job/15569404817

It seems that the Raptor Master task scalems-rp-raptor.846fc04c-2b48-11ee-b1b6-8daf5ac26a8e should have received a message telling it to call self.stop() on itself from within result_cb(). The Task carrying that message was marked DONE, but the Master task kept running for at least 20 seconds in state AGENT_EXECUTING. Later, it was successfully canceled with Task.cancel(). Artifacts from the run: https://github.com/SCALE-MS/scale-ms/suites/14565990020/artifacts/824790031

It looks like master.stop() was called without an error, and there is a log entry showing the term getting set. The callback's log message from the line after master.stop() also appears. But the script never records the log message from the line after Master.join() (_raptor.join()).
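
The expected sequence can be sketched with a toy, standard-library-only stand-in. This is not radical.pilot code: `ToyMaster`, the `mode` key, and `run_demo` are hypothetical names made up for illustration, and the sketch assumes only that stop() sets a termination signal that join() waits on. Under that assumption, a stop() issued from inside the callback should let join() return promptly, so all three log messages appear.

```python
import threading


class ToyMaster:
    """Hypothetical stand-in for a Raptor master; NOT radical.pilot code."""

    def __init__(self):
        # Plays the role of the "term" signal mentioned in the logs (assumption).
        self._term = threading.Event()
        self.log = []

    def stop(self):
        self._term.set()
        self.log.append("term set")

    def join(self, timeout=None):
        # Returns True if the termination signal arrived within the timeout.
        return self._term.wait(timeout)

    def result_cb(self, tasks):
        # Invoked on a separate thread when tasks complete.
        for task in tasks:
            if task.get("mode") == "stop":
                self.stop()
                self.log.append("after stop()")


def run_demo(timeout=5.0):
    master = ToyMaster()
    # Deliver the "stop" task result from another thread, as a callback would be.
    cb_thread = threading.Thread(
        target=master.result_cb, args=([{"mode": "stop"}],)
    )
    cb_thread.start()
    joined = master.join(timeout)
    cb_thread.join()  # ensure the callback has finished logging
    master.log.append("after join()" if joined else "join() timed out")
    return master.log


if __name__ == "__main__":
    print(run_demo())
```

In the failing run described above, the first two messages show up in the logs but the equivalent of "after join()" does not.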

I'm going to leave that branch undisturbed for a while to give you a chance to look at it. I'm making some adjustments to the Master script in a different branch to move the Worker management out of the main script body. I'll let you know if I encounter something similar with a different script structure, but I'll also be interested to hear whatever you deduce.

@eirrgang
Contributor Author

eirrgang commented Aug 3, 2023

Update: Since yesterday's release of 1.36, scalems tests fail against the RP official release.

@eirrgang eirrgang changed the title Problem stopping Raptor master in devel? Problem stopping Raptor master in 1.36 Aug 3, 2023
eirrgang added a commit to SCALE-MS/scale-ms that referenced this issue Aug 3, 2023
Temporarily avoid radical.pilot 1.36, pending resolution of
radical-cybertools/radical.pilot#3001

Revert this commit when resolved.

Ref #379.
@andre-merzky
Member

Thanks Eric, I'll check it out!
