-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/component sync #3269
Fix/component sync #3269
Conversation
Tests currently fail as this PR depends on radical-cybertools/radical.utils#428. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## devel #3269 +/- ##
==========================================
- Coverage 42.83% 42.79% -0.05%
==========================================
Files 97 97
Lines 11282 11308 +26
==========================================
+ Hits 4833 4839 +6
- Misses 6449 6469 +20 ☔ View full report in Codecov by Sentry. |
src/radical/pilot/pilot.py
Outdated
self._log.error("[Callback]: pilot '%s' failed (exit)", uid) | ||
self._sub.stop() | ||
self._sub = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we setting self._sub = None
after failure? I am assuming it will not affect anything because the pilot failed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are right, that's not needed anymore - fixed!
src/radical/pilot/states.py
Outdated
@@ -71,7 +71,7 @@ def _pilot_state_progress(pid, current, target): | |||
# allow to transition from FAILED to DONE (done gets picked up from DB, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done gets picked up from DB
-->no more DB!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, also fixed now.
TODO AM: CI examples should only be done in release |
Thanks @andre-merzky this looks good to me. I also just ran it Ben's test: import pytest
import threading
import parsl
from parsl.tests.configs.local_radical_mpi import fresh_config as local_config
@parsl.python_app
def some_mpi_func(msg, sleep, comm=None, parsl_resource_specification={}):
import time
msg = 'hello %d/%d: %s' % (comm.rank, comm.size, msg)
time.sleep(sleep)
print(msg)
return comm.size
apps = []
@pytest.mark.local
@pytest.mark.radical
def test_radical_mpi(n=7):
# rank size should be > 1 for the
# radical runtime system to run this function in MPI env
for i in range(2, n):
spec = {'ranks': i}
t = some_mpi_func(msg='mpi.func.%06d' % i, sleep=1, comm=None, parsl_resource_specification=spec)
apps.append(t)
assert [len(app.result()) for app in apps] == list(range(2, n))
assert threading.active_count() == 1, "test left threads running: " + repr(threading.enumerate()) and it passed: (parsl_rp) aymen@alienware:~/RADICAL/PARSL/parsl$ pytest parsl/tests/test_radical/test_mpi_funcs.py --config local
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.12.3, pytest-7.4.4, pluggy-1.5.0
Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket=<bucket_type>
rootdir: /home/aymen/RADICAL/PARSL/parsl/parsl/tests
configfile: pytest.ini
plugins: cov-6.0.0, typeguard-4.4.1, random-order-1.1.1
collected 1 item
parsl/tests/test_radical/test_mpi_funcs.py . [100%]
============================================================================= warnings summary =============================================================================
parsl/tests/test_radical/test_mpi_funcs.py:20
/home/aymen/RADICAL/PARSL/parsl/parsl/tests/test_radical/test_mpi_funcs.py:20: PytestUnknownMarkWarning: Unknown pytest.mark.radical - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
@pytest.mark.radical
test_radical/test_mpi_funcs.py::test_radical_mpi
/home/aymen/ve/parsl_rp/lib/python3.12/site-packages/radical/utils/atfork/stdlib_fixer.py:61: UserWarning: Import `radical` modules before `logging` to avoid the application to deadlock on `fork()`!
warnings.warn(msg)
test_radical/test_mpi_funcs.py::test_radical_mpi
/home/aymen/ve/parsl_rp/lib/python3.12/site-packages/radical/utils/atfork/atfork.py:217: DeprecationWarning: This process (pid=2653458) is multi-threaded, use of fork() may lead to deadlocks in the child.
pid = _orig_os_fork()
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================================== 1 passed, 3 warnings in 25.43s ======================================================================
(parsl_rp) aymen@alienware:~/RADICAL/PARSL/parsl$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @andre-merzky
In addition to the work happening [here](radical-cybertools/radical.pilot#3269) to address this issue #3708. This PR ensures the cleanup of threads generated by RPEX. - Bug fix (partially #3708) --------- Co-authored-by: Ben Clifford <[email protected]>
In addition to the work happening [here](radical-cybertools/radical.pilot#3269) to address this issue #3708. This PR ensures the cleanup of threads generated by RPEX. - Bug fix (partially #3708) --------- Co-authored-by: Ben Clifford <[email protected]>
This PR cleans up the component startup synchronization by using a zmq pipe instead of heartbeat messages. The latter became superfluous as we have competing termination mechanisms in place: pwatcher, termination commands, and heartbeats. The latter now got removed to avoid conflicts in termination procedures.