Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tighten database access in htex/monitoring test to avoid issue #3287 #3293

Merged
merged 1 commit into from
Mar 26, 2024

Conversation

benclifford
Copy link
Collaborator

A more principled fix of #3287 looks to be much more deeply invasive, and involves a big rework of how ZMQ is used.

Type of change

  • Bug fix

A more principled fix of #3287 looks to be much more deeply invasive, and
involves a big rework of how ZMQ is used.
@benclifford benclifford merged commit 20c7bf3 into master Mar 26, 2024
6 checks passed
@benclifford benclifford deleted the benc-tighten-htex-test-constraints branch March 26, 2024 15:11
benclifford added a commit that referenced this pull request Apr 8, 2024
This is part of work to move JobStatusPoller facade state into other
classes, as part of job handling rearrangements in PR #3293

This should not change behaviour: each executor has a single
PolledExecutorFacade and a single strategy.ExecutorState, and this
PR moves the 'first' field from one to the other.

parsl/tests/test_monitoring/test_htex_init_blocks_vs_monitoring.py tests that
init_blocks handling still fires properly - that's what is switched by this
'first' field.
benclifford added a commit that referenced this pull request Apr 8, 2024
This is part of work to move JobStatusPoller facade state into other classes, as part of job handling rearrangements in PR #3293

This should not change behaviour: each executor has a single PolledExecutorFacade and a single strategy.ExecutorState, and this PR moves the 'first' field from one to the other.

parsl/tests/test_monitoring/test_htex_init_blocks_vs_monitoring.py tests that init_blocks handling still fires properly - that's what is switched by this 'first' field.
benclifford added a commit that referenced this pull request Apr 8, 2024
This is part of work to move JobStatusPoller facade state into other classes, as part of job handling rearrangements in PR #3293

This value is constant, in both executors and in PollingExecutorFacade.

This PR should not change behaviour
benclifford added a commit that referenced this pull request Apr 9, 2024
…3321)

This is part of work to move JobStatusPoller facade state into other classes, as part of job handling rearrangements in PR #3293

This value is constant, in both executors and in PollingExecutorFacade.

This PR should not change behaviour
benclifford added a commit that referenced this pull request Apr 10, 2024
…ugin

Before this PR, the DataFlowKernel and each JobStatusPoller
PolledExecutorFacade each opened a ZMQ connection to the monitoring router.
These connections are not threadsafe, but (especially in the DFK case) no
reasoning or checking stops the DFK's connection being used in multiple
threads.

Before this PR, the MonitoringHub presented a 'send' method and stored the
DFK's ZMQ connection in self._dfk_channel, and each PolledExecutorFacade
contained a copy of the ZMQ channel code to open its own channel, configured
using parameters from the DFK passed in at construction.

This PR:

* moves the above uses to use the MonitoringRadio interface (which was
originally designed for remote workers to send monitoring information, but
seems ok here too)

* has MonitoringHub construct an appropriate MonitoringRadio instance for
use on the submit side, exposed as self.radio;

* replaces the implementation of send with a new MultiprocessingQueueRadio
which is thread safe but only works in the same multiprocessing environment
as the launched monitoring database manager process (which is true on the
submit side, but for example means this radio cannot be used on most
remote workers)

This work aligns with the prototype in #3315 (which pushes on monitoring
radio configuration for remote workers) and pushes in the direction
(without getting there) of allowing other submit-side hooks.

This work removes some monitoring specific code from the JobStatusPoller,
replacing it with a dependency injection style.
This is part of work to move JobStatusPoller facade state into other classes,
as part of job handling rearrangements in PR #3293

This PR will change how monitoring messages are delivered from the submitting
process, and the most obvious thing I can think of that will change is how
this will behave under load: heavily loaded messaging causing full buffers and
other heavy-load symptoms will now behave as multiprocessing Queues do, rather
than as ZMQ connections do. I have not attempted to characterise either of
those modes.
benclifford added a commit that referenced this pull request Apr 11, 2024
…ugin (#3338)

Before this PR, the DataFlowKernel and each JobStatusPoller
PolledExecutorFacade each opened a ZMQ connection to the monitoring router.
These connections are not threadsafe, but (especially in the DFK case) no
reasoning or checking stops the DFK's connection being used in multiple
threads.

Before this PR, the MonitoringHub presented a 'send' method and stored the
DFK's ZMQ connection in self._dfk_channel, and each PolledExecutorFacade
contained a copy of the ZMQ channel code to open its own channel, configured
using parameters from the DFK passed in at construction.

This PR:

* moves the above uses to use the MonitoringRadio interface (which was
originally designed for remote workers to send monitoring information, but
seems ok here too)

* has MonitoringHub construct an appropriate MonitoringRadio instance for
use on the submit side, exposed as self.radio;

* replaces the implementation of send with a new MultiprocessingQueueRadio
which is thread safe but only works in the same multiprocessing environment
as the launched monitoring database manager process (which is true on the
submit side, but for example means this radio cannot be used on most
remote workers)

This work aligns with the prototype in #3315 (which pushes on monitoring
radio configuration for remote workers) and pushes in the direction
(without getting there) of allowing other submit-side hooks.

This work removes some monitoring specific code from the JobStatusPoller,
replacing it with a dependency injection style.
This is part of work to move JobStatusPoller facade state into other classes,
as part of job handling rearrangements in PR #3293

This PR will change how monitoring messages are delivered from the submitting
process, and the most obvious thing I can think of that will change is how
this will behave under load: heavily loaded messaging causing full buffers and
other heavy-load symptoms will now behave as multiprocessing Queues do, rather
than as ZMQ connections do. I have not attempted to characterise either of
those modes.

Co-authored-by: Kevin Hunter Kesling <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants