-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make HTEX reject empty resource specification in MPI mode #3442
Conversation
Also, happy to hear feedback on the PR! |
parsl_resource_specification
in mpi mode causes parsl to hang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comments on specific lines
ok, reviewing this again, I feel like you should take a step back and review what For both MPI mode and not-MPI mode, there are some keys that are required, some that are optional and some that are forbidden - different for each mode. The current validation code in And the current validation code doesn't look for which keys/combinations are required at all. Your current PR doesn't look to see which keys are required either, it just requires that something is specified. So maybe it would be good, perhaps with help from @WardLT and @yadudoc , to figure out and write down for both modes which keys are required, optional and forbidden - based on understanding what the code does with those keys. And then rewrite |
Thanks @benclifford for the comment and it makes perfect sense to check for required parameters be passed. Let me dig in the code for the required parameters as well. I will confirm my finding with @WardLT and @yadudoc! |
another question i have is in mpi mode, there are three interacting parameters and this validate function does more than validate - it computes new values. but what happens in a user specifies all three values? (ranks per node, num nodes and num ranks?) - for example if a user specifies 2 ranks per node, 500 ranks, 3 nodes? I feel like there is something bad here that should be validated. I've also raised that question with @rjmello offline who is doing some other validation work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @R1j1t ! I'm inclined to approve this PR as it takes a good step in the direction of making resource spec validation errors clearer for users.
Building on @benclifford , I think we can do better by deviating farther from the initial implementation. For example, we could use a data class to handle the validation logic rather than duplicate logic dataclasses already handles well. We should also move the checks from buried within HTEx to part of the app module to encourage use of the same spec across different Executors.
Those ideas and the ones Ben already has feel like things we should spread over multiple PRs. Do you agree?
PS. I'm about to leave on vacation. Reviews are going to be slower over the next few weeks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I made a couple more change requests- then as @WardLT is happy with the rest of this, we can merge this and other work can happen in other PRs (either from you or from whoever else works on this)
.gitignore
Outdated
|
||
#deps | ||
/.deps | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't change .gitignore with changes that are not relevant to the topic of the PR: none of these changes seem relevant to the point of the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok - you can also fix the MPI test perhaps. It looks like: parsl/tests/test_mpi_apps/test_mpi_mode_enabled.py uses local_setup
and local_teardown
when it could use local_config
instead to supply the test configuration. If the test used local_config
, then the test system should make those files appear under .pytest/parsltest-current
like it does for many other tests.
|
||
def __init__(self, invalid_keys: Set[str]): | ||
def __init__(self, invalid_keys: Union[Set[str], str]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is even more weird than the set-of-strings you changed before! invalid_keys isn't even a collection of objects any more, sometimes!
To get this PR merged without too much further work, make a new exception class MissingResourceSpecification which doesn't take an invalid_keys set, and use that to describe a missing resource specification; and leave this InvalidResourceSpecification for representing incorrectly used keys.
I am not sure if I am missing something, but based on the below code parsl/parsl/executors/high_throughput/mpi_prefix_composer.py Lines 72 to 75 in 146a6af
parsl/parsl/executors/high_throughput/mpi_resource_management.py Lines 180 to 194 in 5a9c1cb
cc: @yadudoc |
parsl_resource_specification
in mpi mode causes parsl to hang
Thanks @benclifford, @WardLT for the help! Looking forward to contributing to parsl :) |
Description
Added a check in
HighThroughputExecutor.submit
in cases whereparsl_resource_specification
is empty whenenable_mpi_mode=True
Possible changes:
update user doc(user doc does mention the need to configureparsl_resource_specification
InvalidResourceSpecification
as it currently expects a setRefactortest_mpi_mode_enabled.py
? Many function arguments are never used.Changed Behaviour
Parsl fails with an error message, instead of hanging on the user
Fixes
Fixes #3427
Type of change
Choose which options apply, and delete the ones which do not apply.