-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add job avoidance #22
Merged
Merged
Changes from 70 commits
Commits
Show all changes
73 commits
Select commit
Hold shift + click to select a range
77203ac
Merge branch 'output_funcs' into develop
julianhess d8c5997
Allow Slurm Docker to be run as non-root
julianhess 4e8073d
Use containerized workers by default
julianhess b6a0507
Override root user from superclass
julianhess b1072af
Don't symlink deloc.py; don't copy to delocalize
julianhess a32cfb3
Increase default NFS disk size for perf. reasons
julianhess d12582f
Allow custom image to create NFS disk
julianhess 595c837
Unmount NFS on exit
julianhess d5799f0
Don't symlink absolute paths to workspace
julianhess 620c97a
Make "symlink" synonymous to "localize"
julianhess 94140d9
Initial commit of working job avoidance
julianhess 47f5c26
Merge branch 'docker' into develop
julianhess c5527e8
Merge branch 'develop' into job_avoidance
julianhess 0e2cec9
submit_batch_job needs to be aware of jobavoidance
julianhess 79024dd
os.rmdir is not rm -rf
julianhess ae0149f
More robust job avoidance checks
julianhess e209f3f
Concatenate avoided DF with previously existing DF
julianhess 28de503
Don't assume job_id corresponds to Slurm task array ID
julianhess fc2e80c
Sort output DF after concatenating
julianhess 65eb48e
Bump minimum Python version to 3.7
julianhess 3fcc699
Merge branch 'master' into develop
julianhess 08a37da
Merge branch 'docker' into develop
julianhess 0ab09e4
Merge branch 'docker' into develop
julianhess 66e31bd
Add warning about hardcoded path
julianhess a31768f
Merge branch 'docker' into develop
julianhess 26f1b6f
Merge branch 'docker' into develop
julianhess f323a9e
Re-create staging dir after nuking it
julianhess aadefa0
Merge branch 'job_avoidance' into develop
julianhess b148964
Catch ValueError exceptions in job_avoid()
julianhess 66de503
Print discrepant inputs/outputs
julianhess 3f12853
Catch ValueError exceptions in job_avoid()
julianhess 3ff14fc
Print discrepant inputs/outputs
julianhess 916f241
Merge branch 'job_avoidance' into develop
julianhess 32d5172
Prevent race conditions when saving dataframe
julianhess e0245d4
Merge branch 'job_avoidance' into develop
julianhess 138c65d
Output dataframe returns CPU time in seconds
julianhess 87b7833
Pass inputs to df through shlex
julianhess 27f81ae
Forgot to replace hours -> seconds in a couple places
julianhess a8fdc8d
Merge branch 'cpu_time_by_second' into develop
julianhess 40ded7e
Add more conditions for not job avoiding
julianhess 58e5f11
Merge branch 'job_avoidance' into develop
julianhess 8bcfb45
Make exception for canine outputs
julianhess 4c0d1e5
Merge branch 'nfs_overrides' into develop
julianhess 0641679
Kill straggling jobs upon cluster exit
julianhess cdbfe23
Error checking in invoke method
julianhess bf30351
Merge branch 'master' into develop
julianhess 252b47f
Merge branch 'master' into stragglers
julianhess cad6f92
Merge branch 'develop' into stragglers
julianhess c6fb82c
Update default compute scripts to new path
julianhess 75c3820
Add NFS monitoring thread
julianhess 477d3c0
Pass elastic parameter to wait_for_cluster_ready
julianhess 80da83c
Merge branch 'master' into develop
julianhess dffbfce
Canine output folders can be nested now
julianhess 1ba9bac
Sometimes the GCE API hangs; catch this exception
julianhess 99c9ab6
Support Pandas dataframes and series for input
julianhess 7ccaecd
Merge branch 'stragglers' into develop
julianhess f348442
Handle errors from scancel
julianhess 5a47ad7
More robust cancelling of stragglers
julianhess 5eeee5b
Merge branch 'stragglers' into develop
julianhess 0e9582d
Remove stdout/stderr hack
julianhess fef7a30
Catch exceptions from Google's API's random failures
julianhess 964dc96
Wrap whole cancellation step in try block
julianhess 3dd94de
Catch exception from deleting node config file
julianhess 5bd6b03
Merge branch 'master' into develop
julianhess d039eed
Retry removing job shard directories
julianhess e02d6a7
Remove common inputs dir. if job avoidance occurs
julianhess 631b4ab
Retry all rmtree attempts
julianhess c33a6e5
Change GCP image family/Docker image names
julianhess ef69850
Bind mount /etc/gcloud
julianhess e5a87f7
Bind mount ~/.config/gcloud to /etc/gcloud
julianhess ee64193
Job avoidance now works for all backends
agraubert 99f8e7b
Removed UUID import
agraubert ab151f6
Make it clearer that cluster_name is a required param
julianhess File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason that
cluster_name
isn't a positional? Enforcing it as a required optional seems weirdThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want there to be any positional arguments. I should really make this constructor signature
def __init__(self, *, nfs_compute_script, ...
to clarify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All arguments are positional in python. They can be put in arbitrary order using names regardless of if you set a default value. Not setting a default value simply makes an argument required.
Unless you're creating a complex object hierarchy where the available arguments change based on selected arguments or mixins, I am generally of the opinion that always-available arguments should be explicit in the constructor signature for the sake of clarity and readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can have functions which do not support positional arguments:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, that's a sneaky new python 3 trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most code I develop has been 2 and 3 compatible, so I haven't come across that pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's Python 3-only, though I wouldn't exactly call it "new" :-)
Added to the standard in 2006: https://www.python.org/dev/peps/pep-3102/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I guess I should clarify. I think that setting an argument to have a default kind of signifies that it's optional. I have required parameters in the NFSLocalizer that have a default and appear in the middle of the argument list for compliance with the base class API. Is there a reason not to move the required parameters to the front of the list and remove their defaults so it's more obvious in docs/inspection that they are required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I was being real dumb. Let's make this a positional and remove the check in the constructor.
https://github.com/broadinstitute/canine/blob/e02d6a73ffb93442ca5c70d9cd4345f780b7a2c6/canine/backends/dockerTransient.py#L30-L31
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(It's not like the exception message gives the user any additional information.)