Submitit integration #2125
The code in submitit.py is a 90% copy-paste of Michael Shvartsman's code here (https://github.com/mshvartsman/Ax/blob/submitit-runner/ax/runners/submitit.py).
    elif isinstance(self.executor, LocalExecutor):
        ############
        ##### this fails to pickle, doesn't work
Why does this fail?
    from submitit.core.core import Executor


    class SubmitItRunner(Runner):
Are poll_exception and poll_available_capacity important to define for a Runner?
Codecov Report
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #2125      +/-   ##
==========================================
- Coverage   94.79%   94.68%   -0.12%
==========================================
  Files         460      461       +1
  Lines       45400    45457      +57
==========================================
+ Hits        43039    43040       +1
- Misses       2361     2417      +56
This pull request adds a runner for the SubmitIt package (https://github.com/facebookincubator/submitit). With SubmitIt, the Ax scheduler can schedule jobs on a SLURM cluster.
The integration requires defining a Runner and a Metric for SubmitIt executors (submitit.py). The quick-start guide (submitit_scheduler.py) explains how to set up a basic experiment and points the user to further Ax resources that they may be interested in.
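For readers unfamiliar with the runner pattern, here is a minimal sketch of the idea: a runner submits one job per trial to an executor and later polls each job to report trial status. This is not the code in submitit.py. To keep it runnable without a cluster, it uses the standard library's `ThreadPoolExecutor` as a stand-in for a SubmitIt executor, and `ToyRunner`, `Status`, `fetch_result`, and `quadratic` are hypothetical names chosen for illustration, not Ax or SubmitIt API.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from enum import Enum
from typing import Any, Callable, Dict, Iterable, Set, Tuple


class Status(Enum):
    """Hypothetical stand-in for a trial status enum."""
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


class ToyRunner:
    """Illustrative runner: one submitted job per trial, polled later."""

    def __init__(self, executor: ThreadPoolExecutor, train_fn: Callable[..., float]) -> None:
        self.executor = executor
        self.train_fn = train_fn
        self._jobs: Dict[int, Future] = {}

    def run(self, trial_index: int, parameters: Dict[str, Any]) -> Dict[str, Any]:
        # Submit the job and remember its handle so it can be polled later.
        self._jobs[trial_index] = self.executor.submit(self.train_fn, **parameters)
        return {"trial_index": trial_index}

    def poll_trial_status(self, trial_indices: Iterable[int]) -> Dict[Status, Set[int]]:
        # Group trial indices by the current state of their job.
        statuses: Dict[Status, Set[int]] = {status: set() for status in Status}
        for index in trial_indices:
            job = self._jobs[index]
            if not job.done():
                statuses[Status.RUNNING].add(index)
            elif job.exception() is not None:
                statuses[Status.FAILED].add(index)
            else:
                statuses[Status.COMPLETED].add(index)
        return statuses

    def fetch_result(self, trial_index: int) -> float:
        # Plays the role of a metric: read the finished job's return value.
        return self._jobs[trial_index].result()


def quadratic(x: float, y: float) -> float:
    """Toy objective with its minimum at x=1, y=-2."""
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def demo() -> Tuple[Dict[Status, Set[int]], float, float]:
    # Leaving the `with` block waits for all submitted jobs to finish.
    with ThreadPoolExecutor(max_workers=2) as pool:
        runner = ToyRunner(pool, quadratic)
        runner.run(0, {"x": 1.0, "y": -2.0})
        runner.run(1, {"x": 0.0, "y": 0.0})
        runner.run(2, {"x": "bad", "y": 0.0})  # raises TypeError inside the job
    statuses = runner.poll_trial_status([0, 1, 2])
    return statuses, runner.fetch_result(0), runner.fetch_result(1)
```

In the actual integration, the executor would be a SubmitIt executor and the status mapping would use Ax's own trial status type; the sketch only shows the submit-then-poll shape that the Runner and Metric split relies on.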