Adds slurm support #202

Merged
merged 4 commits into from
Jan 30, 2024


o-smirnov
Member

@SpheMakh, in the interest of smaller PRs, review early, review often please!

@Athanaseus this ought to get you going. The slurm backend "wrapper" is enabled like so:

This causes all commands to be wrapped in an srun call. Just add scatter: -1 to your for-loops, and you're all parallel.

(I call it a "wrapper" because it wraps an actual backend like "native" or "singularity" -- it is not a backend on its own per se.)

The recipe itself is meant to be run on the head node (inside a tmux or screen session, of course).

Under slurm, you don't want to be building Singularity images on-demand on the compute nodes. Rather, pre-build all necessary images on the head node before you run a recipe. There is a new stimela build recipe.yml command that will do this (options very similar to the run command). You should also disable on-demand building of images during the run like so:

auto_build: false # on slurm, we pre-build on the head node explicitly

You'll probably want to tweak srun options on a per-cab (or per-step) basis. This is done similarly to how we tweak k8s settings per cab here:

## some cab-specific backend tweaks

So, something like:

cabs:
  breizorro:
    backend:
      slurm:
        srun_opts:
          mem: 16GB
          cpus_per_task: 4 

...will run breizorro with srun --mem 16GB --cpus-per-task 4. The srun_opts section is free-form, so all srun options are supported.
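To illustrate the mapping described above, here is a minimal Python sketch of how a free-form srun_opts section could be turned into an srun prefix. This is not the code from this PR: the function names srun_args and wrap_command are hypothetical, and the rules used here (underscores become dashes, a boolean true becomes a bare flag) are assumptions based only on the breizorro example.

```python
def srun_args(srun_opts):
    """Build an srun argument list from a free-form options mapping.

    Assumed convention (illustrative only): each key becomes a long option
    with underscores replaced by dashes, followed by its value.
    """
    args = ["srun"]
    for name, value in srun_opts.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Assumption: a boolean true means a bare flag with no value.
            args.append(flag)
        elif value is not None and value is not False:
            args += [flag, str(value)]
    return args


def wrap_command(command, srun_opts):
    """Prefix an existing command (a list of strings) with the srun call."""
    return srun_args(srun_opts) + list(command)


if __name__ == "__main__":
    opts = {"mem": "16GB", "cpus_per_task": 4}
    print(" ".join(wrap_command(["breizorro"], opts)))
    # prints: srun --mem 16GB --cpus-per-task 4 breizorro
```

Because the mapping is purely mechanical, any option that srun accepts in long form can be passed through without the wrapper needing to know about it, which is what "free-form" implies here.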

@SpheMakh SpheMakh self-requested a review January 30, 2024 12:22
@SpheMakh SpheMakh merged commit b8cbeb9 into master Jan 30, 2024
4 checks passed
@SpheMakh SpheMakh deleted the slurmify branch January 30, 2024 12:25