
DockSing

CLI Utility for deployment of containerized jobs on SLURM HPCs


Installation

Requirements: Docker available on the local host, SSH access to the remote host, and Singularity and SLURM available on the remote host.

On your local host, run:

pip install docksing

What is DockSing?

DockSing is a lightweight, pure-Python CLI tool that orchestrates the deployment of jobs to docker and SLURM endpoints, based on the compose specification and loosely inspired by Google Vertex AI.

Deploying a job on the local docker daemon:

docksing --ssh username@hostname --config config.yaml --local

Deploying a job on a remote SLURM HPC:

docksing --ssh username@hostname --config config.yaml

Why DockSing?

DockSing exists to reduce the overhead required to scale experiments from development to testing to deployment. Specifically, DockSing takes care of automatically converting docker-compose specifications into singularity specifications overloaded with SBATCH commands, sparing us the nuisance of mapping and combining the three by hand.

Who is DockSing for?

DockSing aims to simplify the experimentation workflow of those who use docker and, more specifically, devcontainers.

Overview

Just like docker-compose, DockSing requires a config.yaml to initiate a job.
This config.yaml, however, differs slightly from a typical docker-compose file in that it is split into three chapters:

  1. remotedir: Path to the target directory that will be created on the remote host. All files required to run the job, comprising .sif images, bind maps and eventual job outputs, are stored here.
  2. slurm: Chapter of key:value maps encoding srun options (reference).
  3. container: Chapter containing all entries one would use in a normal docker-compose file. Note that DockSing only supports a limited subset of docker-compose functionality; please refer to the Supported Compose Specification section below.

Example of a config.yaml:

  remotedir: path/to/remote/directory
  slurm:
    nodes: 1
    cpus-per-task: 1
    job-name: job-name
  container:
    image: tag
    commands: ["sh -c","'echo Hello World'"]
    environment:
    - env_variable: env_content
    - another_env_variable: another_env_content
    volumes:
    - /absolute/path/to/bind:/container/path/to/bind
    - /another/absolute/path/to/bind:/another/container/path/to/bind
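For intuition, the container chapter above maps to a plain docker run invocation. A rough sketch of what it would render to, assuming the same flag mapping shown in the tutorial below and the placeholder values from the example:

docker run --env env_variable=env_content --env another_env_variable=another_env_content --volume /absolute/path/to/bind:/container/path/to/bind --volume /another/absolute/path/to/bind:/another/container/path/to/bind tag sh -c 'echo Hello World'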

To launch the job, run:

docksing --ssh username@hostname --config path/to/config.yaml

Essentially, the above command automates the following actions, in order:

  1. Attempts to establish an SSH connection to the remote host
  2. Attempts to establish a connection to the local docker daemon
  3. Verifies that the image tag is available in the local docker daemon
  4. Creates the remotedir on the remote host
  5. Copies the image tag pulled from the local docker daemon to the remotedir
  6. Copies the content of all source binds in volumes from the local host to the remote host
  7. Converts the image tag into a .sif build, compatible with singularity
  8. Starts the srun job, passing all options found in the slurm chapter while also passing all options found in container to the nested singularity run

As a side note, steps 7 and 8 are executed within the same srun instance to minimize queueing on the remote.
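To put the automation into perspective, below is a rough sketch of the manual workflow that steps 4 to 8 replace, using the placeholder paths from the example above; the exact commands DockSing issues may differ:

# ship the image archive and the bind sources to the remote host
docker save -o image.tar tag
ssh username@hostname "mkdir -p path/to/remote/directory"
scp image.tar username@hostname:path/to/remote/directory/
scp -r /absolute/path/to/bind username@hostname:path/to/remote/directory/

# on the remote host: build the .sif and run it within a single srun submission
srun --nodes=1 --cpus-per-task=1 --job-name=job-name bash -c "singularity build path/to/remote/directory/image.sif docker-archive://path/to/remote/directory/image.tar && singularity run --env env_variable=env_content --bind path/to/remote/directory/bind:/container/path/to/bind path/to/remote/directory/image.sif sh -c 'echo Hello World'"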

Tutorial

In this use case we wish to print the content of some environment variables in a .txt file.
This can be achieved with the following config.yaml:

remotedir:  target_directory_on_remote_host

slurm:
  nodes: 1
  cpus-per-task: 1
  job-name: name_of_the_slurm_job

container:
  image:  alpine:latest
  commands: ["sh -c","'echo the $VARIABLE is $VALUE   > /output/result.txt'"]
  environment:
    - VARIABLE: color
    - GOOGLE_APPLICATION_CREDENTIALS: credentials
    - VALUE: red
  volumes:
    - /absolute/path/to/output:/output

First and foremost, we pull the image (or build it from a Dockerfile) required to run the job:

$ docker pull alpine:latest

DockSing will raise an error if it cannot find the image in the local docker daemon.
Afterwards, we may wish to verify that our setup is correct by inspecting the explicit cli it generates:

$ docksing --ssh username@hostname --config config.yaml --cli --local  
docker run --env VARIABLE=color --env GOOGLE_APPLICATION_CREDENTIALS=credentials --env VALUE=red --volume /absolute/path/to/output:/output alpine:latest sh -c 'echo the $VARIABLE is $VALUE   > /output/result.txt'

If it looks right, we may proceed with a local run to assess whether our logic is correct:

$ docksing --ssh username@hostname --config config.yaml --local

If it is, we likewise check whether our setup is correct in the remote case:

$ docksing --ssh username@hostname --config config.yaml --cli 
srun --nodes=1 --cpus-per-task=1 --job-name=name_of_the_slurm_job bash -c "singularity build target_directory_on_remote_host/91ef0af61f39.sif docker-archive://target_directory_on_remote_host/91ef0af61f39.tar && singularity run --env VARIABLE=color --env GOOGLE_APPLICATION_CREDENTIALS=credentials --env VALUE=red --bind target_directory_on_remote_host/output:/output target_directory_on_remote_host/91ef0af61f39.sif sh -c 'echo the $VARIABLE is $VALUE   > /output/result.txt'"

Note how a simple docker run quickly explodes in complexity and verbosity when it needs to be deployed remotely on singularity via SLURM, which makes manual submission prone to errors.
If the command looks right, we may actually submit the job on the HPC via:

$ docksing --ssh username@hostname --config config.yaml 

This launches the job.
Often, however, one may wish to monitor the logs to assess how the job is going. To do so, simply run:

$ docksing --ssh username@hostname --config config.yaml --stream 

This streams the remote stdout and stderr to the local console.
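Once the job has completed, the result can be inspected directly on the remote host. For instance, given the bind mapping shown in the remote cli above, something along these lines should print the file written by the job:

$ ssh username@hostname cat target_directory_on_remote_host/output/result.txt
the color is red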

List of Features

  1. Launching a local job on docker
docksing --ssh username@hostname --config config.yaml --local
  2. Launching a remote job
docksing --ssh username@hostname --config config.yaml
  3. Inspecting the local cli
docksing --ssh username@hostname --config config.yaml --local --cli
  4. Inspecting the remote cli
docksing --ssh username@hostname --config config.yaml --cli
  5. Streaming the remote job logs to the local console
docksing --ssh username@hostname --config config.yaml --stream

Supported Compose Specification

  • workdir
  • environment
  • volumes
  • commands
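As a quick reference, a minimal container chapter restricted to the supported keys might look as follows; the workdir value is a placeholder and our own assumption of how that key is used, so double check it against your setup:

container:
  image: alpine:latest
  workdir: /path/inside/the/container
  commands: ["sh -c","'echo Hello World'"]
  environment:
  - env_variable: env_content
  volumes:
  - /absolute/path/to/bind:/container/path/to/bind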

Design Notes

DockSing is developed with the aim of adhering as closely as possible to existing standards with the lowest possible code overhead, so that the official docker, singularity and SLURM documentation remains directly applicable.
To get the most out of DockSing, it is advisable to be proficient with the docker ecosystem.

Limitations

DockSing was tested on the Windows Subsystem for Linux (WSL); mileage may vary in other settings.