This tutorial makes use of docker-compose-lmod.yml, which defines two additional volume mounts.
- modules - pre-compiled binary files that will be referenced by the module or ml commands in Lmod
- modulefiles - pre-defined Lua scripts that enable the proper file linkage and environment definitions for the modules.
The version of Lmod being used predefines a few paths in which it will look for the modulefiles that define the modules.
MODULEPATH= \
/opt/apps/modulefiles/Linux: \
/opt/apps/modulefiles/Core: \
/opt/apps/lmod/lmod/modulefiles/Core
We are going to use /opt/apps/modulefiles/Linux as the root of our modulefile hierarchy and define a corresponding modules directory under /opt/apps/Linux using a similar hierarchy.
- Our modules:
/opt/apps/Linux/PACKAGE/PACKAGE_FILES
- Our modulefiles:
/opt/apps/modulefiles/Linux/PACKAGE/PACKAGE_LUA_SCRIPT
The lmod-modules-centos repository defines how to make Lmod modules specific to the CentOS 7 based Slurm in Docker project.
Once built, the modules and modulefiles need to be copied into the corresponding modules and modulefiles directories using a specific directory structure.
Using Java 8 as an example, we'd populate the directories as follows.
- Assumes the slurm-in-docker and lmod-modules-centos repositories are at the same directory level on the host for relative path commands.
- Module versions used herein are for reference purposes and subject to change as updates occur.
Create new directories to hold the module and modulefile information for java packages
cd slurm-in-docker
mkdir -p modules/java modulefiles/java
Copy the precompiled module files and the Lua modulefile definition to the appropriate locations.
cp ../lmod-modules-centos/java/1.8.0_171/jdk1.8.0_171.tar.gz modules/java/jdk1.8.0_171.tar.gz
cp ../lmod-modules-centos/java/1.8.0_171/1.8.0_171.lua modulefiles/java/1.8.0_171.lua
cd modules/java/
tar -xzf jdk1.8.0_171.tar.gz
rm -f jdk1.8.0_171.tar.gz
cd -
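Each modulefile is a small Lua script that tells Lmod where the extracted package lives and how to adjust the environment when the module is loaded. As a minimal sketch, a Java modulefile such as 1.8.0_171.lua could look like the following (the actual script shipped in lmod-modules-centos may differ):

-- minimal sketch of a Java modulefile; the real 1.8.0_171.lua may differ
local base = "/opt/apps/Linux/java/jdk1.8.0_171"  -- where the extracted JDK is mounted inside the containers

whatis("Name: java")
whatis("Version: 1.8.0_171")
help([[Oracle JDK 1.8.0_171]])

setenv("JAVA_HOME", base)                    -- many Java tools expect JAVA_HOME to be set
prepend_path("PATH", pathJoin(base, "bin"))  -- makes the java binary resolvable on PATH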
Using the docker-compose-lmod.yml file, test the newly created java module.
The docker-compose-lmod.yml definition adds two new volume mounts, which make the modules and modulefiles available to the controller and worker nodes at run time.
$ docker-compose -f docker-compose-lmod.yml up -d
Creating network "slurmindocker_slurm" with the default driver
Creating volume "slurmindocker_home" with default driver
Creating volume "slurmindocker_secret" with default driver
Creating controller ... done
Creating worker01 ... done
Creating database ... done
Creating worker02 ... done
Use docker exec to get onto the controller node
docker exec -ti controller /bin/bash
List available modules using module avail
# module avail
/opt/apps/modulefiles/Linux -
java/1.8.0_171
/opt/apps/lmod/lmod/modulefiles/Core
lmod/7.7 settarg/7.7
Use "module spider" to find
all possible modules.
Use "module keyword key1 key2
..." to search for all
possible modules matching any
of the "keys".
We can see our newly added java/1.8.0_171 as an option. Try the java -version command prior to loading the module.
# java -version
bash: java: command not found
Next, load the java module and try the same command again.
# module load java
# java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
Use the which command to see where java was installed.
# which java
/opt/apps/Linux/java/jdk1.8.0_171/bin/java
Unload the module and try the which command again.
# module unload java
# which java
/usr/bin/which: no java in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
Exit the controller node
# exit
Stop the running containers and remove the volume and network definitions.
docker-compose -f docker-compose-lmod.yml stop
docker-compose -f docker-compose-lmod.yml rm -f
docker volume rm slurmindocker_home slurmindocker_secret -f
docker network rm slurmindocker_slurm
Nextflow - Data-driven computational pipelines
- Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages.
- Its fluent DSL simplifies the implementation and the deployment of complex parallel and reactive workflows on clouds and clusters.
The Nextflow package depends on Java, so in order to enable Nextflow as a module within our Slurm cluster we also have to ensure its dependencies are installed before running it. See the nextflow modulefiles definition for how this is done.
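As a minimal sketch, the Java dependency could be declared in the Lua modulefile with Lmod's prereq() function, which refuses to load the module unless java is already loaded (the actual 0.28.0.lua in lmod-modules-centos may differ, and the extraction path below is an assumption for illustration):

-- minimal sketch of a Nextflow modulefile; the real 0.28.0.lua may differ
local base = "/opt/apps/Linux/nextflow/nextflow-0.28.0"  -- assumed extraction path

whatis("Name: nextflow")
whatis("Version: 0.28.0")

prereq("java")              -- require a java module to be loaded first
prepend_path("PATH", base)  -- puts the nextflow launcher script on PATH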
Learn more about Nextflow from the official docs
Enable the Nextflow module within the Slurm cluster and run a hello-world workflow across our two worker nodes.
- Assumes the slurm-in-docker and lmod-modules-centos repositories are at the same directory level on the host for relative path commands.
- Module versions used herein are for reference purposes and subject to change as updates occur.
Create new directories to hold the module and modulefile information for the java and nextflow packages
cd slurm-in-docker
mkdir -p \
modules/java \
modulefiles/java \
modules/nextflow \
modulefiles/nextflow
Copy the precompiled module files and the Lua modulefile definitions to the appropriate locations.
cp ../lmod-modules-centos/java/1.8.0_171/jdk1.8.0_171.tar.gz \
modules/java/jdk1.8.0_171.tar.gz
cp ../lmod-modules-centos/java/1.8.0_171/1.8.0_171.lua \
modulefiles/java/1.8.0_171.lua
cp ../lmod-modules-centos/nextflow/0.28.0/nextflow-0.28.0.tar.gz \
modules/nextflow/nextflow-0.28.0.tar.gz
cp ../lmod-modules-centos/nextflow/0.28.0/0.28.0.lua \
modulefiles/nextflow/0.28.0.lua
cd modules/java/
tar -xzf jdk1.8.0_171.tar.gz
rm -f jdk1.8.0_171.tar.gz
cd -
cd modules/nextflow/
tar -xzf nextflow-0.28.0.tar.gz
rm -f nextflow-0.28.0.tar.gz
cd -
Using the docker-compose-lmod.yml file, test the newly created nextflow module.
docker-compose -f docker-compose-lmod.yml up -d
Add the worker user to the Slurm database
$ docker exec controller bash -c 'sacctmgr -i add account worker description="worker account" Organization=Slurm-in-Docker'
Adding Account(s)
worker
Settings
Description = worker account
Organization = slurm-in-docker
Associations
A = worker C = snowflake
Settings
Parent = root
$ docker exec controller bash -c 'sacctmgr -i create user worker account=worker adminlevel=None'
Adding User(s)
worker
Settings =
Admin Level = None
Associations =
U = worker A = worker C = snowflake
Non Default Settings
Log onto the controller node as the worker user and change to the $HOME directory.
$ docker exec -ti -u worker controller /bin/bash
[worker@controller /]$ cd ~
[worker@controller ~]$ pwd
/home/worker
See what modules are available
$ module avail
/opt/apps/modulefiles/Linux -
java/1.8.0_171
nextflow/0.28.0
/opt/apps/lmod/lmod/modulefiles/Core
lmod/7.7 settarg/7.7
Use "module spider" to find
all possible modules.
Use "module keyword key1 key2
..." to search for all
possible modules matching any
of the "keys".
Load the nextflow module and check its version (java is a dependency).
$ module load nextflow
Lmod has detected the following error: Cannot load module "nextflow/0.28.0" without these module(s) loaded:
   java

While processing the following module(s):
Module fullname Module Filename
--------------- ---------------
nextflow/0.28.0 /opt/apps/modulefiles/Linux/nextflow/0.28.0.lua
$ module load java nextflow
$ module list
Currently Loaded Modules:
1) java/1.8.0_171
2) nextflow/0.28.0
$ nextflow -version
N E X T F L O W
version 0.28.0 build 4779
last modified 10-03-2018 12:13 UTC
cite doi:10.1038/nbt.3820
http://nextflow.io
Nextflow has a hello-world style workflow that can be invoked by calling nextflow run hello. Run this and notice what occurs on the system.
$ nextflow run hello
N E X T F L O W ~ version 0.28.0
Pulling nextflow-io/hello ...
downloaded from https://github.com/nextflow-io/hello.git
Launching `nextflow-io/hello` [desperate_curie] - revision: d4c9ea84de [master]
[warm up] executor > local
[99/aa6988] Submitted process > sayHello (3)
[62/a1e3cd] Submitted process > sayHello (1)
[92/c892fa] Submitted process > sayHello (4)
[57/9ce3ec] Submitted process > sayHello (2)
Hello world!
Ciao world!
Hola world!
Bonjour world!
This call fetched content from the nextflow-io site, added it to the user's home directory in $HOME/.nextflow, and put any log information into a file named $HOME/.nextflow.log.
$ ls -alh $HOME | grep .nextflow
drwxrwxr-x 3 worker worker 4.0K Apr 22 17:53 .nextflow
-rw-rw-r-- 1 worker worker 7.0K Apr 22 17:53 .nextflow.log
Contents of $HOME/.nextflow:
$ tree -a .nextflow
.nextflow
|-- cache
| `-- 09a64642-04bc-44c4-859a-99b5568fdbf1
| |-- db
| | |-- 000003.log
| | |-- CURRENT
| | |-- LOCK
| | `-- MANIFEST-000002
| `-- index.desperate_curie
`-- history
3 directories, 6 files
Create a job definition named nextflow_test.job that will allocate both worker nodes to run nextflow run hello.
File nextflow_test.job:
#!/bin/bash
#SBATCH --job-name=nextflow_test
# %A expands to the job ID and %a to the array task index in the file names below
#SBATCH --output=%A_%a_out.txt
#SBATCH --error=%A_%a_err.txt
# submit to the docker partition and request two tasks so the job spans both worker nodes
#SBATCH -p docker
#SBATCH -n 2

module load java nextflow

# Run your executable
srun nextflow run hello
Run the batch job, then check the squeue and sacct status as the job runs.
$ sbatch nextflow_test.job
Submitted batch job 2
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 docker nextflow worker R 0:01 2 worker[01-02]
$ sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
2 nextflow_+ docker worker 2 RUNNING 0:0
2.0 nextflow worker 2 RUNNING 0:0
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 docker nextflow worker R 0:06 2 worker[01-02]
$ sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
2 nextflow_+ docker worker 2 RUNNING 0:0
2.0 nextflow worker 2 RUNNING 0:0
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 docker nextflow worker R 0:10 2 worker[01-02]
$ sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
2 nextflow_+ docker worker 2 RUNNING 0:0
2.0 nextflow worker 2 RUNNING 0:0
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 docker nextflow worker R 0:14 2 worker[01-02]
$ sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
2 nextflow_+ docker worker 2 COMPLETED 0:0
2.batch batch worker 1 COMPLETED 0:0
2.0 nextflow worker 2 COMPLETED 0:0
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
When the job completes, check the output files.
$ ls -1
2_4294967294_err.txt
2_4294967294_out.txt
nextflow_test.job
work
File 2_4294967294_out.txt:
$ cat 2_4294967294_out.txt
N E X T F L O W ~ version 0.28.0
N E X T F L O W ~ version 0.28.0
Launching `nextflow-io/hello` [intergalactic_lumiere] - revision: d4c9ea84de [master]
Launching `nextflow-io/hello` [mad_lamarr] - revision: d4c9ea84de [master]
[warm up] executor > local
[warm up] executor > local
[87/02d3d1] Submitted process > sayHello (2)
[f1/ae6512] Submitted process > sayHello (3)
[2c/cbb8b7] Submitted process > sayHello (1)
[bd/07859c] Submitted process > sayHello (4)
[f6/1fda34] Submitted process > sayHello (4)
[bc/47a180] Submitted process > sayHello (2)
[42/d96ca6] Submitted process > sayHello (3)
[99/330e9b] Submitted process > sayHello (1)
Hola world!
Ciao world!
Hello world!
Ciao world!
Bonjour world!
Hola world!
Hello world!
Bonjour world!
File 2_4294967294_err.txt:
$ cat 2_4294967294_err.txt
Contents of the work directory:
$ tree -a work/
work/
|-- 2c
| `-- cbb8b721ae27a019c82b94ca6c8695
| |-- .command.begin
| |-- .command.err
| |-- .command.log
| |-- .command.out
| |-- .command.run
| |-- .command.sh
| `-- .exitcode
|-- 42
| `-- d96ca6f4e8eeba81c3a5d8a753e4cf
| |-- .command.begin
| |-- .command.err
| |-- .command.log
| |-- .command.out
| |-- .command.run
| |-- .command.sh
| `-- .exitcode
|-- 57
| `-- 9ce3ec2c2b5849d73a7c3a3d8c8126
| |-- .command.begin
| |-- .command.err
| |-- .command.log
| |-- .command.out
| |-- .command.run
| |-- .command.sh
| `-- .exitcode
|-- 62
| `-- a1e3cd0c1d7d9ec4fe50e90ceef925
| |-- .command.begin
| |-- .command.err
| |-- .command.log
| |-- .command.out
| |-- .command.run
| |-- .command.sh
| `-- .exitcode
|-- 87
| `-- 02d3d1c6a01e882d543c6c20ea70c9
| |-- .command.begin
| |-- .command.err
| |-- .command.log
| |-- .command.out
| |-- .command.run
| |-- .command.sh
| `-- .exitcode
|-- 92
| `-- c892fae2e088cd3bf11e23305371f8
| |-- .command.begin
| |-- .command.err
| |-- .command.log
| |-- .command.out
| |-- .command.run
| |-- .command.sh
| `-- .exitcode
|-- 99
| |-- 330e9b5ecf22bd99dd7f49718eb225
| | |-- .command.begin
| | |-- .command.err
| | |-- .command.log
| | |-- .command.out
| | |-- .command.run
| | |-- .command.sh
| | `-- .exitcode
| `-- aa69886272fdd7a8d4d09f2e40abe3
| |-- .command.begin
| |-- .command.err
| |-- .command.log
| |-- .command.out
| |-- .command.run
| |-- .command.sh
| `-- .exitcode
|-- bc
| `-- 47a1809a2198ea57886d4c3f823517
| |-- .command.begin
| |-- .command.err
| |-- .command.log
| |-- .command.out
| |-- .command.run
| |-- .command.sh
| `-- .exitcode
|-- bd
| `-- 07859c04437f7bc6833bc09aaf9d48
| |-- .command.begin
| |-- .command.err
| |-- .command.log
| |-- .command.out
| |-- .command.run
| |-- .command.sh
| `-- .exitcode
|-- f1
| `-- ae6512ad0f1ced8646c3bcbd4452da
| |-- .command.begin
| |-- .command.err
| |-- .command.log
| |-- .command.out
| |-- .command.run
| |-- .command.sh
| `-- .exitcode
`-- f6
`-- 1fda341384faf7760f06da44a11fe9
|-- .command.begin
|-- .command.err
|-- .command.log
|-- .command.out
|-- .command.run
|-- .command.sh
`-- .exitcode
23 directories, 84 files
Also note that new log files have been created.
$ ls -alh $HOME | grep .nextflow
-rw-rw-r-- 1 worker worker 6.6K Apr 22 18:13 .nextflow.log
-rw-rw-r-- 1 worker worker 6.7K Apr 22 18:13 .nextflow.log.1
-rw-rw-r-- 1 worker worker 7.0K Apr 22 17:53 .nextflow.log.2