Create & use virtual BF cluster #1

Open
wants to merge 4 commits into
base: master
16 changes: 12 additions & 4 deletions Dockerfile
@@ -38,6 +38,9 @@ RUN set -ex \
lua \
lua-devel \
lua-libs \
lua-posix \
automake \
libtool \
&& yum clean all \
&& rm -rf /var/cache/yum

@@ -49,13 +52,11 @@ RUN set -ex \
&& wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" \
&& wget -O /usr/local/bin/gosu.asc "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64.asc" \
&& export GNUPGHOME="$(mktemp -d)" \
&& gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4 \
&& gpg --batch --verify /usr/local/bin/gosu.asc /usr/local/bin/gosu \
&& rm -rf "${GNUPGHOME}" /usr/local/bin/gosu.asc \
&& chmod +x /usr/local/bin/gosu \
&& gosu nobody true

RUN set -x \
&& ln -s /usr/lib64/pkgconfig/lua.pc /usr/lib64/pkgconfig/lua5.4.pc \
&& git clone -b ${SLURM_TAG} --single-branch --depth=1 https://github.com/SchedMD/slurm.git \
&& pushd slurm \
&& ./configure --enable-debug --prefix=/usr --sysconfdir=/etc/slurm \
@@ -66,7 +67,6 @@ RUN set -x \
&& install -D -m644 etc/slurmdbd.conf.example /etc/slurm/slurmdbd.conf.example \
&& install -D -m644 contribs/slurm_completion_help/slurm_completion.sh /etc/profile.d/slurm_completion.sh \
&& popd \
&& rm -rf slurm \
&& groupadd -r --gid=990 slurm \
&& useradd -r -g slurm --uid=990 slurm \
&& mkdir /etc/sysconfig/slurm \
@@ -96,6 +96,14 @@ RUN set -x \
&& chown slurm:slurm /etc/slurm/slurmdbd.conf \
&& chmod 600 /etc/slurm/slurmdbd.conf

RUN echo $'\n\
function slurm_job_submit(job_desc, part_list, submit_uid)\n\
return slurm.SUCCESS\n\
end\n\
\n\
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)\n\
return slurm.SUCCESS\n\
end' > /etc/slurm/job_submit.lua

COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
14 changes: 12 additions & 2 deletions README.md
@@ -36,6 +36,7 @@ sudo docker build -t slurm-docker-cluster:21.08.6 .

### Starting the Cluster

First, update any system-specific paths in the `docker-compose.yml` file to reflect your local configuration.
Run `docker-compose` to instantiate the cluster:

```console
@@ -115,11 +116,20 @@ sudo docker volume rm slurm-docker-cluster_etc_munge slurm-docker-cluster_etc_sl

If you want to change the `slurm.conf` or `slurmdbd.conf` file without rebuilding the image, you can do so by calling
```console
./update_slurmfiles.sh slurm.conf slurmdbd.conf
sudo ./update_slurmfiles.sh slurm.conf slurmdbd.conf
```
(or just one of the files).
The cluster is then restarted automatically with
```console
docker-compose restart
sudo docker-compose restart
```
This might come in handy if you add a node to your cluster, remove one from it, or want to test a new setting.

# Check the Slurm logs when everything crashes at startup time

If any container exits with an error code, for example because of a bug in a `job_submit` script, you can still
read the Slurm logs from the host. For instance, to see the slurmctld logs, execute the following command:

```console
sudo docker run --rm -i -v=slurm-docker-cluster_var_log_slurm:/tmp/slurm busybox cat /tmp/slurm/slurmctld.log
```
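
The same volume mount can be reused for the other log files named in `slurm.conf`, such as `slurmd.log`. The following is a minimal sketch, assuming the log volume keeps the name `slurm-docker-cluster_var_log_slurm` and that the other daemons write to it as well. The helper name `show_slurm_log` and the `DOCKER` override variable are hypothetical and not part of this repository.

```shell
# Hypothetical helper (not shipped with this repository): print the last
# lines of one Slurm log file stored in the shared log volume.
#   $1: log file name inside the volume (default: slurmctld.log)
#   $2: number of trailing lines to print (default: 50)
# DOCKER is an assumed override for the docker command, handy for dry runs.
show_slurm_log() {
    log_name="${1:-slurmctld.log}"
    line_count="${2:-50}"
    # Left unquoted on purpose so "sudo docker" splits into two words.
    ${DOCKER:-sudo docker} run --rm -i \
        -v=slurm-docker-cluster_var_log_slurm:/tmp/slurm \
        busybox tail -n "$line_count" "/tmp/slurm/${log_name}"
}

# Usage (not executed here):
#   show_slurm_log slurmd.log 100
```

Setting `DOCKER=echo` before calling the function prints the command instead of running it, which is a quick way to check the volume name and log path first.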
7 changes: 4 additions & 3 deletions slurm.conf
@@ -22,7 +22,8 @@ MpiDefault=none
SlurmctldPidFile=/var/run/slurmd/slurmctld.pid
SlurmdPidFile=/var/run/slurmd/slurmd.pid
ProctrackType=proctrack/linuxproc
#PluginDir=
PluginDir=/usr/lib64/slurm
JobSubmitPlugins=lua
#CacheGroups=0
#FirstJobId=
ReturnToService=0
@@ -69,9 +70,9 @@ FastSchedule=1
#PriorityMaxAge=1-0
#
# LOGGING
SlurmctldDebug=3
SlurmctldDebug=9
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdDebug=9
SlurmdLogFile=/var/log/slurm/slurmd.log
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/jobcomp.log