OSSS UCX
Name: | OSSS-UCX |
Vendor/Implementor: | OSSS/LANL/SBU |
Open Source: | yes |
Website: | https://github.com/openshmem-org/osss-ucx |
User Guide: | https://github.com/openshmem-org/osss-ucx |
Version Supported: | 1.4 |
Release Date: | 2011 |
Platforms: | x86, ARM, POWER |
OS Support: | Linux |
Transports: | see https://github.com/openucx/ucx |
SHMEMX support: | if requested |
Note that I am showing builds of the whole stack for completeness. Some components are already available on various machines and can be left for the configure scripts to discover. For example, on Summit, PMIx and UCX are already available for use via `module`.
Component | From | Version |
---|---|---|
libevent | https://libevent.org/ | 2.1.11 |
PMIx | https://github.com/openpmix/openpmix | 3.1.4 |
PRRTE / Open-MPI | not needed, use LSF directly | |
UCX | https://github.com/openucx/ucx | |
OSSS-UCX | https://github.com/openshmem-org/osss-ucx | |
N.B. the configure commands here are all run from a separate build directory created as a sibling of the source. All of these imply `make` and `make install` of course, and also an `autogen` for git-clones.
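The sequence described in that note can be sketched as a small helper. This is only an illustration: `build_component` and `DRY_RUN` are hypothetical names, and the lookup of `autogen.sh` or `autogen.pl` is an assumption about what a given git clone ships.

```shell
#!/bin/sh
# Sketch only: build_component and DRY_RUN are hypothetical, not part of any package.
# Usage: build_component <source-dir> <build-dir> [configure options...]
build_component() {
    src=$(cd "$1" && pwd) || return 1   # absolute path, still valid after we cd away
    bld=$2
    shift 2
    # With DRY_RUN set, print each command instead of running it.
    if [ -n "${DRY_RUN:-}" ]; then run=echo; else run=; fi

    # A git clone has no configure script yet, so run whichever autogen it ships
    # (assumed here to be called autogen.sh or autogen.pl).
    if [ ! -f "$src/configure" ]; then
        for gen in autogen.sh autogen.pl; do
            if [ -f "$src/$gen" ]; then
                ( cd "$src" && $run "./$gen" ) || return 1
                break
            fi
        done
    fi

    # Configure, build and install from the separate build directory.
    # Note: the build directory is created even in dry-run mode.
    mkdir -p "$bld" || return 1
    (
        cd "$bld" &&
        $run "$src/configure" "$@" &&
        $run make &&
        $run make install
    )
}
```

For instance, `build_component ../source . --prefix=$HOME/opt/ucx/git --enable-mt` would cover the configure, `make` and `make install` steps for one component.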
#!/bin/sh
../libevent-2.1.11-stable/configure \
--prefix=$HOME/opt/libevent/2.1.11 \
--disable-samples \
--disable-debug-mode
LSF seems to be averse to PMIx versions newer than 3.1.4, so go with 3.1.4. Using the PMIx 3.1.4 in Spectrum MPI (IBM's rebrand of Open-MPI) also seems to work.
#!/bin/sh
../pmix-3.1.4-source/configure \
--prefix=$HOME/opt/pmix/3.1.4 \
--disable-debug \
--with-libevent=$HOME/opt/libevent/2.1.11
"knem" throws warnings during execution, which I suspect are related to CUDA memory, so I am disabling it for now (investigating). I also get (known) errors during compilation with the XL compilers (https://www.ibm.com/support/pages/apar/LI74419), so I am falling back to GCC for now.
#!/bin/sh
../source/configure \
--prefix=$HOME/opt/ucx/git \
--enable-mt \
--enable-optimizations \
--enable-cma \
--without-knem \
--without-cuda --without-java
#!/bin/sh
../source/configure \
--prefix=$HOME/opt/osss-ucx \
--with-pmix=$HOME/opt/pmix/3.1.4 \
--with-ucx=$HOME/opt/ucx/git
$ PATH=$HOME/opt/osss-ucx/bin:$PATH
$ which oshcc
~/opt/osss-ucx/bin/oshcc
$ osh_info
# OpenSHMEM Package name: osss-ucx
# OpenSHMEM Package version: 1.0
...
# Using UCX from: /ccs/home/tonyc/opt/ucx/git
# UCX Build Version: 1.9
# Using PMIx from: /ccs/home/tonyc/opt/pmix/3.1.4
# PMIx Build Version: 3.1.4
...
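To check from a script that the install picked up the intended UCX and PMIx, the report can be filtered. This is a minimal sketch, assuming the `# label: value` layout shown above; `osh_info_field` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: osh_info_field is a hypothetical helper, not part of osss-ucx.
# Reads osh_info output on stdin and prints the value that follows
# "# <label>:" on the first matching line.
osh_info_field() {
    sed -n "s/^# *$1 *: *//p" | head -n 1
}

# Example use (requires osh_info on PATH):
#   osh_info | osh_info_field "Using UCX from"
#   osh_info | osh_info_field "PMIx Build Version"
```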
Summit's LSF has the launcher `jsrun`, which is PMIx-aware, so we can launch directly. Here I request a 2-node interactive job, then use 2 cores per node in my run (to keep the output short).
login$ oshcc helloworld.c
login$ bsub -Is -q batch -W 2:00 -nnodes 2 -P $project /bin/bash
... wait for allocation...
batch$ jsrun -r 2 ./a.out
h22n13: Hello from PE 3 of 4
h22n13: Hello from PE 2 of 4
h22n12: Hello from PE 1 of 4
h22n12: Hello from PE 0 of 4
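The source of `helloworld.c` is not shown here. The following is a guess at a minimal program that would print lines in the format above, written out via a here-document. It uses only standard OpenSHMEM calls (`shmem_init`, `shmem_my_pe`, `shmem_n_pes`, `shmem_finalize`), and the use of `gethostname` for the prefix is an assumption.

```shell
#!/bin/sh
# Sketch: writes a hypothetical helloworld.c into the current directory.
# This is not necessarily the file used for the run shown above.
cat > helloworld.c <<'EOF'
#include <stdio.h>
#include <unistd.h>
#include <shmem.h>

int main(void)
{
    char host[256];

    shmem_init();                       /* start the OpenSHMEM runtime */
    gethostname(host, sizeof(host));
    host[sizeof(host) - 1] = '\0';      /* ensure termination if truncated */
    printf("%s: Hello from PE %d of %d\n",
           host, shmem_my_pe(), shmem_n_pes());
    shmem_finalize();
    return 0;
}
EOF
```

It can then be compiled with `oshcc helloworld.c` as shown.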
None.
Component | From | Version |
---|---|---|
PMIx | https://github.com/openpmix/openpmix | git |
PRRTE / Open-MPI | not needed, use SLURM directly | |
UCX | https://github.com/openucx/ucx | |
OSSS-UCX | https://github.com/openshmem-org/osss-ucx | |
N.B. the configure commands here are all run from a separate build directory created as a sibling of the source. All of these imply `make` and `make install` of course, and also an `autogen` for git-clones.
#!/bin/sh
../pmix-source/configure \
--prefix=$HOME/opt/pmix/git \
--disable-debug
There is some issue with UCX attempting to use the DC (Dynamically Connected) transport, which makes the output very noisy, so I have disabled it.
#!/bin/sh
../source/configure \
--prefix=$HOME/opt/ucx/git \
--enable-mt \
--enable-optimizations \
--enable-cma \
--without-knem \
--without-cuda --without-java \
--without-dc
#!/bin/sh
../source/configure \
--prefix=$HOME/opt/osss-ucx \
--with-pmix=$HOME/opt/pmix/git \
--with-ucx=$HOME/opt/ucx/git
$ PATH=$HOME/opt/osss-ucx/bin:$PATH
$ which oshcc
~/opt/osss-ucx/bin/oshcc
$ osh_info
# OpenSHMEM Package name: osss-ucx
# OpenSHMEM Package version: 1.0
...
# Using UCX from: /ccs/home/tonyc/opt/ucx/git
# UCX Build Version: 1.10
# Using PMIx from: /ccs/home/tonyc/opt/pmix/git
# PMIx Build Version: git
...
Wombat runs SLURM with the PMIx plugin, so I can set the environment variable `SLURM_MPI_TYPE` to `pmix`.
login$ export SLURM_MPI_TYPE=pmix
login$ oshcc helloworld.c
login$ srun -n 4 --ntasks-per-node=2 ./a.out    # or pass --mpi=pmix instead of setting SLURM_MPI_TYPE
wombat8: Hello from PE 2 of 4
wombat8: Hello from PE 3 of 4
wombat7: Hello from PE 0 of 4
wombat7: Hello from PE 1 of 4
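Before relying on `SLURM_MPI_TYPE=pmix`, it may help to confirm that the local SLURM actually offers a PMIx plugin. A minimal sketch, assuming the listing comes from `srun --mpi=list`; `slurm_has_pmix` is a hypothetical helper name, and the exact listing format varies between SLURM versions.

```shell
#!/bin/sh
# Sketch: slurm_has_pmix is a hypothetical helper name.
# Reads a plugin listing on stdin and succeeds if it mentions pmix
# (this also matches versioned names such as pmix_v3).
slurm_has_pmix() {
    grep -q 'pmix'
}

# Example use; some SLURM versions print the list on stderr, hence 2>&1:
#   if srun --mpi=list 2>&1 | slurm_has_pmix; then
#       export SLURM_MPI_TYPE=pmix
#   fi
```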
Component | From |
---|---|
PMIx | https://github.com/openpmix/openpmix |
PRRTE | https://github.com/openpmix/prrte |
UCX | https://github.com/openucx/ucx |
OSSS-UCX | https://github.com/openshmem-org/osss-ucx |
N.B. the configure commands here are all run from a separate build directory created as a sibling of the source. All of these imply `make` and `make install` of course, and also an `autogen` for git-clones.
#!/bin/sh
../pmix-source/configure \
--prefix=$HOME/opt/pmix/git \
--disable-debug
#!/bin/sh
../prrte-source/configure \
--prefix=$HOME/opt/prrte/git \
--with-pmix=$HOME/opt/pmix/git \
--disable-debug
#!/bin/sh
../source/configure \
--prefix=$HOME/opt/ucx/git \
--enable-mt \
--enable-optimizations \
--enable-cma \
--without-cuda --without-java
#!/bin/sh
../source/configure \
--prefix=$HOME/opt/osss-ucx \
--with-pmix=$HOME/opt/pmix/git \
--with-ucx=$HOME/opt/ucx/git
$ PATH=$HOME/opt/prrte/git/bin:$PATH
$ PATH=$HOME/opt/osss-ucx/bin:$PATH
$ which oshcc
~/opt/osss-ucx/bin/oshcc
$ osh_info
# OpenSHMEM Package name: osss-ucx
# OpenSHMEM Package version: 1.0
...
# Using UCX from: /home1/01858/arcurtis/opt/ucx/git
# UCX Build Version: 1.9
# Using PMIx from: /home1/01858/arcurtis/opt/pmix/git
# PMIx Build Version: 4.0.0
...
Frontera is SLURM-based, but I can't get the PMIx plugin to play nicely like on stretch. Instead, I can launch interactively with their utility script `idev`.
login$ oshcc helloworld.c
login$ idev -p development -t 0:5:00 -N 2 --ntasks-per-node=2
... wait for allocation...
c161-001[3](~/shmem/openshmem-examples/c) oshrun ./a.out
oshrun:== OSSS-UCX Python-based Launcher ==
oshrun:init:looking for launcher
oshrun:init:checking for DVM...
oshrun:prte:starting DVM
oshrun:prte:DVM says "DVM ready"
oshrun:prte:talking with DVM "prte"
oshrun:running "prun -x 'S{HMEM,MA}_*' ./a.out"
oshrun:----------------------------------------------------------------------
c161-001.frontera.tacc.utexas.edu: Hello from PE 0 of 4
c161-002.frontera.tacc.utexas.edu: Hello from PE 2 of 4
c161-001.frontera.tacc.utexas.edu: Hello from PE 1 of 4
c161-002.frontera.tacc.utexas.edu: Hello from PE 3 of 4
oshrun:prte:killing DVM pid 456269
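The log above suggests that `oshrun` starts a PRRTE DVM, runs the program through `prun`, and then shuts the DVM down. Doing the same by hand might look roughly like the sketch below. `run_in_dvm` and `DRY_RUN` are hypothetical names, and a real script may need to wait for the DVM to report ready before calling `prun`.

```shell
#!/bin/sh
# Sketch: run_in_dvm is a hypothetical name; oshrun normally handles all of this.
# Set DRY_RUN to print the commands instead of running them.
run_in_dvm() {
    if [ -n "${DRY_RUN:-}" ]; then run=echo; else run=; fi
    $run prte --daemonize || return 1   # start the DVM in the background
    $run prun "$@"                      # launch the program inside the DVM
    status=$?
    $run pterm                          # shut the DVM down again
    return $status
}

# Example use (requires the PRRTE bin directory on PATH):
#   run_in_dvm ./a.out
```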