
OSSS UCX

tonycurtis edited this page Jun 12, 2020 · 13 revisions

Implementation Overview

Name: OSSS-UCX
Vendor/Implementor: OSSS/LANL/SBU
Open Source: yes
Website: https://github.com/openshmem-org/osss-ucx
User Guide: https://github.com/openshmem-org/osss-ucx
Version Supported: 1.4
Release Date: 2011
Platforms: x86, ARM, POWER
OS Support: Linux
Transports: see https://github.com/openucx/ucx
SHMEMX support: if requested

User Installation Experiences

I show builds of the whole stack here for completeness. Some components are already installed on various machines and can be left for the configure scripts to discover. On Summit, for example, PMIx and UCX are already available via module.

Summit at Oak Ridge

Prereqs Not Installed

Component   From                    Version
libevent    https://libevent.org/   2.1.11

Main Bits

Component          From                                        Version
PMIx               https://github.com/openpmix/openpmix        3.1.4
PRRTE / Open-MPI   not needed, use LSF directly
UCX                https://github.com/openucx/ucx
OSSS-UCX           https://github.com/openshmem-org/osss-ucx

Build

N.B. the configure commands here are all run from a separate build directory created as a sibling of the source directory. Each configure step is followed by make and make install, and git clones also need an autogen step before configure.
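
The build pattern in the note above can be sketched as a small helper. This is a minimal sketch, not the script used for these builds: the names build_component, run and DRY_RUN, and the build-<name> directory naming, are assumptions made for illustration. The autogen step is left out because it only applies to git clones.

```shell
#!/bin/sh
# Sketch of an out-of-tree build: configure from a sibling build
# directory, then make and make install.
# build_component, run and DRY_RUN are hypothetical names.

run() {
    # Print the command; execute it only when DRY_RUN is not 1.
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# The body is a subshell, so the cd does not leak into the caller.
build_component() (
    # $1 = source directory, $2 = install prefix, rest = configure flags
    src=$1
    prefix=$2
    shift 2
    name=$(basename "$src")
    build_dir="$(dirname "$src")/build-$name"

    run mkdir -p "$build_dir" &&
    run cd "$build_dir" &&
    run "../$name/configure" --prefix="$prefix" "$@" &&
    run make &&
    run make install
)
```

Setting DRY_RUN=1 before calling build_component with a source directory, a prefix and the configure flags from one of the scripts below prints the five commands without running them, which is a cheap way to check paths before a long build.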

libevent

#!/bin/sh

../libevent-2.1.11-stable/configure \
    --prefix=$HOME/opt/libevent/2.1.11 \
    --disable-samples \
    --disable-debug-mode

PMIx

LSF seems to be averse to PMIx versions newer than 3.1.4, so I go with 3.1.4. It also seems to work with the PMIx 3.1.4 in Spectrum MPI (IBM's rebrand of Open-MPI).

#!/bin/sh

../pmix-3.1.4-source/configure \
    --prefix=$HOME/opt/pmix/3.1.4 \
    --disable-debug \
    --with-libevent=$HOME/opt/libevent/2.1.11

UCX

"knem" throws warnings during execution, which I suspect are related to CUDA memory, so I am disabling it for now (still investigating). I also get (known) errors when compiling with the XL compilers (https://www.ibm.com/support/pages/apar/LI74419), so I am falling back to GCC for now.

#!/bin/sh

../source/configure \
    --prefix=$HOME/opt/ucx/git \
    --enable-mt \
    --enable-optimizations \
    --enable-cma \
    --without-knem \
    --without-cuda --without-java

OSSS-UCX

#!/bin/sh

../source/configure \
    --prefix=$HOME/opt/osss-ucx \
    --with-pmix=$HOME/opt/pmix/3.1.4 \
    --with-ucx=$HOME/opt/ucx/git

Test Installation

$ PATH=$HOME/opt/osss-ucx/bin:$PATH

$ which oshcc
~/opt/osss-ucx/bin/oshcc

$ osh_info
# OpenSHMEM Package name:      osss-ucx
# OpenSHMEM Package version:   1.0
...
# Using UCX from:              /ccs/home/tonyc/opt/ucx/git
# UCX Build Version:           1.9
# Using PMIx from:             /ccs/home/tonyc/opt/pmix/3.1.4
# PMIx Build Version:          3.1.4
...
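
The manual checks above can be wrapped in a small script. This is a minimal sketch: check_tools is a hypothetical helper, not part of OSSS-UCX, and it only confirms that the named commands resolve on PATH. It does not inspect what osh_info prints.

```shell
#!/bin/sh
# Report whether each named command resolves on PATH.
# check_tools is a hypothetical helper, not part of OSSS-UCX.

check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "found   $tool -> $path"
        else
            echo "missing $tool"
            missing=$((missing + 1))
        fi
    done
    # Non-zero status when anything is missing.
    [ "$missing" -eq 0 ]
}

# oshcc and osh_info are the commands used in the session above.
check_tools oshcc osh_info ||
    echo "put the OSSS-UCX bin directory on PATH first"
```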

Running Programs

Summit's LSF has the launcher jsrun which is PMIx-aware, so we can launch directly. Here I request a 2-node interactive job, then use 2 cores-per-node in my run (to keep the output short).

login$ oshcc helloworld.c
login$ bsub -Is -q batch -W 2:00 -nnodes 2 -P $project /bin/bash
... wait for allocation...
batch$ jsrun -r 2 ./a.out
h22n13: Hello from PE    3 of    4
h22n13: Hello from PE    2 of    4
h22n12: Hello from PE    1 of    4
h22n12: Hello from PE    0 of    4
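
The source of helloworld.c is not shown on this page. As an assumption about what such a program looks like, the sketch below writes a minimal OpenSHMEM hello-world whose output has the same shape as the run above, and compiles it only if oshcc is on PATH. The file contents are illustrative, not the program that produced the output above.

```shell
#!/bin/sh
# Write a minimal OpenSHMEM hello-world (an illustrative stand-in
# for helloworld.c) and compile it when oshcc is available.

cat > helloworld.c <<'EOF'
#include <stdio.h>
#include <unistd.h>
#include <shmem.h>

int main(void)
{
    char host[256];

    shmem_init();
    gethostname(host, sizeof(host));
    printf("%s: Hello from PE %4d of %4d\n",
           host, shmem_my_pe(), shmem_n_pes());
    shmem_finalize();
    return 0;
}
EOF

if command -v oshcc >/dev/null 2>&1; then
    oshcc helloworld.c
else
    echo "oshcc not on PATH; wrote helloworld.c only" >&2
fi
```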

Frontera at TACC

Main Bits

Component   From
PMIx        https://github.com/openpmix/openpmix
PRRTE       https://github.com/openpmix/prrte
UCX         https://github.com/openucx/ucx
OSSS-UCX    https://github.com/openshmem-org/osss-ucx

Build

N.B. the configure commands here are all run from a separate build directory created as a sibling of the source directory. Each configure step is followed by make and make install, and git clones also need an autogen step before configure.

PMIx

#!/bin/sh

../pmix-source/configure \
    --prefix=$HOME/opt/pmix/git \
    --disable-debug

PRRTE

#!/bin/sh

../prrte-source/configure \
    --prefix=$HOME/opt/prrte/git \
    --with-pmix=$HOME/opt/pmix/git \
    --disable-debug

UCX

#!/bin/sh

../source/configure \
    --prefix=$HOME/opt/ucx/git \
    --enable-mt \
    --enable-optimizations \
    --enable-cma \
    --without-cuda --without-java

OSSS-UCX

#!/bin/sh

../source/configure \
    --prefix=$HOME/opt/osss-ucx \
    --with-pmix=$HOME/opt/pmix/git \
    --with-ucx=$HOME/opt/ucx/git

Test Installation

$ PATH=$HOME/opt/prrte/git/bin:$PATH
$ PATH=$HOME/opt/osss-ucx/bin:$PATH

$ which oshcc
~/opt/osss-ucx/bin/oshcc

$ osh_info
# OpenSHMEM Package name:      osss-ucx
# OpenSHMEM Package version:   1.0
...
# Using UCX from:              /home1/01858/arcurtis/opt/ucx/git
# UCX Build Version:           1.9
# Using PMIx from:             /home1/01858/arcurtis/opt/pmix/git
# PMIx Build Version:          4.0.0
...

Running Programs

Frontera is SLURM-based. I can't get the PMIx plugin to play nicely like on stretch, but programs can be launched interactively with TACC's utility script idev.

login$ oshcc helloworld.c
login$ idev -p development -t 0:5:00 -N 2 --ntasks-per-node=2
... wait for allocation...
c161-001[3](~/shmem/openshmem-examples/c) oshrun ./a.out
oshrun:== OSSS-UCX Python-based Launcher ==
oshrun:init:looking for launcher
oshrun:init:checking for DVM...
oshrun:prte:starting DVM
oshrun:prte:DVM says "DVM ready"
oshrun:prte:talking with DVM "prte"
oshrun:running "prun -x 'S{HMEM,MA}_*' ./a.out"
oshrun:----------------------------------------------------------------------
c161-001.frontera.tacc.utexas.edu: Hello from PE    0 of    4
c161-002.frontera.tacc.utexas.edu: Hello from PE    2 of    4
c161-001.frontera.tacc.utexas.edu: Hello from PE    1 of    4
c161-002.frontera.tacc.utexas.edu: Hello from PE    3 of    4
oshrun:prte:killing DVM pid 456269
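
PEs print in whatever order their output arrives, so the hello lines above are not sorted. One way to confirm that every PE reported is to pull the PE numbers out of the output and compare them with the expected range. This is a minimal sketch: all_pes_reported is a hypothetical helper, and it assumes the "Hello from PE N of M" line format shown above.

```shell
#!/bin/sh
# Check that PEs 0..M-1 each printed exactly one hello line.
# Reads launcher output on stdin; other lines are ignored.
# all_pes_reported is a hypothetical helper.

all_pes_reported() {
    awk '
        /Hello from PE/ {
            # Find "PE <n> of <m>" wherever it sits on the line.
            for (i = 1; i <= NF; i++) {
                if ($i == "PE") { pe = $(i + 1); total = $(i + 3) }
            }
            seen[pe]++
        }
        END {
            if (total == "") exit 1
            for (p = 0; p < total; p++) {
                if (seen[p] != 1) exit 1
            }
            exit 0
        }
    '
}

# Sample input copied from the Frontera run above.
all_pes_reported <<'EOF' && echo "all PEs reported"
oshrun:prte:starting DVM
c161-001.frontera.tacc.utexas.edu: Hello from PE    0 of    4
c161-002.frontera.tacc.utexas.edu: Hello from PE    2 of    4
c161-001.frontera.tacc.utexas.edu: Hello from PE    1 of    4
c161-002.frontera.tacc.utexas.edu: Hello from PE    3 of    4
EOF
```

In a real session the same function can sit at the end of a pipe, for example after oshrun ./a.out or jsrun -r 2 ./a.out.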