This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Support async execution in OpenMP wherever it's supported #725

Draft: wants to merge 11 commits into master

@iomaganaris (Contributor) commented Dec 21, 2021

Description

Added asynchronous execution of kernels in multiple GPU streams.
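
The general pattern can be sketched as follows. This is a minimal illustration, not the code from this PR: it assumes that each "stream" is represented by a dummy variable (here the hypothetical stream_a and stream_b) used only as a dependency token in depend(inout: ...). Deferred target tasks that name the same token run in submission order, while tasks on different tokens may overlap; whether each token actually ends up on its own GPU stream is left to the compiler and runtime. Without OpenMP, or without a GPU, the pragmas are ignored or fall back to the host and the result is the same.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: two independent "streams" modelled as OpenMP
// dependency tokens. Only the addresses of stream_a/stream_b matter.
void run_two_streams(std::vector<double>& x, std::vector<double>& y) {
    char stream_a = 0;  // dependency token for "stream A" (assumption)
    char stream_b = 0;  // dependency token for "stream B" (assumption)
    double* px = x.data();
    double* py = y.data();
    const int nx = static_cast<int>(x.size());
    const int ny = static_cast<int>(y.size());

    // Stream A, kernel 1: x *= 2 (deferred target task)
    #pragma omp target teams distribute parallel for map(tofrom: px[0:nx]) depend(inout: stream_a) nowait
    for (int i = 0; i < nx; ++i) {
        px[i] *= 2.0;
    }

    // Stream B: y *= 3, no dependency on stream A, so it may overlap
    #pragma omp target teams distribute parallel for map(tofrom: py[0:ny]) depend(inout: stream_b) nowait
    for (int i = 0; i < ny; ++i) {
        py[i] *= 3.0;
    }

    // Stream A, kernel 2: x += 1, ordered after kernel 1 by the shared token
    #pragma omp target teams distribute parallel for map(tofrom: px[0:nx]) depend(inout: stream_a) nowait
    for (int i = 0; i < nx; ++i) {
        px[i] += 1.0;
    }

    // Wait for all outstanding target tasks before the host reads x and y.
    #pragma omp taskwait
}
```

For example, calling run_two_streams with x = {1, 2, 3} and y = {1, 4} leaves x = {3, 5, 7} and y = {3, 12}. The final plain taskwait is the coarse synchronization point; the constructs listed below are the finer-grained, per-stream alternatives.
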
However, the compiler (NVHPC 21.11) still does not support some of the constructs this relies on:

  1. #pragma omp target update to(<variable>) depend(inout: stream) nowait does not work, even though the corresponding update from does. Adding depend(...) nowait to an update to directive triggers an internal compiler error:
coreneuron::nrn_fixed_step_lastpart(coreneuron::NrnThread *):
    386, Taskwait
         Generating update to(nth->_t)
/gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/pulls/1392/deploy/externals/2021-12-10/linux-rhel7-x86_64/gcc-9.3.0/nvhpc-21.11-qhk3q2/Linux_x86_64/21.11/compilers/share/llvm/bin/opt: /gpfs/bbp.cscs.ch/ssd/slurmTmpFS/magkanar/140832/nvc++xfYwftRovDgD.ll:144924:43: error: use of undefined value '%.d0009.addr'
        %41 = bitcast [1 x %struct.struct_deps]* %.d0009.addr to i8*, !dbg !120921
  2. #pragma omp taskwait depend(inout: stream) does not work either, even though it is referenced in an NVIDIA presentation.
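
Until both constructs are supported, one possible fallback is sketched below. It is an illustration under assumptions, not the code in this PR: it assumes the compiler accepts a plain #pragma omp taskwait and a blocking #pragma omp target update to(...), and the names step_loop, t (standing in for a host-side field such as nth->_t) and stream are hypothetical. The intended per-stream version is kept in comments for comparison.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: each step launches an asynchronous kernel that reads a
// device copy of t, then the host advances t and pushes it to the device.
void step_loop(std::vector<double>& v, int steps, double dt) {
    assert(steps >= 0);
    double* pv = v.data();
    const int n = static_cast<int>(v.size());
    double t = 0.0;   // stands in for a field such as nth->_t (assumption)
    char stream = 0;  // dependency token (assumption)

    #pragma omp target enter data map(to: pv[0:n], t)
    for (int s = 0; s < steps; ++s) {
        // Asynchronous kernel: v[i] += t, using the device copies of pv and t.
        #pragma omp target teams distribute parallel for map(tofrom: pv[0:n], t) depend(inout: stream) nowait
        for (int i = 0; i < n; ++i) {
            pv[i] += t;
        }

        // Intended per-stream version (reported above as not working):
        //   #pragma omp taskwait depend(inout: stream)
        //   t += dt;
        //   #pragma omp target update to(t) depend(inout: stream) nowait
        // Fallback: wait for ALL outstanding tasks, then update synchronously.
        #pragma omp taskwait
        t += dt;
        #pragma omp target update to(t)
    }
    #pragma omp target exit data map(from: pv[0:n]) map(delete: t)
}
```

With v = {1, 2}, three steps and dt = 0.5, the kernels add 0, 0.5 and 1.0, giving v = {2.5, 3.5}. The cost of this fallback is that a plain taskwait waits for every outstanding target task, not just those on one stream, so it serializes work across streams at each update.
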

How to test this?

module load unstable
module load cmake git flex bison python-dev hpe-mpi/2.25.hmpt
module unload hpe-mpi/2.22.hmpt py-mpi4py
module load caliper
module unload cuda/11.0.2
module load gcc
module load boost
module use /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/pulls/1392/deploy/compilers/2021-12-10/modules/tcl/linux-rhel7-x86_64
module use /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/pulls/1392/deploy/externals/2021-12-10/modules/tcl/linux-rhel7-x86_64
module load nvhpc/21.11 cuda/11.5.1
cmake .. \
          -DCMAKE_INSTALL_PREFIX=./install \
          -DCORENRN_ENABLE_TIMEOUT=OFF \
          -DNRN_ENABLE_INTERVIEWS=OFF \
          -DNRN_ENABLE_RX3D=OFF \
          -DNRN_ENABLE_MPI=ON \
          -DCORENRN_ENABLE_OPENMP=ON \
          -DNRN_ENABLE_CORENEURON=ON \
          -DCORENRN_ENABLE_GPU=ON \
          -DCORENRN_ENABLE_NMODL=ON \
          -DCORENRN_NMODL_DIR=<nmodl_dir> \
          -DNRN_ENABLE_PYTHON=ON \
          -DPYTHON_EXECUTABLE=$(which python3) \
          -DNRN_ENABLE_TESTS=OFF \
          -DCORENRN_ENABLE_UNIT_TESTS=OFF \
          -DCMAKE_C_COMPILER=$CC \
          -DCMAKE_CXX_COMPILER=$CXX \
          -DCMAKE_CUDA_COMPILER=nvcc \
          -DCMAKE_BUILD_TYPE=RelWithDebInfo \
          -DCORENRN_ENABLE_CALIPER_PROFILING=ON \
          -DCORENRN_ENABLE_OPENMP_OFFLOAD=ON \
          -DCMAKE_CXX_FLAGS="-Minfo=accel -gopt -tp=skylake-avx512"
cmake --build . --parallel 40 --target install

Test System

  • OS: RedHat
  • Compiler: NVHPC 21.11
  • Version: hackathon_main
  • Backend: GPU

@olupton force-pushed the magkanar/hackathon_openmp_async branch from 4f6675e to 6b90913 on December 23, 2021 11:20
@olupton changed the base branch from hackathon_main to master on December 23, 2021 11:20