Skip to content

Latest commit

 

History

History
99 lines (80 loc) · 4.98 KB

README.md

File metadata and controls

99 lines (80 loc) · 4.98 KB

README

This is the WSM6 Stand-Alone Kernel

Here is a brief recipe for running the stand-alone WSM6 kernel on endeavor.
The kernel reads input state based upon and MPAS model run with real data, executes a single call to the WSM6 microphysics top-level routine "wsm62d()", and writes output state to disk. Built-in timers measure execution time.

NOTE: Due to file system peculiarities on endeavor, these directions only work if WSM6 kernel is installed on the /panfs file system.

NOTE: These instructions assume that "." is in your path.

  1. All commands below use tcsh. ">>" is the command-line prompt on a compile node (like ecompile2). "$" is the command-line prompt on a KNC card (like ehs154-mic0).

  2. Grab the tarfile and stuff it into your working directory. In this case the directory is "/panfs/users/${USER}/WSM6/":

cd /panfs/users/${USER}/WSM6/ cp /panfs/users/Xtbhend/WSM6/WSM6kernel.tgz . Note that Fortran source code lives in src/kernel/wsm6_kernel.F90.

  1. Untar:

tar xfz WSM6kernel.tgz

  1. Go to the source directory:

cd WSM6kernel/src

  1. Build the "optimized" case for KNC. The "makewsm6" command sets up any modules (or runs "sourceMe.*sh" scripts) and executes "make" with the appropriate target(s). Arguments mean:
    arch=endeavorxeonphi Build for Endeavor KNC. Make macros for this build configuration can be found in file src/macros.make.endeavorxeonphi.
    threading=yes Turn on OpenMP theading.
    chunk=8 Use horizontal chunk size=8 and translate "chunk" dimension to use literal constants for loop and memory bounds. Units of "chunk" are double-precision words so "chunk=8" matches KNC vector length.
    nz=41 Use vertical loop length=41 and translate vertical dimension to use literal constants for loop and memory bounds.
    fpmp=no Turn off "-fp-model precise".
    Please note that the compiler options used do not include the extra optimizations added by Indraneil and Ashish yet. (These can be added by adding the $(AGGRESSIVE) flag to the definition of PHYSFLAGS in src/macros.make.endeavorxeonphi if desired.)

makewsm6 arch=endeavorxeonphi threading=yes chunk=8 nz=41 fpmp=no > & ! make.out

  1. Look in file make.out for any build errors. If there are no error, make.out will end with two messages indicating location and name of a script that will run the kernel:
    Run scripts are in: /panfs/users/${USER}/WSM6/WSM6kernel/run_endeavorxeonphi_ompyes_CHUNK8_NZ41_FPMPno Run script is: endeavorsub.mic.serial

  2. Go to the run directory and execute the run script from the compile node:

cd /panfs/users/${USER}/WSM6/WSM6kernel/run_endeavorxeonphi_ompyes_CHUNK8_NZ41_FPMPno endeavorsub.mic.serial Made directory: RUN_real41level_240threads_16727 Execute case 'real41level' from '/panfs/users/${USER}/WSM6/WSM6kernel/run_endeavorxeonphi_ompyes_CHUNK8_NZ41_FPMPno/RUN_real41level_240threads_16727' via script 'runscript' by hand...

  1. The run script creates a subdirectory and places a file called "runscript" in it. In this case the subdirectory is "RUN_real41level_240threads_16727". Launch an interactive session and login to a KNC card. Then execute ./runscript :
    $ cd /panfs/users/${USER}/WSM6/WSM6kernel/run_endeavorxeonphi_ompyes_CHUNK8_NZ41_FPMPno/RUN_real41level_240threads_16727/ $ ./runscript

  2. Look for output files. The kernel should create the following files:
    -rw-r--r-- 1 Xtbhend Xtbhend 519 Oct 6 17:29 timing.summary -rw-r--r-- 1 Xtbhend Xtbhend 99074 Oct 6 17:29 timing.0 -rw-r--r-- 1 Xtbhend Xtbhend 4103 Oct 6 17:29 stdout -rw-r--r-- 1 Xtbhend Xtbhend 24171208 Oct 6 17:29 wsm6_output.dat stdout can be used to validate the run. More on this later.
    timing.summary contains brief timer output of the following form:
    name ncalls wallmax (thred) wallmin (thred) Total 1 2.638 ( 0) 2.638 ( 0) WSM62D+OpenMP 1 0.658 ( 0) 0.658 ( 0) WSM62D 1281 0.158 ( 12) 0.124 ( 0) All times are in seconds.
    "Total" time is total execution time for the kernel including all I/O.
    "WSM62D+OpenMP" time is execution time for the single OpenMP loop that contains only one call to wsm62d(). The loops has 10242/chunk iterations, which is 1281 in this case.
    "WSM62D" time is execution time measured inside the OpenMP loop. This timer is placed around the call to wsm62d(). See source file src/kernel/wsm6_kernel.F90 for details. Note the large difference between the slowest thread of the "WSM62D" timer and the "WSM62D+OpenMP" timer. This is key mystery we must solve.