Deployed e7aee5b with MkDocs version: 1.5.3
Unknown committed Mar 6, 2024
1 parent 6ac5e44 commit 14e030e
Showing 3 changed files with 41 additions and 27 deletions.
2 changes: 1 addition & 1 deletion search/search_index.json

Large diffs are not rendered by default.

Binary file modified sitemap.xml.gz
Binary file not shown.
66 changes: 40 additions & 26 deletions tursa-user-guide/scheduler/index.html
@@ -1590,13 +1590,22 @@ <h3 id="gpu-frequency">GPU frequency</h3>
how Slurm sets the GPU frequency and can be safely ignored.</p>
</div>
<h2 id="srun-launching-parallel-jobs"><code>srun</code>: Launching parallel jobs</h2>
<p>If you are running parallel jobs, your job submission script should contain one or more srun commands to launch the parallel executable across the compute nodes. In most cases you will want to add the options --distribution=block:block and --hint=nomultithread to your srun command to ensure you get the correct pinning of processes to cores on a compute node.</p>
<p>A brief explanation of these options:
- <code>--hint=nomultithread</code> - do not use hyperthreads/SMP
- <code>--distribution=block:block</code> - the first <code>block</code> means use a block distribution
<p>If you are running parallel jobs, your job submission script should contain one or
more srun commands to launch the parallel executable across the compute nodes. In
most cases you will want to add the following options to <code>srun</code>, combined in the sketch shown after this list:</p>
<ul>
<li><code>--nodes=[number of nodes]</code> - Set the number of compute nodes for this job step</li>
<li><code>--ntasks-per-node=[MPI processes per node]</code> - This will usually be <code>4</code> for GPU jobs
as you usually have 1 MPI process per GPU</li>
<li><code>--cpus-per-task=[stride between MPI processes]</code> - This will usually be either <code>8</code>
(for A100-40 nodes) or <code>12</code> (for A100-80 nodes). If you are using the <code>gpu</code> QoS
where you can get any type of GPU node, you will usually set this to <code>8</code>.</li>
<li><code>--hint=nomultithread</code> - do not use hyperthreads/SMP</li>
<li><code>--distribution=block:block</code> - the first <code>block</code> means use a block distribution
of processes across nodes (i.e. fill nodes before moving onto the next one) and
the second <code>block</code> means use a block distribution of processes across "sockets"
within a node (i.e. fill a "socket" before moving on to the next one).</p>
within a node (i.e. fill a "socket" before moving on to the next one).</li>
</ul>
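<p>Putting these options together, a single <code>srun</code> line might look like the following sketch. The node count, process count, core stride and the application name <code>my_app.x</code> are illustrative assumptions only; <code>gpu_launch.sh</code> is the GPU/NIC pinning wrapper used in the example scripts further down this page.</p>
<div class="highlight"><pre><code># Illustrative sketch only: 4 nodes, 4 MPI processes per node,
# 8-core stride between processes (adjust for your node type and job size)
srun --nodes=4 --ntasks-per-node=4 --cpus-per-task=8 \
     --hint=nomultithread --distribution=block:block \
     gpu_launch.sh ./my_app.x
</code></pre></div>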
<div class="admonition important">
<p class="admonition-title">Important</p>
<p>The Slurm definition of a "socket" does not usually correspond to a physical CPU socket.
@@ -1606,24 +1615,29 @@ <h2 id="srun-launching-parallel-jobs"><code>srun</code>: Launching parallel jobs
CPU socket (64 cores) as the CPU nodes are configured with NPS1.</p>
</div>
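<p>If you want to see what Slurm itself reports for sockets and cores on a particular node, a command along these lines should work; the node name below is a placeholder, so substitute a real one taken from the output of <code>sinfo -N</code>:</p>
<div class="highlight"><pre><code># tu-c0r0n00 is a placeholder node name - pick a real one from sinfo -N
scontrol show node tu-c0r0n00 | grep -i socket
</code></pre></div>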
<h2 id="example-job-submission-scripts">Example job submission scripts</h2>
<p>The typical strategy for submitting jobs on Tursa is for the batch script to
request full nodes with no process/thread pinning and then the individual
<code>srun</code> commands set the correct options for dividing up processes and threads
across nodes.</p>
<h3 id="example-job-submission-script-for-a-parallel-job-using-cuda">Example: job submission script for a parallel job using CUDA</h3>
<p>A job submission script for a parallel job that uses 4 compute nodes, 4 MPI
processes per node and 4 GPUs per node. It does not restrict what type of
GPU the job can run on so both A100-40 and A100-80 can be used:</p>
GPU the job can run on so both A100-40 and A100-80 can be used.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>

<span class="c1"># Slurm job options (job-name, compute nodes, job time)</span>
<span class="kp">#SBATCH --job-name=Example_Grid_job</span>
<span class="c1"># Slurm job options</span>
<span class="kp">#SBATCH --job-name=Example_MPI_job</span>
<span class="kp">#SBATCH --time=12:0:0</span>
<span class="kp">#SBATCH --nodes=4</span>
<span class="kp">#SBATCH --tasks-per-node=4</span>
<span class="kp">#SBATCH --cpus-per-task=8</span>
<span class="kp">#SBATCH --gres=gpu:4</span>
<span class="kp">#SBATCH --partition=gpu</span>
<span class="kp">#SBATCH --qos=gpu</span>

<span class="c1"># Replace [budget code] below with your budget code (e.g. t01)</span>
<span class="kp">#SBATCH --account=[budget code] </span>
<span class="kp">#SBATCH --account=[budget code] </span>

<span class="c1"># Request right number of full nodes (32 cores by node fits any GPU compute nodes))</span>
<span class="kp">#SBATCH --nodes=4</span>
<span class="kp">#SBATCH --ntasks-per-node=32</span>
<span class="kp">#SBATCH --cpus-per-task=1</span>
<span class="kp">#SBATCH --gres=gpu:4</span>

<span class="c1"># Load the correct modules</span>
module<span class="w"> </span>load<span class="w"> </span>/home/y07/shared/tursa-modules/setup-env
@@ -1633,18 +1647,17 @@ <h3 id="example-job-submission-script-for-a-parallel-job-using-cuda">Example: jo

<span class="nb">export</span><span class="w"> </span><span class="nv">OMP_NUM_THREADS</span><span class="o">=</span><span class="m">8</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">OMP_PLACES</span><span class="o">=</span>cores
<span class="nb">export</span><span class="w"> </span><span class="nv">SRUN_CPUS_PER_TASK</span><span class="o">=</span><span class="nv">$SLURM_CPUS_PER_TASK</span>

<span class="c1"># These will need to be changed to match the actual application you are running</span>
<span class="nv">application</span><span class="o">=</span><span class="s2">&quot;my_mpi_openmp_app.x&quot;</span>
<span class="nv">options</span><span class="o">=</span><span class="s2">&quot;arg 1 arg2&quot;</span>

<span class="c1"># We have reserved the full nodes, now distribute the processes as</span>
<span class="c1"># required: 4 MPI processes per node, stride of 12 cores between </span>
<span class="c1"># required: 4 MPI processes per node, stride of 8 cores between </span>
<span class="c1"># MPI processes</span>
<span class="c1"># </span>
<span class="c1"># Note use of gpu_launch.sh wrapper script for GPU and NIC pinning </span>
<span class="nb">srun</span><span class="w"> </span>--nodes<span class="o">=</span><span class="m">4</span><span class="w"> </span>--tasks-per-node<span class="o">=</span><span class="m">4</span><span class="w"> </span>--cpus-per-task<span class="o">=</span><span class="m">12</span><span class="w"> </span><span class="se">\</span>
<span class="nb">srun</span><span class="w"> </span>--nodes<span class="o">=</span><span class="m">4</span><span class="w"> </span>--ntasks-per-node<span class="o">=</span><span class="m">4</span><span class="w"> </span>--cpus-per-task<span class="o">=</span><span class="m">8</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--hint<span class="o">=</span>nomultithread<span class="w"> </span>--distribution<span class="o">=</span>block:block<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>gpu_launch.sh<span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="si">${</span><span class="nv">application</span><span class="si">}</span><span class="w"> </span><span class="si">${</span><span class="nv">options</span><span class="si">}</span>
@@ -1698,18 +1711,19 @@ <h2 id="using-the-dev-qos">Using the <code>dev</code> QoS</h2>
binding.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>

<span class="c1"># Slurm job options (job-name, compute nodes, job time)</span>
<span class="kp">#SBATCH --job-name=Example_Dev_Job</span>
<span class="c1"># Slurm job options</span>
<span class="kp">#SBATCH --job-name=Example_MPI_job</span>
<span class="kp">#SBATCH --time=12:0:0</span>
<span class="kp">#SBATCH --nodes=2</span>
<span class="kp">#SBATCH --tasks-per-node=48</span>
<span class="kp">#SBATCH --cpus-per-task=</span>
<span class="kp">#SBATCH --gres=gpu:4</span>
<span class="kp">#SBATCH --partition=gpu-a100-80</span>
<span class="kp">#SBATCH --qos=dev</span>

<span class="c1"># Replace [budget code] below with your budget code (e.g. t01)</span>
<span class="kp">#SBATCH --account=[budget code]</span>
<span class="kp">#SBATCH --account=[budget code] </span>

<span class="c1"># Request right number of full nodes (48 cores by node for A100-80 GPU nodes))</span>
<span class="kp">#SBATCH --nodes=4</span>
<span class="kp">#SBATCH --ntasks-per-node=48</span>
<span class="kp">#SBATCH --cpus-per-task=1</span>
<span class="kp">#SBATCH --gres=gpu:4</span>

<span class="nb">export</span><span class="w"> </span><span class="nv">OMP_NUM_THREADS</span><span class="o">=</span><span class="m">1</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">OMP_PLACES</span><span class="o">=</span>cores
@@ -1729,7 +1743,7 @@ <h2 id="using-the-dev-qos">Using the <code>dev</code> QoS</h2>
<span class="c1"># MPI processes</span>
<span class="c1"># </span>
<span class="c1"># Note use of gpu_launch.sh wrapper script for GPU and NIC pinning </span>
<span class="nb">srun</span><span class="w"> </span>--nodes<span class="o">=</span><span class="m">2</span><span class="w"> </span>--tasks-per-node<span class="o">=</span><span class="m">4</span><span class="w"> </span>--cpus-per-task<span class="o">=</span><span class="m">12</span><span class="w"> </span><span class="se">\</span>
<span class="nb">srun</span><span class="w"> </span>--nodes<span class="o">=</span><span class="m">4</span><span class="w"> </span>--ntasks-per-node<span class="o">=</span><span class="m">4</span><span class="w"> </span>--cpus-per-task<span class="o">=</span><span class="m">12</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--hint<span class="o">=</span>nomultithread<span class="w"> </span>--distribution<span class="o">=</span>block:block<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>gpu_launch.sh<span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="si">${</span><span class="nv">application</span><span class="si">}</span><span class="w"> </span><span class="si">${</span><span class="nv">options</span><span class="si">}</span>
