Removes incorrect wrapper script

aturner-epcc committed Feb 29, 2024
1 parent db0a54b commit 889883d
Showing 2 changed files with 40 additions and 35 deletions.
36 changes: 36 additions & 0 deletions docs/tursa-user-guide/hardware.md
@@ -0,0 +1,36 @@
# Tursa hardware

!!! note
    Some of the material in this section is closely based on [information provided by NASA](https://www.nas.nasa.gov/hecc/support/kb/amd-rome-processors_658.html) as part of the documentation for the [Aitken HPC system](https://www.nas.nasa.gov/hecc/resources/aitken.html).

## System overview

Tursa is an Eviden supercomputing system which has a total of 178 GPU compute nodes. Each GPU compute node has a CPU with 48 cores and 4 NVIDIA A100 GPUs. Compute nodes are connected by an InfiniBand interconnect.

There are additional login nodes, which provide access to the system.

Compute nodes are only accessible via the Slurm job scheduling system.
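
A quick way to see what Slurm knows about the compute nodes from a login node is shown below (a minimal sketch using standard Slurm commands; the exact partition and node names on Tursa will differ):

```
# Summary of partitions and node states (idle, allocated, down, ...)
sinfo -s

# Detailed information for a single node; replace <nodename> with a
# name taken from the sinfo output
scontrol show node <nodename>
```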

There is a single file system which is available on login and compute nodes (see [Data management and transfer](data.md)).

The Lustre file system has a capacity of 5.1 PiB.
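
If you want to check capacity and usage on the Lustre file system yourself, the standard Lustre client tools report this directly (a sketch, assuming the `lfs` client utility is available on the login nodes):

```
# Report capacity and usage for each Lustre target and the file system total
lfs df -h
```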

The interconnect uses a Fat Tree topology.

## Interconnect details

Tursa has a high-performance interconnect with 4x 200 Gb/s InfiniBand interfaces per node. It uses a 2-layer fat tree topology:

- Each node connects to 4 of the 5 L1 (leaf) switches within the same cabinet with 200 Gb/s links
- Within an 8-node block, all nodes share the same 4 switches
- Each L1 switch connects to all 20 L2 switches via 200 Gb/s links, giving a maximum of 2 switch-to-switch hops between any 2 nodes
- There are no direct L1 to L1 or L2 to L2 switch connections
- 16-node, 32-node and 64-node blocks are constructed from 8-node blocks that show the required performance on the inter-block links
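
One way to confirm the per-node network configuration described above is to inspect the InfiniBand adapters from a compute node (a sketch, assuming the standard InfiniBand diagnostic tools are available in your environment):

```
# List the host channel adapters and their link rates; on a Tursa GPU node
# this should report four mlx5 devices, each with a 200 Gb/s rate
ibstat | grep -E "^CA '|Rate:"
```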








39 changes: 4 additions & 35 deletions docs/tursa-user-guide/scheduler.md
@@ -491,9 +491,9 @@ across the compute nodes. You will usually add the following options to

## Example job submission scripts

### Example: job submission script for Grid parallel job using CUDA
### Example: job submission script for a parallel job using CUDA

A job submission script for a Grid job that uses 4 compute nodes, 16 MPI
A job submission script for a parallel job that uses 4 compute nodes, 16 MPI
processes (4 per node) and 4 GPUs per node. It does not restrict what type of
GPU the job can run on so both A100-40 and A100-80 can be used:

@@ -540,48 +540,17 @@ export OMPI_MCA_btl_openib_if_exclude=mlx5_1,mlx5_2,mlx5_3
application="my_mpi_openmp_app.x"
options="arg 1 arg2"
mpirun -np $SLURM_NTASKS --map-by numa -x LD_LIBRARY_PATH --bind-to none ./wrapper.sh ${application} ${options}
mpirun -np $SLURM_NTASKS --map-by numa -x LD_LIBRARY_PATH --bind-to none ${application} ${options}
```

This will run your executable "grid" in parallel usimg 16
This will run your executable "my_mpi_opnemp_app.x" in parallel usimg 16
MPI processes on 4 nodes, 8 OpenMP threads will be used per
MPI process and 4 GPUs will be used per node (32 cores per
node, 4 GPUs per node). Slurm will allocate 4 nodes to your
job and mpirun will place 4 MPI processes on each node.
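
If you want to check how ranks, GPUs and network interfaces end up distributed, you can launch a small diagnostic script with the same `mpirun` options in place of the application (a sketch; the script name and contents below are illustrative and not part of the Tursa documentation):

```
#!/bin/bash
# check_placement.sh (hypothetical): print the host, local MPI rank and
# CUDA_VISIBLE_DEVICES for each rank, plus the GPUs present on the node
echo "$(hostname) local rank ${OMPI_COMM_WORLD_LOCAL_RANK:-unknown}: CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
nvidia-smi -L
```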

When running on Tursa it is important to specify how
each of the GPUs interacts with the network interfaces to
reach optimal network communication performance. To achieve
this, we introduce a wrapper script (specified as `wrapper.sh`
in the example job script above) that sets a number of
environment variables for each rank on a node (each GPU
in a node), explicitly telling each rank which network interface
it should use for internode communication.

`wrapper.sh` script example:

```
#!/bin/bash
# Map each MPI rank on the node to its own GPU, InfiniBand interface and
# pair of NUMA domains, then launch the application under numactl
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
numa1=$(( 2 * $lrank ))
numa2=$(( 2 * $lrank + 1 ))
netdev=mlx5_${lrank}:1
export CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK
export UCX_NET_DEVICES=mlx5_${lrank}:1
BINDING="--interleave=$numa1,$numa2"
echo "`hostname` - $lrank device=$CUDA_VISIBLE_DEVICES binding=$BINDING"
numactl ${BINDING} $*
```

See above for a more detailed discussion of the different `sbatch` options.

## Using the `dev` QoS

The `dev` QoS is designed for faster turnaround of short jobs than is usually available through
