diff --git a/search/search_index.json b/search/search_index.json index 4726c00..d0f6029 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"DiRAC Extreme Scaling User Documentation","text":"

DiRAC Extreme Scaling is part of the DiRAC National HPC Service. You can find more information on the service and the research it supports on the DiRAC website.

The DiRAC Extreme Scaling service is an HPC resource for UK researchers. DiRAC Extreme Scaling is provided by UKRI, EPCC and the University of Edinburgh. The hardware is provided by ATOS.

"},{"location":"#what-the-documentation-covers","title":"What the documentation covers","text":"

The documentation currently includes:

"},{"location":"#contributing-to-the-documentation","title":"Contributing to the documentation","text":"

The source for this documentation is publicly available in the DiRAC Extreme Scaling documentation GitHub repository so that anyone can contribute to improve the documentation for the service. Contributions can be in the form of improvements or additions to the content and/or Issues providing suggestions for how it can be improved.

Full details of how to contribute can be found in the README.md file of the repository.

"},{"location":"tursa-user-guide/","title":"Tursa User Guide","text":"

The Tursa User Guide covers all aspects of use of the Tursa resource. This includes fundamentals (required by all users to use the system effectively), best practice for getting the most out of Tursa and more technical topics.

The Tursa User Guide contains the following sections:

"},{"location":"tursa-user-guide/connecting/","title":"Connecting to Tursa","text":"

On the Tursa system, interactive access can be achieved via SSH, either directly from a command line terminal or using an SSH client. In addition, data can be transferred to and from the Tursa system using scp from the command line or by using a file transfer client.

This section covers the basic connection methods.

Before following the process below, we assume you have set up an account on Tursa through the DiRAC SAFE. Documentation on how to do this can be found at:

"},{"location":"tursa-user-guide/connecting/#command-line-terminal","title":"Command line terminal","text":""},{"location":"tursa-user-guide/connecting/#linux","title":"Linux","text":"

Linux distributions come installed with a terminal application that can be used for SSH access to the login nodes. Linux users will have different terminals depending on their distribution and window manager (e.g. GNOME Terminal in GNOME, Konsole in KDE). Consult your Linux distribution's documentation for details on how to load a terminal.

"},{"location":"tursa-user-guide/connecting/#macos","title":"MacOS","text":"

MacOS users can use the Terminal application, located in the Utilities folder within the Applications folder.

"},{"location":"tursa-user-guide/connecting/#windows","title":"Windows","text":"

A typical Windows installation will not include a terminal client, though there are various clients available. We recommend that all our Windows users download and install MobaXterm to access Tursa. It is very easy to use and includes an integrated X server with SSH client to run any graphical applications on Tursa.

You can download MobaXterm Home Edition (Installer Edition) from the following link:

Double-click the downloaded Microsoft Installer file (.msi), and the Windows wizard will automatically guide you through the installation process. Note that you might need administrator rights to install on some Windows systems. Also check that Windows Firewall has not blocked any features of this program after installation.

Start MobaXterm and then click \"Start local terminal\".

Tips

"},{"location":"tursa-user-guide/connecting/#access-credentials","title":"Access credentials","text":"

To access Tursa, you need to use two credentials:

You can find more detailed instructions on how to set up your credentials to access Tursa from Windows, macOS and Linux below.

"},{"location":"tursa-user-guide/connecting/#ssh-key-pairs","title":"SSH Key Pairs","text":"

You will need to generate an SSH key pair protected by a passphrase to access Tursa.

Using a terminal (the command line), set up a key pair that contains your e-mail address and enter a passphrase you will use to unlock the key:

$ ssh-keygen -t rsa -C \"your@email.com\"\n...\n-bash-4.1$ ssh-keygen -t rsa -C \"your@email.com\"\nGenerating public/private rsa key pair.\nEnter file in which to save the key (/Home/user/.ssh/id_rsa): [Enter]\nEnter passphrase (empty for no passphrase): [Passphrase]\nEnter same passphrase again: [Passphrase]\nYour identification has been saved in /Home/user/.ssh/id_rsa.\nYour public key has been saved in /Home/user/.ssh/id_rsa.pub.\nThe key fingerprint is:\n03:d4:c4:6d:58:0a:e2:4a:f8:73:9a:e8:e3:07:16:c8 your@email.com\nThe key's randomart image is:\n+--[ RSA 2048]----+\n|    . ...+o++++. |\n| . . . =o..      |\n|+ . . .......o o |\n|oE .   .         |\n|o =     .   S    |\n|.    +.+     .   |\n|.  oo            |\n|.  .             |\n| ..              |\n+-----------------+\n

(remember to replace \"your@email.com\" with your e-mail address).

"},{"location":"tursa-user-guide/connecting/#upload-public-part-of-key-pair-to-safe","title":"Upload public part of key pair to SAFE","text":"

You should now upload the public part of your SSH key pair to the SAFE by following the instructions at:

Log in to SAFE. Then:

  1. Go to the Menu Login accounts and select the Tursa account you want to add the SSH key to
  2. On the subsequent Login account details page click the Add Credential button
  3. Select SSH public key as the Credential Type and click Next
  4. Either copy and paste the public part of your SSH key into the SSH Public key box or use the button to select the public key file on your computer.
  5. Click Add to associate the public SSH key part with your account

Once you have done this, your SSH key will be added to your Tursa account.

Remember, you will need to use both an SSH key and password to log into Tursa so you will also need to collect your initial password before you can log into Tursa. We cover this next.

Note

If you want to connect to Tursa from more than one machine, e.g. from your home laptop as well as your work laptop, you should generate an ssh key on each machine, and add each of the public keys into SAFE.

"},{"location":"tursa-user-guide/connecting/#initial-passwords-up-to-13-feb-2024","title":"Initial passwords (up to 13 Feb 2024)","text":"

The SAFE web interface is used to provide your initial password for logging onto Tursa (see the SAFE Documentation for more details on requesting accounts and picking up passwords).

Note

You may now change your password on the Tursa machine itself using the passwd command or when you are prompted the first time you log in. This change will not be reflected in the SAFE. If you forget your password, you should use the SAFE to request a new one-shot password.

"},{"location":"tursa-user-guide/connecting/#mfa-time-based-one-time-passcode-totp-from-13-feb-2024","title":"MFA Time-based one-time passcode (TOTP) (from 13 Feb 2024)","text":"

You will need to use both an SSH key and time-based one-time passcode to log into Tursa so you will also need to set up a method for generating a TOTP code before you can log into Tursa.

"},{"location":"tursa-user-guide/connecting/#first-login-from-a-new-account-password-required","title":"First login from a new account: password required","text":"

Important

You will not use your password when logging on to Tursa after the first login for a new account.

As an additional security measure, you will also need to use a password from SAFE for your first login to Tursa with a new account. When you log into Tursa for the first time with a new account, you will be prompted to change your initial password. This is a three step process:

  1. When prompted to enter your LDAP password: Enter the password which you retrieved from SAFE
  2. When prompted to enter your new password: type in a new password
  3. When prompted to re-enter the new password: re-enter the new password

Your password has now been changed. You will no longer need this password to log into Tursa from this point forwards; you will use your SSH key and TOTP as described above.

"},{"location":"tursa-user-guide/connecting/#ssh-clients","title":"SSH Clients","text":"

Interaction with Tursa is done remotely, over an encrypted communication channel, Secure Shell version 2 (SSH-2). This allows command-line access to one of the login nodes of Tursa, from which you can run commands or use a command-line text editor to edit files. SSH can also be used to run graphical programs such as GUI text editors and debuggers when used in conjunction with an X client.

"},{"location":"tursa-user-guide/connecting/#logging-in","title":"Logging in","text":"

You can use the following command from the terminal window to log in to Tursa:

ssh username@tursa.dirac.ed.ac.uk\n

You need to enter both credentials correctly to be able to access Tursa.

Tip

If your SSH key pair is not stored in the default location (usually ~/.ssh/id_rsa) on your local system, you may need to specify the path to the private part of the key with the -i option to ssh. For example, if your key is in a file called keys/id_rsa_Tursa you would use the command ssh -i keys/id_rsa_Tursa username@tursa.dirac.ed.ac.uk to log in.

Tip

When you first log into Tursa, you will be prompted to change your initial password. This is a three step process:

  1. When prompted to enter your LDAP password: Re-enter the password you retrieved from SAFE
  2. When prompted to enter your new password: type in a new password
  3. When prompted to re-enter the new password: re-enter the new password

Your password has now been changed.

To allow remote programs, especially graphical applications, to control your local display (for example, to open up a new GUI window for a debugger), use:

ssh -X username@tursa.dirac.ed.ac.uk\n

Some sites recommend using the -Y flag. While this can fix some compatibility issues, the -X flag is more secure.

Current macOS systems do not have an X window system. Users should install the XQuartz package to allow SSH with X11 forwarding on macOS systems:

"},{"location":"tursa-user-guide/connecting/#making-access-more-convenient-using-the-ssh-configuration-file","title":"Making access more convenient using the SSH configuration file","text":"

Typing in the full command to log in or transfer data to Tursa can become tedious as it often has to be repeated many times. You can use the SSH configuration file, usually located on your local machine at ~/.ssh/config, to make things a bit more convenient.

Each remote site (or group of sites) can have an entry in this file which may look something like:

Host tursa\n  HostName tursa.dirac.ed.ac.uk\n  User username\n

(remember to replace username with your actual username!).

The Host tursa line defines a short name for the entry. In this case, instead of typing ssh username@tursa.dirac.ed.ac.uk to access the Tursa login nodes, you could use ssh tursa instead. The remaining lines define the options for the tursa host.

Now you can use SSH to access Tursa without needing to enter your username or the full hostname every time:

$ ssh tursa\n

You can set up as many of these entries as you need in your local configuration file. Other options are available. See the ssh_config man page (or man ssh_config on any machine with SSH installed) for a description of the SSH configuration file. You may find the IdentityFile option useful if you have to manage multiple SSH key pairs for different systems as this allows you to specify which SSH key to use for each system.
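
For example, a minimal sketch of an entry that uses IdentityFile (the key path ~/.ssh/id_rsa_Tursa is illustrative; use the path to your own private key):

Host tursa\n  HostName tursa.dirac.ed.ac.uk\n  User username\n  IdentityFile ~/.ssh/id_rsa_Tursa\n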

Bug

There is a known bug with Windows ssh-agent. If you get the error message: Warning: agent returned different signature type ssh-rsa (expected rsa-sha2-512), you will need to either specify the path to your ssh key in the command line (using the -i option as described above) or add the path to your SSH config file by using the IdentityFile option.

"},{"location":"tursa-user-guide/connecting/#ssh-debugging-tips","title":"SSH debugging tips","text":"

If you find you are unable to connect via SSH there are a number of ways you can try to diagnose the issue. Some of these are collected below - if you are having difficulties connecting we suggest trying these before contacting the Tursa service desk.

"},{"location":"tursa-user-guide/connecting/#use-the-usertursadiracedacuk-syntax-rather-than-l-user-tursadiracedacuk","title":"Use the user@tursa.dirac.ed.ac.uk syntax rather than -l user tursa.dirac.ed.ac.uk","text":"

We have seen a number of instances where people using the syntax

ssh -l user tursa.dirac.ed.ac.uk\n

have not been able to connect properly and get prompted for a password many times. We have found that using the alternative syntax:

ssh user@tursa.dirac.ed.ac.uk\n

works more reliably. If you are using the -l user option to connect and are seeing issues, then try using user@tursa.dirac.ed.ac.uk instead.

"},{"location":"tursa-user-guide/connecting/#can-you-connect-to-the-login-node","title":"Can you connect to the login node?","text":"

Try the command ping -c 3 tursa.dirac.ed.ac.uk. If you successfully connect to the login node, the output should include:

--- tursa.dirac.ed.ac.uk ping statistics ---\n3 packets transmitted, 3 received, 0% packet loss, time 38ms\n

(the ping time '38ms' is not important). If not all packets are received there could be a problem with your internet connection, or the login node could be unavailable.

"},{"location":"tursa-user-guide/connecting/#password","title":"Password","text":"

If you are having trouble entering your password consider using a password manager, from which you can copy and paste it. This will also help you generate a secure password. If you need to reset your password, instructions for doing so can be found in the SAFE documentation.

Windows users please note that Ctrl+V does not work to paste into PuTTY, MobaXterm, or PowerShell. Instead use Shift+Ins to paste. Alternatively, right-click and select 'Paste' in PuTTY and MobaXterm, or simply right-click to paste in PowerShell.

"},{"location":"tursa-user-guide/connecting/#ssh-key","title":"SSH key","text":"

If you get the error message Permission denied (publickey) this can indicate a problem with your SSH key. Some things to check:

Target Permissions chmod Code Directory drwx------ 700 Private Key -rw------- 600 Public Key -rw-r--r-- 644

chmod can be used to set permissions on the target in the following way: chmod <code> <target>. So for example to set correct permissions on the private key file id_rsa_Tursa one would use the command chmod 600 id_rsa_Tursa.
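
For example, a minimal sketch of setting the recommended permissions from the table above, assuming your key pair is in the default ~/.ssh location:

# restrict the .ssh directory, private key and public key\nchmod 700 ~/.ssh\nchmod 600 ~/.ssh/id_rsa\nchmod 644 ~/.ssh/id_rsa.pub\n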

Tip

Unix file permissions can be understood in the following way. There are three groups that can have file permissions: (owning) users, (owning) groups, and others. The available permissions are read, write, and execute. The first character indicates whether the target is a file -, or directory d. The next three characters indicate the owning user's permissions: the first character is r if they have read permission, - if they don't; the second character is w if they have write permission, - if they don't; the third character is x if they have execute permission, - if they don't. This pattern is then repeated for group, and other permissions. For example the pattern -rw-r--r-- indicates that the owning user can read and write the file, members of the owning group can read it, and anyone else can also read it. The chmod codes are constructed by treating the user, group, and other permission strings as binary numbers, then converting them to decimal. For example the permission string -rwx------ becomes 111 000 000 -> 700.

"},{"location":"tursa-user-guide/connecting/#ssh-verbose-output","title":"SSH verbose output","text":"

Verbose debugging output from ssh can be very useful for diagnosing the issue. In particular, it can be used to distinguish between problems with the SSH key and password - further details are given below. To enable verbose output add the -vvv flag to your SSH command. For example:

ssh -vvv username@tursa.dirac.ed.ac.uk\n

The output is lengthy, but somewhere in there you should see lines similar to the following:

debug1: Next authentication method: keyboard-interactive\ndebug2: userauth_kbdint\ndebug3: send packet: type 50\ndebug2: we sent a keyboard-interactive packet, wait for reply\ndebug3: receive packet: type 60\ndebug2: input_userauth_info_req\ndebug2: input_userauth_info_req: num_prompts 1\nPassword:\ndebug3: send packet: type 61\ndebug3: receive packet: type 60\ndebug2: input_userauth_info_req\ndebug2: input_userauth_info_req: num_prompts 0\ndebug3: send packet: type 61\ndebug3: receive packet: type 51\nAuthenticated with partial success.\ndebug1: Authentications that can continue: publickey,password\n

If you do not see the Password: prompt you may have connection issues, or there could be a problem with the Tursa login nodes. If you do not see Authenticated with partial success it means your password was not accepted. You will be asked to re-enter your password, usually two more times before the connection will be rejected. Consider the suggestions under Password above. If you do see Authenticated with partial success, it means your password was accepted, and your SSH key will now be checked.

You should next see something similar to:

debug1: Next authentication method: publickey\ndebug1: Offering public key: RSA SHA256:<key_hash> <path_to_private_key>\ndebug3: send_pubkey_test\ndebug3: send packet: type 50\ndebug2: we sent a publickey packet, wait for reply\ndebug3: receive packet: type 60\ndebug1: Server accepts key: pkalg rsa-sha2-512 blen 2071\ndebug2: input_userauth_pk_ok: fp SHA256:<key_hash>\ndebug3: sign_and_send_pubkey: RSA SHA256:<key_hash>\nEnter passphrase for key '<path_to_private_key>':\ndebug3: send packet: type 50\ndebug3: receive packet: type 52\ndebug1: Authentication succeeded (publickey).\n

Most importantly, you can see which files ssh has checked for private keys, and you can see if any key is accepted. The line Authentication succeeded indicates that the SSH key has been accepted. By default ssh will go through a list of standard private key files, as well as any you have specified with -i or a config file. This is fine, as long as one of the files mentioned is the one that matches the public key uploaded to SAFE.

If your SSH key passphrase is incorrect, you will be asked to try again up to three times in total, before being disconnected with Permission denied (publickey). If you enter your passphrase correctly, but still see this error message, please consider the advice under SSH key above.

"},{"location":"tursa-user-guide/data/","title":"Data management and transfer","text":"

This section covers best practice and tools for data management on Tursa as well as information on the storage available on the system.

Information

If you have any questions on data management and transfer please do not hesitate to contact the DiRAC service desk at dirac-support@epcc.ed.ac.uk.

"},{"location":"tursa-user-guide/data/#useful-resources-and-links","title":"Useful resources and links","text":""},{"location":"tursa-user-guide/data/#data-management","title":"Data management","text":"

We strongly recommend that you give some thought to how you use the various data storage facilities that are part of the Tursa service. This will not only allow you to use the machine more effectively but also to ensure that your valuable data is protected.

"},{"location":"tursa-user-guide/data/#tursa-storage","title":"Tursa storage","text":"

Tursa has two different storage systems available: the parallel Lustre file system and the tape storage. Both are described below.

"},{"location":"tursa-user-guide/data/#parallel-lustre-file-system","title":"Parallel Lustre file system","text":"

The Tursa storage is provided by a parallel Lustre file system that provides your home directories and working storage. When you log in you will be placed in your home directory.

The home directory for each user is located at:

/home/[project code]/[group code]/[username]\n

where [project code] is your project code, [group code] is your group code within the project, and [username] is your Tursa username.

Each project is allocated a portion of the total storage available, and the project PI will be able to sub-divide this quota among the groups and users within the project. As is standard practice on UNIX and Linux systems, the environment variable $HOME is automatically set to point to your home directory.

"},{"location":"tursa-user-guide/data/#tape-storage","title":"Tape storage","text":"

The tape storage can be made available to any Tursa user on request and can be used to store data from the Lustre parallel file system.

Managing and transferring data to/from the Tursa tape storage is done via the Miria web interface, accessed through an SSH tunnel to the Tursa login nodes.

Important

All data on the tape storage is shared project data rather than data associated with individual user accounts. Any data you move to tape will be visible to all users in the same project as you who have access to the tape storage.

"},{"location":"tursa-user-guide/data/#requesting-access-to-the-tape-storage","title":"Requesting access to the tape storage","text":"

If you want to use the Tursa tape storage, you should contact the DiRAC Service Desk with the username and project ID you want to use to access the storage.

"},{"location":"tursa-user-guide/data/#data-locations","title":"Data locations","text":"

In order to move data to the tape storage it must exist in a specific directory on the Tursa Lustre file system. You will need to move or copy the data to this location before it can be moved to tape and when you restore data from tape it will be placed in this location.

There is one directory per project on Tursa. The directory has the path:

/mnt/lustre/tursafs1/archive/[project code]\n

So, for example, the directory for project dp001 would be:

/mnt/lustre/tursafs1/archive/dp001\n
"},{"location":"tursa-user-guide/data/#setup-the-ssh-tunnel-for-miria","title":"Setup the SSH tunnel for Miria","text":"

Once your tape storage access has been set up and you have moved data to the archive directory, you will need to connect to the Miria web interface in a web browser on your local system by setting up an SSH tunnel to the Tursa login nodes.

You do this by logging into Tursa in the usual way (with your SSH key and password) and adding the -L 9080:10.144.20.95:443 option to the ssh command.

For example, if your username is dc-user1, you would setup the tunnel by logging into Tursa with (assuming your SSH key is in the default location):

ssh -L 9080:10.144.20.95:443 dc-user1@tursa.dirac.ed.ac.uk\n

Enter your SSH key passphrase and password in the usual way.

Note

You will need to set up the SSH tunnel each time you want to access the Miria interface.

"},{"location":"tursa-user-guide/data/#access-the-miria-interface","title":"Access the Miria interface","text":"

Once you have set up the SSH tunnel, you should be able to access the Miria interface in a web browser on your local system. Open a new tab and enter the URL:

You should see an interface asking you for a username and password. Use the username and password that you use to log into Tursa to log into the tape storage interface.

"},{"location":"tursa-user-guide/data/#transfer-data-from-tursa-lustre-to-tape","title":"Transfer data from Tursa Lustre to tape","text":"

You use the \"Easy Move\" option from the left-hand menu to transfer data.

  1. Click on \"Easy Move\"
  2. Click on the \"Find a source\" menu and select the disk with your project ID (e.g. \"dp001\")
  3. Click on the \"Find a target\" menu and select the archive with your project ID (e.g. \"dp001\")
  4. Use the file explorer to select the files/directories you wish to move to tape
  5. Click the \"Add\" button
  6. Scroll to the bottom of the page and select \"Validate basket\" and confirm you wish to proceed

Your transfer request will be added to the queue. You can check on progress by selecting the \"Activity\" option in the left hand menu.

"},{"location":"tursa-user-guide/data/#restore-data-from-tape-to-tursa-lustre","title":"Restore data from tape to Tursa Lustre","text":"

You use the \"Easy Move\" option from the left-hand menu to transfer data.

  1. Click on \"Easy Move\"
  2. Select the \"Repository\" icon next to the \"Find a source\" menu
  3. Click on the \"Find a source\" menu and select the archive with your project ID (e.g. \"dp001\")
  4. Select the \"Platform\" icon next to the \"Find a source\" menu
  5. Click on the \"Find a target\" menu and select the disk with your project ID (e.g. \"dp001\")
  6. Use the source file explorer to select the files/directories you wish to restore
  7. Use the target file explorer to select the location on disk to restore the data to
  8. Click the \"Add\" button
  9. Scroll to the bottom of the page and select \"Validate basket\" and confirm you wish to proceed

Your transfer request will be added to the queue. You can check on progress by selecting the \"Activity\" option in the left hand menu.

Bug

If you restore a file rather than a directory, the Miria tool will give the file the name NULL once it is restored; you should use the mv command to rename the file to the correct name after it has been restored.

"},{"location":"tursa-user-guide/data/#sharing-data-with-other-tursa-users","title":"Sharing data with other Tursa users","text":"

How you share data with other Tursa users depends on whether or not they belong to the same project as you. Each project has two shared folders that can be used for sharing data.

"},{"location":"tursa-user-guide/data/#sharing-data-with-tursa-users-in-your-project","title":"Sharing data with Tursa users in your project","text":"

Each project has an inner shared folder.

/home/[project code]/[project code]/shared\n

This folder has read/write permissions for all project members. You can place any data you wish to share with other project members in this directory. For example, if your project code is x01 the inner shared folder would be located at /home/x01/x01/shared.

"},{"location":"tursa-user-guide/data/#sharing-data-with-all-tursa-users","title":"Sharing data with all Tursa users","text":"

Each project also has an outer shared folder:

/home/[project code]/shared\n

It is writable by all project members and readable by any user on the system. You can place any data you wish to share with other Tursa users who are not members of your project in this directory. For example, if your project code is x01 the outer shared folder would be located at /home/x01/shared.

"},{"location":"tursa-user-guide/data/#permissions","title":"Permissions","text":"

You should check the permissions of any files that you place in the shared area, especially if those files were created in your own Tursa account. Such files are likely to be readable by you only.

The chmod command below shows how to make sure that a file placed in the outer shared folder is also readable by all Tursa users.

chmod a+r /home/x01/shared/your-shared-file.txt\n

Similarly, for the inner shared folder, chmod can be called such that read permission is granted to all users within the x01 project.

chmod g+r /home/x01/x01/shared/your-shared-file.txt\n

If you're sharing a set of files stored within a folder hierarchy the chmod is slightly more complicated.

chmod -R a+Xr /home/x01/shared/my-shared-folder\nchmod -R g+Xr /home/x01/x01/shared/my-shared-folder\n

The -R option ensures that the read permission is enabled recursively and the +X guarantees that the user(s) you're sharing the folder with can access the subdirectories below my-shared-folder.

"},{"location":"tursa-user-guide/data/#archiving-and-data-transfer","title":"Archiving and data transfer","text":"

Data transfer speed may be limited by many different factors so the best data transfer mechanism to use depends on the type of data being transferred and where the data is going.

The method you use to transfer data to/from Tursa will depend on how much you want to transfer and where to. The methods we cover in this guide are the SSH-based tools scp and rsync.

Before discussing specific data transfer methods, we cover archiving which is an essential process for transferring data efficiently.

"},{"location":"tursa-user-guide/data/#archiving","title":"Archiving","text":"

If you have related data that consists of a large number of small files it is strongly recommended to pack the files into a larger \"archive\" file for ease of transfer and manipulation. A single large file makes more efficient use of the file system and is easier to move, copy and transfer because significantly fewer metadata operations are required. Archive files can be created using tools like tar and zip.

"},{"location":"tursa-user-guide/data/#tar","title":"tar","text":"

The tar command packs files into a \"tape archive\" format. The command has general form:

tar [options] [file(s)]\n

Common options include -c (create a new archive), -v (verbose output), -W (attempt to verify the archive after writing it), -l (check that hard links are preserved; GNU tar), and -f (specify the archive file name).

Putting these together:

tar -cvWlf mydata.tar mydata\n

will create and verify an archive.

To extract files from a tar file, the option -x is used. For example:

tar -xf mydata.tar\n

will recover the contents of mydata.tar to the current working directory.

To verify an existing tar file against a set of data, the -d (diff) option can be used. By default, no output will be given if a verification succeeds and an example of a failed verification follows:

$> tar -df mydata.tar mydata/*\nmydata/damaged_file: Mod time differs\nmydata/damaged_file: Size differs\n

Note

tar files do not store checksums with their data, requiring the original data to be present during verification.

Tip

Further information on using tar can be found in the tar manual (accessed via man tar).

"},{"location":"tursa-user-guide/data/#zip","title":"zip","text":"

The zip file format is widely used for archiving files and is supported by most major operating systems. The utility to create zip files can be run from the command line as:

zip [options] mydata.zip [file(s)]\n

Common options are -r (recurse into directories) and -0 (store files without compression).

Together:

zip -0r mydata.zip mydata\n

will create an archive.

Note

Unlike tar, zip files do not preserve hard links. File data will be copied on archive creation, e.g. an uncompressed zip archive of a 100MB file and a hard link to that file will be approximately 200MB in size. This makes zip an unsuitable format if you wish to precisely reproduce the file system layout.

The corresponding unzip command is used to extract data from the archive. The simplest use case is:

unzip mydata.zip\n

which recovers the contents of the archive to the current working directory.

Files in a zip archive are stored with a CRC checksum to help detect data loss. unzip provides options for verifying this checksum against the stored files. The relevant flag is -t and is used as follows:

$> unzip -t mydata.zip\nArchive:  mydata.zip\n    testing: mydata/                 OK\n    testing: mydata/file             OK\nNo errors detected in compressed data of mydata.zip.\n

Tip

Further information on using zip can be found in the zip manual (accessed via man zip).

"},{"location":"tursa-user-guide/data/#data-transfer-via-ssh","title":"Data transfer via SSH","text":"

The easiest way of transferring data to/from Tursa is to use one of the standard programs based on the SSH protocol such as scp, sftp or rsync. These all use the same underlying mechanism (SSH) as you normally use to log in to Tursa. So, once the command has been executed via the command line, you will be prompted for your password for the specified account on the remote machine (Tursa in this case).

To avoid having to type in your password multiple times you can set up a SSH key pair and use an SSH agent as documented in the User Guide at connecting.

"},{"location":"tursa-user-guide/data/#ssh-data-transfer-performance-considerations","title":"SSH data transfer performance considerations","text":"

The SSH protocol encrypts all traffic it sends. This means that file transfer using SSH consumes a relatively large amount of CPU time at both ends of the transfer (for encryption and decryption). The Tursa login nodes have fairly fast processors that can sustain about 100 MB/s transfer. The encryption algorithm used is negotiated between the SSH client and the SSH server. There are command line flags that allow you to specify a preference for which encryption algorithm should be used. You may be able to improve transfer speeds by requesting a different algorithm than the default. The aes128-ctr or aes256-ctr algorithms are well supported and fast as they are implemented in hardware. These are not usually the default choice when using scp so you will need to manually specify them.

A single SSH based transfer will usually not be able to saturate the available network bandwidth or the available disk bandwidth so you may see an overall improvement by running several data transfer operations in parallel. To reduce metadata interactions it is a good idea to overlap transfers of files from different directories.
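
As a sketch, assuming two illustrative directories dir1 and dir2 on your local machine (and user as a placeholder for your Tursa username), two transfers from different directories could be overlapped by running them in the background:

# start two transfers from different directories in parallel (destination defaults to your home directory)\nrsync -a dir1 user@tursa.dirac.ed.ac.uk: &\nrsync -a dir2 user@tursa.dirac.ed.ac.uk: &\n\n# wait for both background transfers to finish\nwait\n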

In addition, you should consider the following when transferring data:

"},{"location":"tursa-user-guide/data/#scp","title":"scp","text":"

The scp command creates a copy of a file, or if given the -r flag, a directory either from a local machine onto a remote machine or from a remote machine onto a local machine.

For example, to transfer files to Tursa from a local machine:

scp [options] source user@tursa.dirac.ed.ac.uk:[destination]\n

(Remember to replace user with your Tursa username in the example above.)

In the above example, the [destination] is optional, as when left out scp will copy the source into your home directory. Also, the source should be the absolute path of the file/directory being copied or the command should be executed in the directory containing the source file/directory.

If you want to request a different encryption algorithm add the -c [algorithm-name] flag to the scp options. For example, to use the (usually faster) aes128-ctr encryption algorithm you would use:

scp [options] -c aes128-ctr source user@tursa.dirac.ed.ac.uk:[destination]\n

(Remember to replace user with your Tursa username in the example above.)

"},{"location":"tursa-user-guide/data/#rsync","title":"rsync","text":"

The rsync command can also transfer data between hosts using a ssh connection. It creates a copy of a file or, if given the -r flag, a directory at the given destination, similar to scp above.

Given the -a option, rsync can also make exact copies (including permissions); this is referred to as mirroring. In this case the rsync command is executed with ssh to create the copy on a remote machine.

To transfer files to Tursa using rsync with ssh the command has the form:

rsync [options] -e ssh source user@tursa.dirac.ed.ac.uk:[destination]\n

(Remember to replace user with your Tursa username in the example above.)

In the above example, the [destination] is optional, as when left out rsync will copy the source into your home directory. Also the source should be the absolute path of the file/directory being copied or the command should be executed in the directory containing the source file/directory.

Additional flags can be specified for the underlying ssh command by using a quoted string as the argument of the -e flag, e.g.

rsync [options] -e \"ssh -c arcfour\" source user@tursa.dirac.ed.ac.uk:[destination]\n

(Remember to replace user with your Tursa username in the example above.)

Tip

Further information on using rsync can be found in the rsync manual (accessed via man rsync).

"},{"location":"tursa-user-guide/data/#ssh-data-transfer-example-laptopworkstation-to-tursa","title":"SSH data transfer example: laptop/workstation to Tursa","text":"

Here we have a short example demonstrating transfer of data directly from a laptop/workstation to Tursa.

Note

This guide assumes you are using a command line interface to transfer data. This means the terminal on Linux or macOS, or the MobaXterm local terminal or PowerShell on Windows.

Before we can transfer data to Tursa we need to make sure we have an SSH key set up to access Tursa from the system we are transferring data from. If you are using the same system that you use to log into Tursa then you should be all set. If you want to use a different system you will need to generate a new SSH key there (or use SSH key forwarding) to allow you to connect to Tursa.

Tip

Remember that you will need to use both a key and your password to transfer data to Tursa.

Once we know our keys are set up correctly, we are ready to transfer data directly between the two machines. We begin by combining our important research data into a single archive file using the following command:

tar -czf all_my_files.tar.gz file1.txt file2.txt file3.txt\n

We then initiate the data transfer from our system to Tursa, here using rsync to allow the transfer to be recommenced without needing to start again, in the event of a loss of connection or other failure. For example, using the SSH key in the file ~/.ssh/id_RSA_A2 on our local system:

rsync -Pv -e\"ssh -c aes128-gcm@openssh.com -i $HOME/.ssh/id_RSA_A2\" ./all_my_files.tar.gz otbz19@tursa.dirac.ed.ac.uk:/home/z19/z19/otbz19/\n

Note the use of the -P flag to allow partial transfer; the same command could be used to restart the transfer after a loss of connection. The -e flag allows specification of the ssh command - we have used this to add the location of the identity file. The -c option specifies the cipher to be used as aes128-gcm, which has been found to increase performance. Unfortunately the ~ shortcut is not correctly expanded, so we have specified the full path. We move our research archive to our home directory on Tursa.

Note

Remember to replace otbz19 with your username on Tursa.

If we were unconcerned about being able to restart an interrupted transfer, we could instead use the scp command,

scp -c aes128-gcm@openssh.com -i ~/.ssh/id_RSA_A2 all_my_files.tar.gz otbz19@tursa.dirac.ed.ac.uk:/home/z19/z19/otbz19/\n

but rsync is recommended for larger transfers.

"},{"location":"tursa-user-guide/hardware/","title":"ARCHER2 hardware","text":"

Note

Some of the material in this section is closely based on information provided by NASA as part of the documentation for the Aitken HPC system.

"},{"location":"tursa-user-guide/hardware/#system-overview","title":"System overview","text":"

Tursa is an Eviden supercomputing system which has a total of 178 GPU compute nodes. Each GPU compute node has 48 CPU cores and 4 NVIDIA A100 GPUs. Compute nodes are connected together by an InfiniBand interconnect.

There are additional login nodes, which provide access to the system.

Compute nodes are only accessible via the Slurm job scheduling system.

There is a single file system which is available on login and compute nodes (see Data management and transfer).

The Lustre file system has a capacity of 5.1 PiB.

The interconnect uses a Fat Tree topology.

"},{"location":"tursa-user-guide/hardware/#interconnect-details","title":"Interconnect details","text":"

Tursa has a high performance interconnect with 4x 200 Gb/s InfiniBand interfaces per node. It uses a 2-layer fat tree topology:

"},{"location":"tursa-user-guide/scheduler/","title":"Running jobs on Tursa","text":"

As with most HPC services, Tursa uses a scheduler to manage access to resources and ensure that the thousands of different users of the system are able to share it and all get access to the resources they require. Tursa uses the Slurm software to schedule jobs.

Writing a submission script is typically the most convenient way to submit your job to the scheduler. Example submission scripts (with explanations) for the most common job types are provided below.

Interactive jobs are also available and can be particularly useful for developing and debugging applications. More details are available below.

Hint

If you have any questions on how to run jobs on Tursa do not hesitate to contact the DiRAC Service Desk.

You typically interact with Slurm by issuing Slurm commands from the login nodes (to submit, check and cancel jobs), and by specifying Slurm directives that describe the resources required for your jobs in job submission scripts.

"},{"location":"tursa-user-guide/scheduler/#resources","title":"Resources","text":""},{"location":"tursa-user-guide/scheduler/#gpuh","title":"GPUh","text":"

Time used on Tursa nodes is measured in GPUh. 1 GPUh = 1 GPU for 1 hour. So a Tursa compute node with 4 GPUs would cost 4 GPUh per hour.
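
For example, a job that runs on 2 full GPU nodes for 6 hours is charged:

2 nodes x 4 GPUs/node x 6 hours = 48 GPUh\n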

Note

The minimum resource request on Tursa is one full node which is charged at a rate of 4 GPUh per hour.

"},{"location":"tursa-user-guide/scheduler/#checking-available-budget","title":"Checking available budget","text":"

You can check in SAFE by selecting Login accounts from the menu and selecting the login account you want to query.

Under Login account details you will see each of the budget codes you have access to listed (e.g. dp123 resources) and then, under Resource Pool to the right of this, a note of the remaining budgets.

When logged in to the machine you can also use the command

sacctmgr show assoc where user=$LOGNAME format=account,user,maxtresmins%75\n

This will list all the budget codes that you have access to e.g.

Account       User                                                                 MaxTRESMins \n---------- ---------- --------------------------------------------------------------------------- \n       t01   dc-user1                           gres/cpu-low=0,gres/cpu-standard=0,gres/gpu-low=0 \n       z01   dc-user1   \n

This shows that dc-user1 is a member of budgets t01 and z01. However, the gres/cpu-low=0,gres/cpu-standard=0,gres/gpu-low=0 indicates that the t01 budget can only run GPU jobs in standard (charged) partitions (all other options are disabled, indicated by =0 for CPU standard, CPU low and GPU low). This user can also submit jobs to any partition using the z01 budget.

To see the number of coreh or GPUh remaining you must check in SAFE.

"},{"location":"tursa-user-guide/scheduler/#charging","title":"Charging","text":"

Jobs run on Tursa are charged for the time they use i.e. from the time the job begins to run until the time the job ends (not the full wall time requested).

Jobs are charged for the full number of nodes which are requested, even if they are not all used.

Charging takes place at the time the job ends, and the job is charged in full to the budget which is live at the end time.

"},{"location":"tursa-user-guide/scheduler/#basic-slurm-commands","title":"Basic Slurm commands","text":"

There are four key commands used to interact with Slurm on the command line: sinfo, sbatch, squeue and scancel.

We cover each of these commands in more detail below.

"},{"location":"tursa-user-guide/scheduler/#sinfo-information-on-resources","title":"sinfo: information on resources","text":"

sinfo is used to query information about available resources and partitions. Without any options, sinfo lists the status of all resources and partitions, e.g.

[dc-user1@tursa-login1 ~]$ sinfo \n\nPARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST\ncpu          up 2-00:00:00      4  alloc tu-c0r0n[66-69]\ncpu          up 2-00:00:00      2   idle tu-c0r0n[70-71]\ngpu          up 2-00:00:00      1   plnd tu-c0r2n93\ngpu          up 2-00:00:00     11  drain tu-c0r0n75,tu-c0r5n[48,51,54,57],tu-c0r6n[48,51,54,57],tu-c0r7n[00,48]\ngpu          up 2-00:00:00    112    mix tu-c0r0n[00,03,06,09,12,15,18,21,24,27,30,33,36,39,42,45,72,87,90],tu-c0r1n[00,03,06,09,12,15,18,21,24,27,30,33,60,63,66,69,72,75,78,81,84,87,90,93],tu-c0r2n[00,03,06,09,12,15,18,21,24,27,30,33,60,63,66,69,72,75,78,81,84,87,90],tu-c0r3n[00,03,06,09,12,15,18,21,24,27,30,33,60,63,66,69,72,75,78,81,84,90,93],tu-c0r4n[00,03,06,09,12,15,18,21,24,27,30,33,60,63,66,69,72,75,81,84,87,90,93]\ngpu          up 2-00:00:00     56   resv tu-c0r0n93,tu-c0r4n78,tu-c0r5n[00,03,06,09,12,15,18,21,24,27,30,33,36,39,42,45],tu-c0r6n[00,03,06,09,12,15,18,21,24,27,30,33,36,39,42,45,60,63,66,69],tu-c0r7n[03,06,09,12,15,18,21,24,27,30,33,36,39,42,45,51,54,57]\ngpu          up 2-00:00:00      1   idle tu-c0r3n87\n
"},{"location":"tursa-user-guide/scheduler/#sbatch-submitting-jobs","title":"sbatch: submitting jobs","text":"

sbatch is used to submit a job script to the job submission system. The script will typically contain one or more mpirun commands to launch parallel tasks.

When you submit the job, the scheduler provides the job ID, which is used to identify this job in other Slurm commands and when looking at resource usage in SAFE.

sbatch test-job.slurm\nSubmitted batch job 12345\n
"},{"location":"tursa-user-guide/scheduler/#squeue-monitoring-jobs","title":"squeue: monitoring jobs","text":"

squeue without any options or arguments shows the current status of all jobs known to the scheduler. For example:

squeue\n

will list all jobs on Tursa.

The output of this is often large. You can restrict the output to just your jobs by adding the --me option:

squeue --me\n
"},{"location":"tursa-user-guide/scheduler/#scancel-deleting-jobs","title":"scancel: deleting jobs","text":"

scancel is used to delete a job from the scheduler. If the job is waiting to run it is simply cancelled; if it is a running job then it is stopped immediately. You need to provide the job ID of the job you wish to cancel/stop. For example:

scancel 12345\n

will cancel (if waiting) or stop (if running) the job with ID 12345.

"},{"location":"tursa-user-guide/scheduler/#resource-limits","title":"Resource Limits","text":"

The Tursa resource limits for any given job are covered by three separate attributes: the primary resource (compute nodes), the partition, and the Quality of Service (QoS). These are described below.

"},{"location":"tursa-user-guide/scheduler/#primary-resource","title":"Primary resource","text":"

The primary resource you can request for your job is the compute node.

Information

The --exclusive option is enforced on Tursa, which means you will always have access to all of the memory on the compute node regardless of how many processes are actually running on the node.

Note

You will not generally have access to the full amount of memory resource on the node as some is retained for running the operating system and other system processes.

"},{"location":"tursa-user-guide/scheduler/#partitions","title":"Partitions","text":"

On Tursa, compute nodes are grouped into partitions. You will have to specify a partition using the --partition option in your Slurm submission script. The following table has a list of active partitions on Tursa.

Partition Description Max nodes available cpu CPU nodes with AMD EPYC 48-core processor \u00d7 2 6 gpu GPU nodes with AMD EPYC 48-core processor and NVIDIA A100 GPU \u00d7 4 (this includes both A100-40 and A100-80 GPU) 181 gpu-a100-40 GPU nodes with 2 AMD EPYC 16-core processors and NVIDIA A100-40 GPU \u00d7 4 114 gpu-a100-80 GPU nodes with 2 AMD EPYC 24-core processor (3 nodes have 2 AMD EPYC 16-core processors) and NVIDIA A100-80 GPU \u00d7 4 67

You can list the active partitions by running sinfo.

Tip

You may not have access to all the available partitions.

"},{"location":"tursa-user-guide/scheduler/#quality-of-service-qos","title":"Quality of Service (QoS)","text":"

On Tursa, job limits are defined by the requested Quality of Service (QoS), as specified by the --qos Slurm directive. The following table lists the active QoS on Tursa.

QoS Max Nodes Per Job Max Walltime Jobs Queued Jobs Running Partition(s) Notes standard 64 48 hrs 32 16 gpu, gpu-a100-40, gpu-a100-80, cpu Only job sizes that are powers of 2 nodes are allowed (i.e. 1, 2, 4, 8, 16, 32, 64 nodes), only available when your budget is positive. low 64 24 hrs 4 4 gpu, gpu-a100-40, gpu-a100-80, cpu Only job sizes that are powers of 2 nodes are allowed (i.e. 1, 2, 4, 8, 16, 32, 64 nodes), only available when your budget is zero or negative. dev 2 4 hrs 2 1 gpu For faster turnaround for development jobs and interactive sessions, only available when your budget is positive. The dev QoS must be used with the gpu-a100-40 (1-node maximum) or gpu-a100-80 (2-node maximum) partitions.

You can find out the QoS that you can use by running the following command:

sacctmgr show assoc user=$USER cluster=tursa format=cluster,account,user,qos%50\n

As long as you have a positive budget, you should use the standard QoS. Once you have exhausted your budget you can use the low QoS to continue to run jobs at a lower priority than jobs in the standard QoS.

Hint

If you have needs which do not fit within the current QoS, please contact the Service Desk and we can discuss how to accommodate your requirements.

Important

Only job sizes that are powers of 2 nodes are allowed, i.e. 1, 2, 4, 8, 16, 32, 64 nodes on the gpu partition and 1, 2, 4 nodes on the cpu partition.

"},{"location":"tursa-user-guide/scheduler/#priority","title":"Priority","text":"

Job priority on Tursa depends on a number of different factors: the QoS the job uses, the length of time the job has been waiting in the queue (age), and the fairshare of the submitting account.

Each of these factors is normalised to a value between 0 and 1, is multiplied with a weight and the resulting values combined to produce a priority for the job. The current job priority formula on Tursa is:

Priority = [10000 * P(QoS)] + [500 * P(Age)] + [300 * P(Fairshare)]\n

The priority factors are:

You can view the priorities for current queued jobs on the system with the sprio command:

[dc-user1@tursa-login1 ~]$ sprio \n          JOBID PARTITION   PRIORITY       SITE        AGE  FAIRSHARE        QOS\n          43963 gpu             5055          0         51          5       5000\n          43975 gpu             5061          0         41         20       5000\n          43976 gpu             5061          0         41         20       5000\n          43982 gpu             5046          0         26         20       5000\n          43986 gpu             5011          0          6          5       5000\n          43996 gpu             5020          0          0         20       5000\n          43997 gpu             5020          0          0         20       5000\n
"},{"location":"tursa-user-guide/scheduler/#troubleshooting","title":"Troubleshooting","text":""},{"location":"tursa-user-guide/scheduler/#slurm-error-messages","title":"Slurm error messages","text":"

An incorrect submission will cause Slurm to return an error. Some common problems are listed below, with a suggestion about the likely cause:

"},{"location":"tursa-user-guide/scheduler/#slurm-queued-reasons","title":"Slurm queued reasons","text":"

The squeue command allows users to view information for jobs managed by Slurm. Jobs typically go through the following states: PENDING, RUNNING, COMPLETING, and COMPLETED. The first table provides a description of some job state codes. The second table provides a description of the reasons that cause a job to be in a state.

Status Code Description PENDING PD Job is awaiting resource allocation. RUNNING R Job currently has an allocation. SUSPENDED S Job has an allocation, but execution has been suspended. COMPLETING CG Job is in the process of completing. Some processes on some nodes may still be active. COMPLETED CD Job has terminated all processes on all nodes with an exit code of zero. TIMEOUT TO Job terminated upon reaching its time limit. STOPPED ST Job has an allocation, but execution has been stopped with SIGSTOP signal. CPUS have been retained by this job. OUT_OF_MEMORY OOM Job experienced out of memory error. FAILED F Job terminated with non-zero exit code or other failure condition. NODE_FAIL NF Job terminated due to failure of one or more allocated nodes. CANCELLED CA Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.

For a full list, see Job State Codes.

Reason Description Priority One or more higher priority jobs exist for this partition or advanced reservation. Resources The job is waiting for resources to become available. BadConstraints The job's constraints can not be satisfied. BeginTime The job's earliest start time has not yet been reached. Dependency This job is waiting for a dependent job to complete. Licenses The job is waiting for a license. WaitingForScheduling No reason has been set for this job yet. Waiting for the scheduler to determine the appropriate reason. Prolog Its PrologSlurmctld program is still running. JobHeldAdmin The job is held by a system administrator. JobHeldUser The job is held by the user. JobLaunchFailure The job could not be launched. This may be due to a file system problem, invalid program name, etc. NonZeroExitCode The job terminated with a non-zero exit code. InvalidAccount The job's account is invalid. InvalidQOS The job's QOS is invalid. QOSUsageThreshold Required QOS threshold has been breached. QOSJobLimit The job's QOS has reached its maximum job count. QOSResourceLimit The job's QOS has reached some resource limit. QOSTimeLimit The job's QOS has reached its time limit. NodeDown A node required by the job is down. TimeLimit The job exhausted its time limit. ReqNodeNotAvail Some node specifically required by the job is not currently available. The node may currently be in use, reserved for another job, in an advanced reservation, DOWN, DRAINED, or not responding. Nodes which are DOWN, DRAINED, or not responding will be identified as part of the job's \"reason\" field as \"UnavailableNodes\". Such nodes will typically require the intervention of a system administrator to make available.

For a full list, see Job Reasons.

"},{"location":"tursa-user-guide/scheduler/#output-from-slurm-jobs","title":"Output from Slurm jobs","text":"

Slurm places standard output (STDOUT) and standard error (STDERR) for each job in the file slurm-<JobID>.out. This file appears in the job's working directory once your job starts running.

Hint

Output may be buffered - to enable live output, e.g. for monitoring job status, add --unbuffered to the srun command in your Slurm script.
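
As a sketch, based on the srun command used in the example job script later in this section, this might look like:

srun --unbuffered --hint=nomultithread --distribution=block:block \\\n     gpu_launch.sh \\\n     ${application} ${options}\n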

"},{"location":"tursa-user-guide/scheduler/#specifying-resources-in-job-scripts","title":"Specifying resources in job scripts","text":"

You specify the resources you require for your job using directives at the top of your job submission script using lines that start with the directive #SBATCH.

Hint

Most options provided using #SBATCH directives can also be specified as command line options to srun.

If you do not specify any options, then the default for each option will be applied. As a minimum, all job submissions must specify the budget that they wish to charge the job to using the --account=[budget code] option (for example, --account=t01).

Other common options that are used include the maximum walltime (--time) and the job name (--job-name), as can be seen in the example job scripts below.

In addition, parallel jobs will also need to specify how many nodes, parallel processes and threads they require.

If you are happy to have any GPU type for your job (A100-40 or A100-80) then you select the gpu partition; a sketch of the corresponding Slurm directives for all three cases is given below.

If you wish to use just the A100-80 GPU nodes, which have higher memory, you select the gpu-a100-80 partition.

To use just the A100-40 GPU nodes, you select the gpu-a100-40 partition.

If you do not specify a partition, the scheduler may use any available node types for the job (equivalent of --partition=gpu).
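
A minimal sketch of the corresponding Slurm directives for the three cases above (the partition names are taken from the table in the Partitions section):

#SBATCH --partition=gpu          # any GPU node (A100-40 or A100-80)\n#SBATCH --partition=gpu-a100-80  # A100-80 (higher memory) nodes only\n#SBATCH --partition=gpu-a100-40  # A100-40 nodes only\n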

Note

For parallel jobs, Tursa operates in a node exclusive way. This means that you are assigned resources in units of full compute nodes for your jobs (i.e. 32 cores and 4 GPUs on A100-40 GPU nodes, 48 cores and 4 GPUs on A100-80 nodes, 128 cores on CPU nodes) and that no other user can share those compute nodes with you. Hence, the minimum amount of resource you can request for a parallel job is 1 node (i.e. 32 cores and 4 GPUs on A100-40 GPU nodes, 48 cores and 4 GPUs on A100-80 nodes, 128 cores on CPU nodes).

To prevent the behaviour of batch scripts being dependent on the user environment at the point of submission, you can use the --export=none option.

Using --export=none means that the behaviour of batch submissions should be repeatable. We strongly recommend its use, although see the following section for how to enable access to the usual modules.
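
A minimal sketch, assuming the option is given as a directive at the top of the job script (it can also be passed directly to sbatch on the command line):

#SBATCH --export=none\n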

"},{"location":"tursa-user-guide/scheduler/#gpu-frequency","title":"GPU frequency","text":"

Important

The default GPU frequency on Tursa compute nodes was changed from 1410 MHz to 1040 MHz on Thursday 15 Dec 2022 to improve the energy efficiency of the service.

Users can control the GPU frequency in their job submission scripts:
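
As a sketch, assuming the standard Slurm --gpu-freq option is used for this purpose (here requesting the previous default of 1410 MHz), it could be added to the srun line in your job script:

srun --gpu-freq=1410 --hint=nomultithread --distribution=block:block \\\n     gpu_launch.sh \\\n     ${application} ${options}\n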

Bug

When setting the GPU frequency you will see an error in the output from the job that says control disabled. This is an incorrect message due to an issue with how Slurm sets the GPU frequency and can be safely ignored.

"},{"location":"tursa-user-guide/scheduler/#srun-launching-parallel-jobs","title":"srun: Launching parallel jobs","text":"

If you are running parallel jobs, your job submission script should contain one or more srun commands to launch the parallel executable across the compute nodes. In most cases you will want to add the options --distribution=block:block and --hint=nomultithread to your srun command to ensure you get the correct pinning of processes to cores on a compute node.

A brief explanation of these options: --hint=nomultithread means do not use hyperthreads/SMT; --distribution=block:block means use a block distribution of processes across nodes (i.e. fill nodes before moving onto the next one) and a block distribution of processes across \"sockets\" within a node (i.e. fill a \"socket\" before moving on to the next one).
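
Putting these options together, a typical launch line looks like the following (the application name and arguments are placeholders; the full example scripts below show this in context):

srun --hint=nomultithread --distribution=block:block \\\n     gpu_launch.sh \\\n     ./my_app.x arg1 arg2\n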

Important

The Slurm definition of a \"socket\" does not usually correspond to a physical CPU socket. On Tursa GPU nodes it corresponds to half the cores on a socket as the GPU nodes are configured with NPS2.

On the Tursa CPU nodes, the Slurm definition of a socket does correspond to a physical CPU socket (64 cores) as the CPU nodes are configured with NPS1.

"},{"location":"tursa-user-guide/scheduler/#example-job-submission-scripts","title":"Example job submission scripts","text":""},{"location":"tursa-user-guide/scheduler/#example-job-submission-script-for-a-parallel-job-using-cuda","title":"Example: job submission script for a parallel job using CUDA","text":"

A job submission script for a parallel job that uses 4 compute nodes, 4 MPI processes per node and 4 GPUs per node. It does not restrict what type of GPU the job can run on so both A100-40 and A100-80 can be used:

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=Example_Grid_job\n#SBATCH --time=12:0:0\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=4\n#SBATCH --cpus-per-task=8\n#SBATCH --gres=gpu:4\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]             \n\n# Load the correct modules\nmodule load /home/y07/shared/tursa-modules/setup-env\nmodule load gcc/9.3.0\nmodule load cuda/12.3\nmodule load openmpi/4.1.5-cuda12.3 \n\nexport OMP_NUM_THREADS=8\n\n# These will need to be changed to match the actual application you are running\napplication=\"my_mpi_openmp_app.x\"\noptions=\"arg 1 arg2\"\n\nsrun --hint=nomultithread --distribution=block:block \\\n     gpu_launch.sh \\\n     ${application} ${options}\n

This will run your executable \"my_mpi_openmp_app.x\" in parallel using 16 MPI processes on 4 nodes. 4 GPUs will be used per node.

Important

You must use the gpu_launch.sh wrapper script to get the correct binding of GPU to MPI processes and of network interface to GPU and MPI process. This script is described in more detail below.

"},{"location":"tursa-user-guide/scheduler/#gpu_launchsh-wrapper-script","title":"gpu_launch.sh wrapper script","text":"

The gpu_launch.sh wrapper script is required to set the correct binding of GPU to MPI processes and the correct binding of interconnect interfaces to MPI process and GPU. We provide this centrally for convenience but its contents are simple:

#!/bin/bash\n\n# Compute the node-local rank for binding the process to a GPU and NIC\nlrank=$((SLURM_PROCID % SLURM_NTASKS_PER_NODE))\n\n# Bind the process to the correct GPU and NIC\nexport CUDA_VISIBLE_DEVICES=${lrank}\nexport UCX_NET_DEVICES=mlx5_${lrank}:1\n\n$@\n
"},{"location":"tursa-user-guide/scheduler/#using-the-dev-qos","title":"Using the dev QoS","text":"

The dev QoS is designed for faster turnaround of short jobs than is usually available through the production QoS. It is subject to a number of restrictions:

In addition, you must specify either the gpu-a100-80 or gpu-a100-40 partitions when using the dev QoS.

Tip

The generic gpu partition will not work consistently when using the dev QoS.

Here is an example job submission script for a 2-node job in the dev QoS using the gpu-a100-80 partition. Note the use of the gpu_launch.sh wrapper script to get correct GPU and NIC binding.

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=Example_Dev_Job\n#SBATCH --time=12:0:0\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=4\n#SBATCH --cpus-per-task=12\n#SBATCH --gres=gpu:4\n#SBATCH --partition=gpu-a100-80\n#SBATCH --qos=dev\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n\nexport OMP_NUM_THREADS=1\n\n# Load the correct modules\nmodule load /home/y07/shared/tursa-modules/setup-env\nmodule load gcc/9.3.0\nmodule load cuda/12.3\nmodule load openmpi/4.1.5-cuda12.3 \n\n# These will need to be changed to match the actual application you are running\napplication=\"my_mpi_openmp_app.x\"\noptions=\"arg 1 arg2\"\n\nsrun --hint=nomultithread --distribution=block:block \\\n     gpu_launch.sh \\\n     ${application} ${options}\n
"},{"location":"tursa-user-guide/sw-environment/","title":"Software environment","text":"

The software environment on Tursa is primarily controlled through the module command. By loading and switching software modules you control which software and versions are available to you.

Information

A module is a self-contained description of a software package -- it contains the settings required to run a software package and, usually, encodes required dependencies on other software packages.

By default, all users on Tursa start with the default software environment loaded.

Software modules on Tursa are provided by both Eviden and by EPCC.

In this section, we provide:

"},{"location":"tursa-user-guide/sw-environment/#using-the-module-command","title":"Using the module command","text":"

We only cover basic usage of the module command here. For full documentation please see the Linux manual page on modules.

The module command takes a subcommand to indicate what operation you wish to perform. Common subcommands are:
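
As a quick reference, the subcommands demonstrated in the remainder of this section are summarised below (this summary is illustrative; see the module manual page for the full list):

module list          # show currently loaded modules\nmodule avail         # show all modules available on the system\nmodule show <name>   # show what loading a module does to your environment\nmodule load <name>   # load a module\nmodule rm <name>     # unload a module\nmodule swap <old> <new>   # swap one loaded module for another\nmodule save [name]   # save the current set of modules as a collection\n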

These are described in more detail below.

"},{"location":"tursa-user-guide/sw-environment/#information-on-the-available-modules","title":"Information on the available modules","text":"

The module list command lists the names and versions of the modules you currently have loaded in your environment. By default, you will have no modules loaded when you first log into Tursa.

Finding out which software modules are available on the system is performed using the module avail command. To list all software modules available, use:

[dc-user1@tursa-login1 ~]$ module avail\n------------------------------------------ /mnt/lustre/tursafs1/apps/cuda-11.0.2-modulefiles -------------------------------------------\ncuda/11.0.2  openmpi/4.1.1-cuda11.0.2  ucx/1.10.1-cuda11.0.2  \n\n------------------------------------------- /mnt/lustre/tursafs1/apps/cuda-11.4-modulefiles --------------------------------------------\ncuda/11.4.1  openmpi/4.1.1-cuda11.4  ucx/1.12.0-cuda11.4  \n\n------------------------------------------ /mnt/lustre/tursafs1/apps/cuda-11.4.1-modulefiles -------------------------------------------\ncuda/11.4.1  openmpi/4.1.1-cuda11.4.1  ucx/1.12.0-cuda11.4.1  \n\n------------------------------------------------ /mnt/lustre/tursafs1/apps/modulefiles -------------------------------------------------\ncuda/11.0.3  dot  gcc/9.3.0  module-git  module-info  modules  null  openmpi/4.1.1  ucx/1.10.1  use.own  xpmem/2.6.5   \n

This will list all the names and versions of the modules available on the service. Not all of them may work in your account though due to, for example, licensing restrictions. You will notice that for many modules we have more than one version, each of which is identified by a version number. One of these versions is the default. As the service develops the default version will change and old versions of software may be deleted.

You can list all the modules of a particular type by providing an argument to the module avail command. For example, to list all available versions of the OpenMPI library, use:

[dc-user1@tursa-login1 ~]$ module avail openmpi\n------------------------------------------ /mnt/lustre/tursafs1/apps/cuda-11.0.2-modulefiles -------------------------------------------\nopenmpi/4.1.1-cuda11.0.2  \n\n------------------------------------------- /mnt/lustre/tursafs1/apps/cuda-11.4-modulefiles --------------------------------------------\nopenmpi/4.1.1-cuda11.4  \n\n------------------------------------------ /mnt/lustre/tursafs1/apps/cuda-11.4.1-modulefiles -------------------------------------------\nopenmpi/4.1.1-cuda11.4.1  \n\n----------------\n

The module show command reveals what operations the module actually performs to change your environment when it is loaded. We provide a brief overview of the significance of these different settings below. For example, for the default openmpi module:

[dc-user1@tursa-login1 ~]$ module show openmpi\n-------------------------------------------------------------------\n/mnt/lustre/tursafs1/apps/cuda-11.0.2-modulefiles/openmpi/4.1.1-cuda11.0.2:\n\nmodule-whatis   Sets up OpenMPI on your environment\nsetenv          MPI_ROOT        /mnt/lustre/tursafs1/apps/basestack/cuda-11.0.2/openmpi/4.1.1/\nprepend-path    PATH /mnt/lustre/tursafs1/apps/basestack/cuda-11.0.2/openmpi/4.1.1/bin/\nprepend-path    LD_LIBRARY_PATH /mnt/lustre/tursafs1/apps/basestack/cuda-11.0.2/openmpi/4.1.1/lib\nprepend-path    MANPATH /opt/mpi/openmpi/4.0.4.1/share/man\nmodule load     ucx/1.10.1\nsetenv          OMPI_CC cc\nsetenv          OMPI_CXX        g++\nsetenv          OMPI_CFLAGS     -g -m64\nsetenv          OMPI_CXXFLAGS   -g -m64\n-------------------------------------------------------------------\n
"},{"location":"tursa-user-guide/sw-environment/#loading-removing-and-swapping-modules","title":"Loading, removing and swapping modules","text":"

To load a module, use the module load command. For example, to load the default version of OpenMPI into your environment, use:

[dc-user1@tursa-login1 ~]$ module load openmpi\n\n        UCX 1.10 loaded\n\n\n        OpenMPI 4.1.1 loaded\n

Once you have done this, your environment will be setup to use the OpenMPI library. The above command will load the default version of OpenMPI. If you need a specific version of the software, you can add more information:

[dc-user1@tursa-login1 ~]$ module load openmpi/4.1.1-cuda11.4.1\n\n        UCX 1.12.0 compiled with cuda 11.4.1 loaded\n\n\n        OpenMPI 4.1.1 with cuda-11.4.1 and UCX 1.12.0  loaded\n

will load OpenMPI version 4.1.1 with CUDA 11.4.1 into your environment, regardless of the default.

If you want to remove software from your environment, module rm will remove a loaded module:

[dc-user1@tursa-login1 ~]$ module rm openmpi\n

will unload whatever version of openmpi you might have loaded (even if it is not the default).

There are many situations in which you might want to change the presently loaded version to a different one, such as trying the latest version which is not yet the default or using a legacy version to keep compatibility with old data. This can be achieved most easily by using module swap oldmodule newmodule.

Suppose you have loaded version 4.1.1 of openmpi; the following command will swap it for version 4.1.1-cuda11.4.1:

[dc-user1@tursa-login1 ~]$ module swap openmpi openmpi/4.1.1-cuda11.4.1\n\n        UCX 1.12.0 compiled with cuda 11.4.1 loaded\n\n\n        OpenMPI 4.1.1 with cuda-11.4.1 and UCX 1.12.0  loaded\n

Note that you do not need to specify the version of the loaded module: it can be inferred because it is the only version of openmpi you have loaded.

"},{"location":"tursa-user-guide/sw-environment/#capturing-your-environment-for-reuse","title":"Capturing your environment for reuse","text":"

Sometimes it is useful to save the module environment that you are using to compile a piece of code or execute a piece of software. This is saved as a module collection. You can save a collection from your current environment by executing:

[dc-user1@tursa-login1 ~]$ module save [collection_name]\n

Note

If you do not specify the environment name, it is called default.

You can find the list of saved module environments by executing:

[dc-user1@tursa-login1 ~]$ module savelist\nNamed collection list:\n 1) default\n

To list the modules in a collection, you can execute, e.g.,:

[dc-user1@tursa-login1 ~]$ module saveshow default\n-------------------------------------------------------------------\n/home/z01/z01/dc-turn1/.module/default:\n\nmodule use --append /mnt/lustre/tursafs1/apps/cuda-11.0.2-modulefiles\nmodule use --append /mnt/lustre/tursafs1/apps/cuda-11.4-modulefiles\nmodule use --append /mnt/lustre/tursafs1/apps/cuda-11.4.1-modulefiles\nmodule use --append /mnt/lustre/tursafs1/apps/modulefilesintel\nmodule use --append /mnt/lustre/tursafs1/apps/modulefiles\nmodule load ucx/1.12.0-cuda11.4.1\nmodule load openmpi/4.1.1-cuda11.4.1\n\n-------------------------------------------------------------------\n

Note again that the details of the collection have been saved to the home directory (the first line of output above). It is possible to save a module collection with a fully qualified path, e.g.,

[dc-user1@tursa-login1 ~]$ module save /home/t01/z01/auser/my-module-collection\n

if you want to save to a specific file name.

To delete a module environment, you can execute:

[dc-user1@tursa-login1 ~]$ module saverm <environment_name>\n
"},{"location":"tursa-user-guide/sw-environment/#shell-environment-overview","title":"Shell environment overview","text":"

When you log in to Tursa, you are using the bash shell by default. Like any other software, the bash shell has a set of environment variables loaded; these can be listed by executing printenv or export.

These environment variables are useful for defining the behaviour of the software you run. For instance, OMP_NUM_THREADS defines the number of OpenMP threads.

To define an environment variable, you need to execute:

export OMP_NUM_THREADS=4\n

Please note there are no blanks between the variable name, the assignment symbol (=), and the value. If the value is a string, enclose it in double quotation marks.
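
For example, to set a variable whose value is a string containing spaces (the variable name here is purely illustrative):

export MY_RUN_LABEL=\"production run 42\"\n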

You can show the value of a specific environment variable by printing it:

echo $OMP_NUM_THREADS\n

Do not forget the dollar symbol. To remove an environment variable, just execute:

unset OMP_NUM_THREADS\n
"},{"location":"tursa-user-guide/sw-environment/#compiler-environment","title":"Compiler environment","text":"

The system supports two different primary compiler environments:

"},{"location":"tursa-user-guide/sw-environment/#gcc-toolchain","title":"GCC toolchain","text":"

To compile on the system for GPU nodes using the GCC toolchain, you would typically load the required modules:

[dc-user1@tursa-login1 ~]$ module load gcc/9.3.0\n[dc-user1@tursa-login1 ~]$ module load cuda/11.4.1 \n[dc-user1@tursa-login1 ~]$ module load openmpi/4.1.1-cuda11.4\n[dc-user1@tursa-login1 ~]$ module list\nCurrently Loaded Modulefiles:\n 1) gcc/9.3.0   2) cuda/11.4.1   3) ucx/1.12.0-cuda11.4   4) openmpi/4.1.1-cuda11.4 \n

Once you have loaded the modules, the standard OpenMPI compiler wrapper scripts are available:

You can find more information on these scripts in the OpenMPI documentation.
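
For illustration, with these modules loaded you would compile MPI source code using the standard OpenMPI wrapper commands mpicc, mpicxx and mpif90 (source and output file names below are placeholders):

mpicc  -O3 -o my_c_app   my_c_app.c      # C\nmpicxx -O3 -o my_cxx_app my_cxx_app.cpp  # C++\nmpif90 -O3 -o my_f_app   my_f_app.f90    # Fortran\n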

"},{"location":"tursa-user-guide/sw-environment/#nvhpc-toolchain","title":"NVHPC toolchain","text":"

To compile on the system for GPU nodes using the NVHPC toolchain, you would typically load the required modules:

[dc-user1@tursa-login1 ~]$ module load /home/y07/shared/tursa-modules/setup-env\n[dc-user1@tursa-login1 ~]$ module load gcc/9.3.0\n[dc-user1@tursa-login1 ~]$ module load nvhpc/21.7-nompi\n[dc-user1@tursa-login1 ~]$ module load openmpi/4.1.1-cuda11.4\n[dc-user1@tursa-login1 ~]$ module list\nCurrently Loaded Modulefiles:\n 1) /mnt/lustre/tursafs1/home/y07/shared/tursa-modules/setup-env   2) nvhpc/21.7-nompi\n 3) ucx/1.12.0-cuda11.4   4) openmpi/4.1.1-cuda11.4    5) gcc/9.3.0\n

Once you have loaded the modules, the standard OpenMPI compiler wrapper scripts are available:

and the NVIDIA compilers are available as:
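
The NVHPC suite provides the compiler drivers nvc (C), nvc++ (C++), nvfortran (Fortran) and nvcc (CUDA). As an illustrative example (file names are placeholders), a CUDA source file targeting the A100 GPUs could be compiled with:

nvcc -O3 -arch=sm_80 -o my_gpu_app my_gpu_app.cu\n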

Tip

Both the NVIDIA compilers and the MPI compiler wrapper scripts will use the GCC compilers directly in the default configuration - this is often what you want. If you want the compiler wrappers to call the NVIDIA compilers themselves rather than GCC directly, you would use:

export OMPI_CC=nvcc\nexport OMPI_CXX=nvc++\nexport OMPI_FC=nvfortran\n
"},{"location":"tursa-user-guide/sw-environment/#other-build-tools","title":"Other build tools","text":""},{"location":"tursa-user-guide/sw-environment/#cmake","title":"cmake","text":"

CMake is available by using the commands:

[dc-user1@tursa-login1 ~]$ module load /home/y07/shared/tursa-modules/setup-env\n[dc-user1@tursa-login1 ~]$ module load cmake\n
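
For illustration, assuming a reasonably recent CMake, a typical out-of-source build of a project in the current directory might then look like:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release\ncmake --build build -j 8\n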
"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"DiRAC Extreme Scaling User Documentation","text":"

DiRAC Extreme Scaling is part of the DiRAC National HPC Service. You can find more information on the service and the research it supports on the DiRAC website.

The DiRAC Extreme Scaling service is an HPC resource for UK researchers. DiRAC Extreme Scaling is provided by UKRI, EPCC and the University of Edinburgh. The hardware is provided by ATOS.

"},{"location":"#what-the-documentation-covers","title":"What the documentation covers","text":"

The documentation currently includes:

"},{"location":"#contributing-to-the-documentation","title":"Contributing to the documentation","text":"

The source for this documentation is publicly available in the DiRAC Extreme Scaling documentation Github repository so that anyone can contribute to improve the documentation for the service. Contributions can be in the form of improvements or addtions to the content and/or addtion of Issues providing suggestions for how it can be improved.

Full details of how to contribute can be found in the README.md file of the repository.

"},{"location":"tursa-user-guide/","title":"Tursa User Guide","text":"

The Tursa User Guide covers all aspects of use of the Tursa resource. This includes fundamentals (required by all users to use the system effectively), best practice for getting the most out of Tursa and more technical topics.

The Tursa User Guide contains the following sections:

"},{"location":"tursa-user-guide/connecting/","title":"Connecting to Tursa","text":"

On the Tursa system, interactive access can be achieved via SSH, either directly from a command line terminal or using an SSH client. In addition data can be transferred to and from the Tursa system using scp from the command line or by using a file transfer client.

This section covers the basic connection methods.

Before following the process below, we assume you have setup an account on Tursa through the DiRAC SAFE. Documentation on how to do this can be found at:

"},{"location":"tursa-user-guide/connecting/#command-line-terminal","title":"Command line terminal","text":""},{"location":"tursa-user-guide/connecting/#linux","title":"Linux","text":"

Linux distributions come installed with a terminal application that can be used for SSH access to the login nodes. Linux users will have different terminals depending on their distribution and window manager (e.g. GNOME Terminal in GNOME, Konsole in KDE). Consult your Linux distribution's documentation for details on how to load a terminal.

"},{"location":"tursa-user-guide/connecting/#macos","title":"MacOS","text":"

MacOS users can use the Terminal application, located in the Utilities folder within the Applications folder.

"},{"location":"tursa-user-guide/connecting/#windows","title":"Windows","text":"

A typical Windows installation will not include a terminal client, though there are various clients available. We recommend all our Windows users to download and install MobaXterm to access Tursa. It is very easy to use and includes an integrated X server with SSH client to run any graphical applications on Tursa.

You can download MobaXterm Home Edition (Installer Edition) from the following link:

Double-click the downloaded Microsoft Installer file (.msi), and the Windows wizard will automatically guides you through the installation process. Note, you might need to have administrator rights to install on some Windows OS. Also make sure to check whether Windows Firewall hasn't blocked any features of this program after installation.

Start MobaXterm and then click \"Start local terminal\"

Tips

"},{"location":"tursa-user-guide/connecting/#access-credentials","title":"Access credentials","text":"

To access Tursa, you need to use two credentials:

You can find more detailed instructions on how to set up your credentials to access Tursa from Windows, macOS and Linux below.

"},{"location":"tursa-user-guide/connecting/#ssh-key-pairs","title":"SSH Key Pairs","text":"

You will need to generate an SSH key pair protected by a passphrase to access Tursa.

Using a terminal (the command line), set up a key pair that contains your e-mail address and enter a passphrase you will use to unlock the key:

$ ssh-keygen -t rsa -C \"your@email.com\"\n...\n-bash-4.1$ ssh-keygen -t rsa -C \"your@email.com\"\nGenerating public/private rsa key pair.\nEnter file in which to save the key (/Home/user/.ssh/id_rsa): [Enter]\nEnter passphrase (empty for no passphrase): [Passphrase]\nEnter same passphrase again: [Passphrase]\nYour identification has been saved in /Home/user/.ssh/id_rsa.\nYour public key has been saved in /Home/user/.ssh/id_rsa.pub.\nThe key fingerprint is:\n03:d4:c4:6d:58:0a:e2:4a:f8:73:9a:e8:e3:07:16:c8 your@email.com\nThe key's randomart image is:\n+--[ RSA 2048]----+\n|    . ...+o++++. |\n| . . . =o..      |\n|+ . . .......o o |\n|oE .   .         |\n|o =     .   S    |\n|.    +.+     .   |\n|.  oo            |\n|.  .             |\n| ..              |\n+-----------------+\n

(remember to replace \"your@email.com\" with your e-mail address).

"},{"location":"tursa-user-guide/connecting/#upload-public-part-of-key-pair-to-safe","title":"Upload public part of key pair to SAFE","text":"

You should now upload the public part of your SSH key pair to the SAFE by following the instructions at:

Login to SAFE. Then:

  1. Go to the Menu Login accounts and select the Tursa account you want to add the SSH key to
  2. On the subsequent Login account details page click the Add Credential button
  3. Select SSH public key as the Credential Type and click Next
  4. Either copy and paste the public part of your SSH key into the SSH Public key box or use the button to select the public key file on your computer.
  5. Click Add to associate the public SSH key part with your account

Once you have done this, your SSH key will be added to your Tursa account.

Remember, you will need to use both an SSH key and password to log into Tursa, so you will also need to collect your initial password before you can log in for the first time. We cover this next.

Note

If you want to connect to Tursa from more than one machine, e.g. from your home laptop as well as your work laptop, you should generate an ssh key on each machine, and add each of the public keys into SAFE.

"},{"location":"tursa-user-guide/connecting/#initial-passwords-up-to-13-feb-2024","title":"Initial passwords (up to 13 Feb 2024)","text":"

The SAFE web interface is used to provide your initial password for logging onto Tursa (see the SAFE Documentation for more details on requesting accounts and picking up passwords).

Note

You may now change your password on the Tursa machine itself using the passwd command or when you are prompted the first time you log in. This change will not be reflected in the SAFE. If you forget your password, you should use the SAFE to request a new one-shot password.

"},{"location":"tursa-user-guide/connecting/#mfa-time-based-one-time-passcode-totp-from-13-feb-2024","title":"MFA Time-based one-time passcode (TOTP) (from 13 Feb 2024)","text":"

You will need to use both an SSH key and a time-based one-time passcode to log into Tursa, so you will also need to set up a method for generating a TOTP code before you can log in.

"},{"location":"tursa-user-guide/connecting/#first-login-from-a-new-account-password-required","title":"First login from a new account: password required","text":"

Important

You will not use your password when logging on to Tursa after the first login for a new account.

As an additional security measure, you will also need to use a password from SAFE for your first login to Tursa with a new account. During this first login you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your ldap password: Enter the password which you retrieve from SAFE
  2. When prompted to enter your new password: type in a new password
  3. When prompted to re-enter the new password: re-enter the new password

Your password has now been changed. You will no longer need this password to log into Tursa from this point forwards; you will use your SSH key and TOTP as described above.

"},{"location":"tursa-user-guide/connecting/#ssh-clients","title":"SSH Clients","text":"

Interaction with Tursa is done remotely, over an encrypted communication channel, Secure Shell version 2 (SSH-2). This allows command-line access to one of the login nodes of Tursa, from which you can run commands or use a command-line text editor to edit files. SSH can also be used to run graphical programs such as GUI text editors and debuggers when used in conjunction with an X client.

"},{"location":"tursa-user-guide/connecting/#logging-in","title":"Logging in","text":"

You can use the following command from the terminal window to log in to Tursa:

ssh username@tursa.dirac.ed.ac.uk\n

You need to enter both credentials correctly to be able to access Tursa.

Tip

If your SSH key pair is not stored in the default location (usually ~/.ssh/id_rsa) on your local system, you may need to specify the path to the private part of the key with the -i option to ssh. For example, if your key is in a file called keys/id_rsa_Tursa you would use the command ssh -i keys/id_rsa_Tursa username@tursa.dirac.ed.ac.uk to log in.

Tip

When you first log into Tursa, you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your ldap password: Re-enter the password you retrieved from SAFE
  2. When prompted to enter your new password: type in a new password
  3. When prompted to re-enter the new password: re-enter the new password

Your password has now been changed.

To allow remote programs, especially graphical applications, to control your local display (for example, to open a new GUI window such as a debugger), use:

ssh -X username@tursa.dirac.ed.ac.uk\n

Some sites recommend using the -Y flag. While this can fix some compatibility issues, the -X flag is more secure.

Current MacOS systems do not have an X window system. Users should install the XQuartz package to allow for SSH with X11 forwarding on MacOS systems:

"},{"location":"tursa-user-guide/connecting/#making-access-more-convenient-using-the-ssh-configuration-file","title":"Making access more convenient using the SSH configuration file","text":"

Typing in the full command to log in or transfer data to Tursa can become tedious as it often has to be repeated many times. You can use the SSH configuration file, usually located on your local machine at ~/.ssh/config, to make things a bit more convenient.

Each remote site (or group of sites) can have an entry in this file which may look something like:

Host tursa\n  HostName tursa.dirac.ed.ac.uk\n  User username\n

(remember to replace username with your actual username!).

The Host tursa line defines a short name for the entry. In this case, instead of typing ssh username@tursa.dirac.ed.ac.uk to access the Tursa login nodes, you could use ssh tursa instead. The remaining lines define the options for the tursa host.

Now you can use SSH to access Tursa without needing to enter your username or the full hostname every time:

$ ssh tursa\n

You can set up as many of these entries as you need in your local configuration file. Other options are available. See the ssh_config man page (or man ssh_config on any machine with SSH installed) for a description of the SSH configuration file. You may find the IdentityFile option useful if you have to manage multiple SSH key pairs for different systems as this allows you to specify which SSH key to use for each system.
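
For example, an entry that also selects a specific SSH key for Tursa might look like the following (the key path is a placeholder for wherever your private key is stored):

Host tursa\n  HostName tursa.dirac.ed.ac.uk\n  User username\n  IdentityFile ~/.ssh/id_rsa_Tursa\n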

Bug

There is a known bug with Windows ssh-agent. If you get the error message: Warning: agent returned different signature type ssh-rsa (expected rsa-sha2-512), you will need to either specify the path to your ssh key in the command line (using the -i option as described above) or add the path to your SSH config file by using the IdentityFile option.

"},{"location":"tursa-user-guide/connecting/#ssh-debugging-tips","title":"SSH debugging tips","text":"

If you find you are unable to connect via SSH there are a number of ways you can try to diagnose the issue. Some of these are collected below - if you are having difficulties connecting we suggest trying these before contacting the Tursa service desk.

"},{"location":"tursa-user-guide/connecting/#use-the-usertursadiracedacuk-syntax-rather-than-l-user-tursadiracedacuk","title":"Use the user@tursa.dirac.ed.ac.uk syntax rather than -l user tursa.dirac.ed.ac.uk","text":"

We have seen a number of instances where people using the syntax

ssh -l user tursa.dirac.ed.ac.uk\n

have not been able to connect properly and get prompted for a password many times. We have found that using the alternative syntax:

ssh user@tursa.dirac.ed.ac.uk\n

works more reliably. If you are using the -l user option to connect and are seeing issues, then try using user@tursa.dirac.ed.ac.uk instead.

"},{"location":"tursa-user-guide/connecting/#can-you-connect-to-the-login-node","title":"Can you connect to the login node?","text":"

Try the command ping -c 3 tursa.dirac.ed.ac.uk. If you successfully connect to the login node, the output should include:

--- tursa.dirac.ed.ac.uk ping statistics ---\n3 packets transmitted, 3 received, 0% packet loss, time 38ms\n

(the ping time '38ms' is not important). If not all packets are received there could be a problem with your internet connection, or the login node could be unavailable.

"},{"location":"tursa-user-guide/connecting/#password","title":"Password","text":"

If you are having trouble entering your password consider using a password manager, from which you can copy and paste it. This will also help you generate a secure password. If you need to reset your password, instructions for doing so can be found in the SAFE documentation.

Windows users please note that Ctrl+V does not work to paste in to PuTTY, MobaXterm, or PowerShell. Instead use Shift+Ins to paste. Alternatively, right-click and select 'Paste' in PuTTY and MobaXterm, or simply right-click to paste in PowerShell.

"},{"location":"tursa-user-guide/connecting/#ssh-key","title":"SSH key","text":"

If you get the error message Permission denied (publickey), this can indicate a problem with your SSH key. Some things to check:

Target: Permissions (chmod code)
  Directory: drwx------ (700)
  Private Key: -rw------- (600)
  Public Key: -rw-r--r-- (644)

chmod can be used to set permissions on the target in the following way: chmod <code> <target>. So for example to set correct permissions on the private key file id_rsa_Tursa one would use the command chmod 600 id_rsa_Tursa.

Tip

Unix file permissions can be understood in the following way. There are three groups that can have file permissions: (owning) users, (owning) groups, and others. The available permissions are read, write, and execute. The first character indicates whether the target is a file -, or directory d. The next three characters indicate the owning user's permissions. The first character is r if they have read permission, - if they don't, the second character is w if they have write permission, - if they don't, the third character is x if they have execute permission, - if they don't. This pattern is then repeated for group, and other permissions. For example the pattern -rw-r--r-- indicates that the owning user can read and write the file, members of the owning group can read it, and anyone else can also read it. The chmod codes are constructed by treating the user, group, and owner permission strings as binary numbers, then converting them to decimal. For example the permission string -rwx------ becomes 111 000 000 -> 700.

"},{"location":"tursa-user-guide/connecting/#ssh-verbose-output","title":"SSH verbose output","text":"

Verbose debugging output from ssh can be very useful for diagnosing the issue. In particular, it can be used to distinguish between problems with the SSH key and password - further details are given below. To enable verbose output add the -vvv flag to your SSH command. For example:

ssh -vvv username@tursa.dirac.ed.ac.uk\n

The output is lengthy, but somewhere in there you should see lines similar to the following:

debug1: Next authentication method: keyboard-interactive\ndebug2: userauth_kbdint\ndebug3: send packet: type 50\ndebug2: we sent a keyboard-interactive packet, wait for reply\ndebug3: receive packet: type 60\ndebug2: input_userauth_info_req\ndebug2: input_userauth_info_req: num_prompts 1\nPassword:\ndebug3: send packet: type 61\ndebug3: receive packet: type 60\ndebug2: input_userauth_info_req\ndebug2: input_userauth_info_req: num_prompts 0\ndebug3: send packet: type 61\ndebug3: receive packet: type 51\nAuthenticated with partial success.\ndebug1: Authentications that can continue: publickey,password\n

If you do not see the Password: prompt you may have connection issues, or there could be a problem with the Tursa login nodes. If you do not see Authenticated with partial success it means your password was not accepted. You will be asked to re-enter your password, usually two more times before the connection will be rejected. Consider the suggestions under Password above. If you do see Authenticated with partial success, it means your password was accepted, and your SSH key will now be checked.

You should next see something similar to:

debug1: Next authentication method: publickey\ndebug1: Offering public key: RSA SHA256:<key_hash> <path_to_private_key>\ndebug3: send_pubkey_test\ndebug3: send packet: type 50\ndebug2: we sent a publickey packet, wait for reply\ndebug3: receive packet: type 60\ndebug1: Server accepts key: pkalg rsa-sha2-512 blen 2071\ndebug2: input_userauth_pk_ok: fp SHA256:<key_hash>\ndebug3: sign_and_send_pubkey: RSA SHA256:<key_hash>\nEnter passphrase for key '<path_to_private_key>':\ndebug3: send packet: type 50\ndebug3: receive packet: type 52\ndebug1: Authentication succeeded (publickey).\n

Most importantly, you can see which files ssh has checked for private keys, and you can see if any key is accepted. The line Authentication succeeded (publickey) indicates that the SSH key has been accepted. By default ssh will go through a list of standard private key files, as well as any you have specified with -i or a config file. This is fine, as long as one of the files mentioned is the one that matches the public key uploaded to SAFE.

If your SSH key passphrase is incorrect, you will be asked to try again up to three times in total, before being disconnected with Permission denied (publickey). If you enter your passphrase correctly, but still see this error message, please consider the advice under SSH key above.

"},{"location":"tursa-user-guide/data/","title":"Data management and transfer","text":"

This section covers best practice and tools for data management on Tursa as well as information on the storage available on the system.

Information

If you have any questions on data management and transfer please do not hesitate to contact the DiRAC service desk at dirac-support@epcc.ed.ac.uk.

"},{"location":"tursa-user-guide/data/#useful-resources-and-links","title":"Useful resources and links","text":""},{"location":"tursa-user-guide/data/#data-management","title":"Data management","text":"

We strongly recommend that you give some thought to how you use the various data storage facilities that are part of the Tursa service. This will not only allow you to use the machine more effectively but also to ensure that your valuable data is protected.

"},{"location":"tursa-user-guide/data/#tursa-storage","title":"Tursa storage","text":"

Tursa has two different storage systems available:

"},{"location":"tursa-user-guide/data/#parallel-lustre-file-system","title":"Parallel Lustre file system","text":"

The Tursa storage is provided by a parallel Lustre file system that provides your home directories and working storage. When you log in you will be placed in your home directory.

The home directory for each user is located at:

/home/[project code]/[group code]/[username]\n

where

Each project is allocated a portion of the total storage available, and the project PI will be able to sub-divide this quota among the groups and users within the project. As is standard practice on UNIX and Linux systems, the environment variable $HOME is automatically set to point to your home directory.

"},{"location":"tursa-user-guide/data/#tape-storage","title":"Tape storage","text":"

The tape storage can be made available to any Tursa user on request and can be used to store data from the Lustre parallel file system.

Managing and transferring data to/from the Tursa tape storage is done via the Miria web interface via an SSH tunnel to the Tursa login nodes.

Important

All data on the tape storage is shared project data rather than data associated with individual user accounts. Any data you move to tape will be visible to all users in the same project as you who have access to the tape storage.

"},{"location":"tursa-user-guide/data/#requesting-access-to-the-tape-storage","title":"Requesting access to the tape storage","text":"

If you want to use the Tursa tape storage, you should contact the DiRAC Service Desk with the username and project ID you want to use to access the storage.

"},{"location":"tursa-user-guide/data/#data-locations","title":"Data locations","text":"

In order to move data to the tape storage it must exist in a specific directory on the Tursa Lustre file system. You will need to move or copy the data to this location before it can be moved to tape and when you restore data from tape it will be placed in this location.

There is one directory per project on Tursa. The directory has the path:

/mnt/lustre/tursafs1/archive/[project code]\n

So, for example, the directory for project dp001 would be:

/mnt/lustre/tursafs1/archive/dp001\n
"},{"location":"tursa-user-guide/data/#setup-the-ssh-tunnel-for-miria","title":"Setup the SSH tunnel for Miria","text":"

Once your tape storage access has been setup and you have moved data to the archive directory, you will need to connect to the Miria web interface in a web browser on your local system by setting up an SSH tunnel to the Tursa login nodes.

You do this by logging into Tursa in the usual way (with your SSH key and password) and adding the -L 9080:10.144.20.95:443 option to the ssh command.

For example, if your username is dc-user1, you would setup the tunnel by logging into Tursa with (assuming your SSH key is in the default location):

ssh -L 9080:10.144.20.95:443 dc-user1@tursa.dirac.ed.ac.uk\n

Enter your SSH key passphrase and password in the usual way.

Note

You will need to setup the SSH tunnel each time you want to access the Miria interface.

"},{"location":"tursa-user-guide/data/#access-the-miria-interface","title":"Access the Miria interface","text":"

Once you have setup the SSH tunnel, you should be able to access the Miria interface in a web browser on your local system. Open a new tab and enter the URL:

You should see an interface asking you for a username and password. Use the username and password that you use to log into Tursa to log into the tape storage interface.

"},{"location":"tursa-user-guide/data/#transfer-data-from-tursa-lustre-to-tape","title":"Transfer data from Tursa Lustre to tape","text":"

You use the \"Easy Move\" option from the left-hand menu to transfer data.

  1. Click on \"Easy Move\"
  2. Click on the \"Find a source\" menu and select the disk with your project ID (e.g. \"dp001\")
  3. Click on the \"Find a target\" menu and select the archive with your project ID (e.g. \"dp001\")
  4. Use the file explorer to select the files/directories you wish to move to tape
  5. Click the \"Add\" button
  6. Scroll to the bottom of the page and select \"Validate basket\" and confirm you wish to proceed

Your transfer request will be added to the queue. You can check on progress by selecting the \"Activity\" option in the left hand menu.

"},{"location":"tursa-user-guide/data/#restore-data-from-tape-to-tursa-lustre","title":"Restore data from tape to Tursa Lustre","text":"

You use the \"Easy Move\" option from the left-hand menu to transfer data.

  1. Click on \"Easy Move\"
  2. Select the \"Repository\" icon next to the \"Find a source\" menu
  3. Click on the \"Find a source\" menu and select the archive with your project ID (e.g. \"dp001\")
  4. Select the \"Platform\" icon next to the \"Find a source\" menu
  5. Click on the \"Find a target\" menu and select the disk with your project ID (e.g. \"dp001\")
  6. Use the source file explorer to select the files/directories you wish to restore
  7. Use the target file explorer to select the location on disk to restore the data
  8. Click the \"Add\" button
  9. Scroll to the bottom of the page and select \"Validate basket\" and confirm you wish to proceed

Your transfer request will be added to the queue. You can check on progress by selecting the \"Activity\" option in the left hand menu.

Bug

If you restore a file rather than a directory, the Miria tool will give the file the name NULL once it is restored, you should use the mv command to rename the file to the correct name once it has been restored.

"},{"location":"tursa-user-guide/data/#sharing-data-with-other-tursa-users","title":"Sharing data with other Tursa users","text":"

How you share data with other Tursa users depends on whether or not they belong to the same project as you. Each project has two shared folders that can be used for sharing data.

"},{"location":"tursa-user-guide/data/#sharing-data-with-tursa-users-in-your-project","title":"Sharing data with Tursa users in your project","text":"

Each project has an inner shared folder.

/home/[project code]/[project code]/shared\n

This folder has read/write permissions for all project members. You can place any data you wish to share with other project members in this directory. For example, if your project code is x01 the inner shared folder would be located at /home/x01/x01/shared.

"},{"location":"tursa-user-guide/data/#sharing-data-with-all-tursa-users","title":"Sharing data with all Tursa users","text":"

Each project also has an outer shared folder:

/home/[project code]/shared\n

It is writable by all project members and readable by any user on the system. You can place any data you wish to share with other Tursa users who are not members of your project in this directory. For example, if your project code is x01 the outer shared folder would be located at /home/x01/shared.

"},{"location":"tursa-user-guide/data/#permissions","title":"Permissions","text":"

You should check the permissions of any files that you place in the shared area, especially if those files were created in your own Tursa account. Such files are likely to be readable by you only.

The chmod command below shows how to make sure that a file placed in the outer shared folder is also readable by all Tursa users.

chmod a+r /home/x01/shared/your-shared-file.txt\n

Similarly, for the inner shared folder, chmod can be called such that read permission is granted to all users within the x01 project.

chmod g+r /home/x01/x01/shared/your-shared-file.txt\n

If you're sharing a set of files stored within a folder hierarchy, the chmod is slightly more complicated.

chmod -R a+Xr /home/x01/shared/my-shared-folder\nchmod -R g+Xr /home/x01/x01/shared/my-shared-folder\n

The -R option ensures that the read permission is enabled recursively and the +X guarantees that the user(s) you're sharing the folder with can access the subdirectories below my-shared-folder.

"},{"location":"tursa-user-guide/data/#archiving-and-data-transfer","title":"Archiving and data transfer","text":"

Data transfer speed may be limited by many different factors so the best data transfer mechanism to use depends on the type of data being transferred and where the data is going.

The method you use to transfer data to/from Tursa will depend on how much you want to transfer and where to. The methods we cover in this guide are:

Before discussing specific data transfer methods, we cover archiving which is an essential process for transferring data efficiently.

"},{"location":"tursa-user-guide/data/#archiving","title":"Archiving","text":"

If you have related data that consists of a large number of small files it is strongly recommended to pack the files into a larger \"archive\" file for ease of transfer and manipulation. A single large file makes more efficient use of the file system and is easier to move and copy and transfer because significantly fewer meta-data operations are required. Archive files can be created using tools like tar and zip.

"},{"location":"tursa-user-guide/data/#tar","title":"tar","text":"

The tar command packs files into a \"tape archive\" format. The command has general form:

tar [options] [file(s)]\n

Common options include:

Putting these together:

tar -cvWlf mydata.tar mydata\n

will create and verify an archive.

To extract files from a tar file, the option -x is used. For example:

tar -xf mydata.tar\n

will recover the contents of mydata.tar to the current working directory.

To verify an existing tar file against a set of data, the -d (diff) option can be used. By default, no output will be given if a verification succeeds and an example of a failed verification follows:

$> tar -df mydata.tar mydata/*\nmydata/damaged_file: Mod time differs\nmydata/damaged_file: Size differs\n

Note

tar files do not store checksums with their data, requiring the original data to be present during verification.

Tip

Further information on using tar can be found in the tar manual (accessed via man tar or in the online tar manual).

"},{"location":"tursa-user-guide/data/#zip","title":"zip","text":"

The zip file format is widely used for archiving files and is supported by most major operating systems. The utility to create zip files can be run from the command line as:

zip [options] mydata.zip [file(s)]\n

Common options are:

Together:

zip -0r mydata.zip mydata\n

will create an archive.

Note

Unlike tar, zip files do not preserve hard links. File data will be copied on archive creation, e.g. an uncompressed zip archive of a 100MB file and a hard link to that file will be approximately 200MB in size. This makes zip an unsuitable format if you wish to precisely reproduce the file system layout.

The corresponding unzip command is used to extract data from the archive. The simplest use case is:

unzip mydata.zip\n

which recovers the contents of the archive to the current working directory.

Files in a zip archive are stored with a CRC checksum to help detect data loss. unzip provides options for verifying this checksum against the stored files. The relevant flag is -t and is used as follows:

$> unzip -t mydata.zip\nArchive:  mydata.zip\n    testing: mydata/                 OK\n    testing: mydata/file             OK\nNo errors detected in compressed data of mydata.zip.\n

Tip

Further information on using zip can be found in the zip manual (accessed via man zip or in the online zip manual).

"},{"location":"tursa-user-guide/data/#data-transfer-via-ssh","title":"Data transfer via SSH","text":"

The easiest way of transferring data to/from Tursa is to use one of the standard programs based on the SSH protocol such as scp, sftp or rsync. These all use the same underlying mechanism (SSH) as you normally use to log in to Tursa. So, once the command has been executed via the command line, you will be prompted for your password for the specified account on the remote machine (Tursa in this case).

To avoid having to type in your password multiple times you can set up a SSH key pair and use an SSH agent as documented in the User Guide at connecting.

"},{"location":"tursa-user-guide/data/#ssh-data-transfer-performance-considerations","title":"SSH data transfer performance considerations","text":"

The SSH protocol encrypts all traffic it sends. This means that file transfer using SSH consumes a relatively large amount of CPU time at both ends of the transfer (for encryption and decryption). The Tursa login nodes have fairly fast processors that can sustain about 100 MB/s transfer. The encryption algorithm used is negotiated between the SSH client and the SSH server. There are command line flags that allow you to specify a preference for which encryption algorithm should be used. You may be able to improve transfer speeds by requesting a different algorithm than the default. The aes128-ctr or aes256-ctr algorithms are well supported and fast as they are implemented in hardware. These are not usually the default choice when using scp so you will need to manually specify them.

A single SSH based transfer will usually not be able to saturate the available network bandwidth or the available disk bandwidth so you may see an overall improvement by running several data transfer operations in parallel. To reduce metadata interactions it is a good idea to overlap transfers of files from different directories.
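
As a sketch of this approach, two separate directory trees could be transferred concurrently by backgrounding independent rsync processes (paths and username are placeholders):

rsync -a -e ssh dir1 user@tursa.dirac.ed.ac.uk:/home/[project code]/[group code]/[username]/ &\nrsync -a -e ssh dir2 user@tursa.dirac.ed.ac.uk:/home/[project code]/[group code]/[username]/ &\nwait   # wait for both background transfers to finish\n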

In addition, you should consider the following when transferring data:

"},{"location":"tursa-user-guide/data/#scp","title":"scp","text":"

The scp command creates a copy of a file, or if given the -r flag, a directory either from a local machine onto a remote machine or from a remote machine onto a local machine.

For example, to transfer files to Tursa from a local machine:

scp [options] source user@tursa.dirac.ed.ac.uk:[destination]\n

(Remember to replace user with your Tursa username in the example above.)

In the above example, the [destination] is optional, as when left out scp will copy the source into your home directory. Also, the source should be the absolute path of the file/directory being copied or the command should be executed in the directory containing the source file/directory.

If you want to request a different encryption algorithm add the -c [algorithm-name] flag to the scp options. For example, to use the (usually faster) aes128-ctr encryption algorithm you would use:

scp [options] -c aes128-ctr source user@tursa.dirac.ed.ac.uk:[destination]\n

(Remember to replace user with your Tursa username in the example above.)

"},{"location":"tursa-user-guide/data/#rsync","title":"rsync","text":"

The rsync command can also transfer data between hosts using a ssh connection. It creates a copy of a file or, if given the -r flag, a directory at the given destination, similar to scp above.

Given the -a option rsync can also make exact copies (including permissions), this is referred to as mirroring. In this case the rsync command is executed with ssh to create the copy on a remote machine.

To transfer files to Tursa using rsync with ssh the command has the form:

rsync [options] -e ssh source user@tursa.dirac.ed.ac.uk:[destination]\n

(Remember to replace user with your Tursa username in the example above.)

In the above example, the [destination] is optional, as when left out rsync will copy the source into your home directory. Also the source should be the absolute path of the file/directory being copied or the command should be executed in the directory containing the source file/directory.

Additional flags can be specified for the underlying ssh command by using a quoted string as the argument of the -e flag, e.g.

rsync [options] -e \"ssh -c aes128-ctr\" source user@tursa.dirac.ed.ac.uk:[destination]\n

(Remember to replace user with your Tursa username in the example above.)

Tip

Further information on using rsync can be found in the rsync manual (accessed via man rsync or in the online rsync manual).

"},{"location":"tursa-user-guide/data/#ssh-data-transfer-example-laptopworkstation-to-tursa","title":"SSH data transfer example: laptop/workstation to Tursa","text":"

Here we have a short example demonstrating transfer of data directly from a laptop/workstation to Tursa.

Note

This guide assumes you are using a command line interface to transfer data. This means the terminal on Linux or macOS, MobaXterm local terminal on Windows or Powershell.

Before we can transfer data to Tursa we need to make sure we have an SSH key set up to access Tursa from the system we are transferring data from. If you are using the same system that you use to log into Tursa then you should be all set. If you want to use a different system you will need to generate a new SSH key there (or use SSH key forwarding) to allow you to connect to Tursa.

Tip

Remember that you will need to use both a key and your password to transfer data to Tursa.

Once we know our keys are set up correctly, we are ready to transfer data directly between the two machines. We begin by combining our important research data into a single archive file using the following command:

tar -czf all_my_files.tar.gz file1.txt file2.txt file3.txt\n

We then initiate the data transfer from our system to Tursa, here using rsync to allow the transfer to be recommenced without needing to start again, in the event of a loss of connection or other failure. For example, using the SSH key in the file ~/.ssh/id_RSA_A2 on our local system:

rsync -Pv -e\"ssh -c aes128-gcm@openssh.com -i $HOME/.ssh/id_RSA_A2\" ./all_my_files.tar.gz otbz19@tursa.dirac.ed.ac.uk:/home/z19/z19/otbz19/\n

Note the use of the -P flag to allow partial transfer -- the same command could be used to restart the transfer after a loss of connection. The -e flag allows specification of the ssh command - we have used this to add the location of the identity file. The -c option specifies the cipher to be used as aes128-gcm which has been found to increase performance. Unfortunately the ~ shortcut is not correctly expanded, so we have specified the full path. We move our research archive to our project work directory on Tursa.

Note

Remember to replace otbz19 with your username on Tursa.

If we were unconcerned about being able to restart an interrupted transfer, we could instead use the scp command,

scp -c aes128-gcm@openssh.com -i ~/.ssh/id_RSA_A2 all_my_files.tar.gz otbz19@tursa.dirac.ed.ac.uk:/home/z19/z19/otbz19/\n

but rsync is recommended for larger transfers.

"},{"location":"tursa-user-guide/hardware/","title":"ARCHER2 hardware","text":"

Note

Some of the material in this section is closely based on information provided by NASA as part of the documentation for the Aitken HPC system.

"},{"location":"tursa-user-guide/hardware/#system-overview","title":"System overview","text":"

Tursa is an Eviden supercomputing system which has a total of 178 GPU compute nodes. Each GPU compute node has a CPU with 48 cores and 4 NVIDIA A100 GPUs. Compute nodes are connected together by an InfiniBand interconnect.

There are additional login nodes, which provide access to the system.

Compute nodes are only accessible via the Slurm job scheduling system.

There is a single file system which is available on login and compute nodes (see Data management and transfer).

The Lustre file system has a capacity of 5.1 PiB.

The interconnect uses a Fat Tree topology.

"},{"location":"tursa-user-guide/hardware/#interconnect-details","title":"Interconnect details","text":"

Tursa has a high performance interconnect with 4x 200 Gb/s InfiniBand interfaces per node. It uses a 2-layer fat tree topology:

"},{"location":"tursa-user-guide/scheduler/","title":"Running jobs on Tursa","text":"

As with most HPC services, Tursa uses a scheduler to manage access to resources and ensure that the thousands of different users of the system are able to share the system and all get access to the resources they require. Tursa uses the Slurm software to schedule jobs.

Writing a submission script is typically the most convenient way to submit your job to the scheduler. Example submission scripts (with explanations) for the most common job types are provided below.

Interactive jobs are also available and can be particularly useful for developing and debugging applications. More details are available below.

Hint

If you have any questions on how to run jobs on Tursa do not hesitate to contact the DiRAC Service Desk.

You typically interact with Slurm by issuing Slurm commands from the login nodes (to submit, check and cancel jobs), and by specifying Slurm directives that describe the resources required for your jobs in job submission scripts.

"},{"location":"tursa-user-guide/scheduler/#resources","title":"Resources","text":""},{"location":"tursa-user-guide/scheduler/#gpuh","title":"GPUh","text":"

Time used on Tursa nodes is measured in GPUh. 1 GPUh = 1 GPU for 1 hour. So a Tursa compute node with 4 GPUs would cost 4 GPUh per hour.
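For example, a job that runs on 4 nodes (16 GPUs in total) for 6 hours would consume 4 nodes x 4 GPUh per node-hour x 6 hours = 96 GPUh.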

Note

The minimum resource request on Tursa is one full node which is charged at a rate of 4 GPUh per hour.

"},{"location":"tursa-user-guide/scheduler/#checking-available-budget","title":"Checking available budget","text":"

You can check your available budgets in SAFE by selecting Login accounts from the menu and then selecting the login account you want to query.

Under Login account details you will see each of the budget codes you have access to listed (e.g. dp123 resources) and, under Resource Pool to the right of this, a note of the remaining budget.

When logged in to the machine you can also use the command

sacctmgr show assoc where user=$LOGNAME format=account,user,maxtresmins%75\n

This will list all the budget codes that you have access to e.g.

Account       User                                                                 MaxTRESMins \n---------- ---------- --------------------------------------------------------------------------- \n       t01   dc-user1                           gres/cpu-low=0,gres/cpu-standard=0,gres/gpu-low=0 \n       z01   dc-user1   \n

This shows that dc-user1 is a member of budgets t01 and z01. However, the gres/cpu-low=0,gres/cpu-standard=0,gres/gpu-low=0 indicates that the t01 budget can only run GPU jobs in standard (charged) partitions (all other options are disabled, indicated by =0 for CPU standard, CPU low and GPU low). This user can also submit jobs to any partition using the z01 budget.

To see the number of coreh or GPUh remaining you must check in SAFE.

"},{"location":"tursa-user-guide/scheduler/#charging","title":"Charging","text":"

Jobs run on Tursa are charged for the time they use i.e. from the time the job begins to run until the time the job ends (not the full wall time requested).

Jobs are charged for the full number of nodes which are requested, even if they are not all used.

Charging takes place at the time the job ends, and the job is charged in full to the budget which is live at the end time.
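For example, a 2-node GPU job that requests 24 hours of wall time but finishes after 10 hours would be charged 2 nodes x 4 GPUh per node-hour x 10 hours = 80 GPUh, and that amount is deducted from the budget that is live when the job ends.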

"},{"location":"tursa-user-guide/scheduler/#basic-slurm-commands","title":"Basic Slurm commands","text":"

There are three key commands used to interact with Slurm on the command line:

We cover each of these commands in more detail below.

"},{"location":"tursa-user-guide/scheduler/#sinfo-information-on-resources","title":"sinfo: information on resources","text":"

sinfo is used to query information about available resources and partitions. Without any options, sinfo lists the status of all resources and partitions, e.g.

[dc-user1@tursa-login1 ~]$ sinfo \n\nPARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST\ncpu          up 2-00:00:00      4  alloc tu-c0r0n[66-69]\ncpu          up 2-00:00:00      2   idle tu-c0r0n[70-71]\ngpu          up 2-00:00:00      1   plnd tu-c0r2n93\ngpu          up 2-00:00:00     11  drain tu-c0r0n75,tu-c0r5n[48,51,54,57],tu-c0r6n[48,51,54,57],tu-c0r7n[00,48]\ngpu          up 2-00:00:00    112    mix tu-c0r0n[00,03,06,09,12,15,18,21,24,27,30,33,36,39,42,45,72,87,90],tu-c0r1n[00,03,06,09,12,15,18,21,24,27,30,33,60,63,66,69,72,75,78,81,84,87,90,93],tu-c0r2n[00,03,06,09,12,15,18,21,24,27,30,33,60,63,66,69,72,75,78,81,84,87,90],tu-c0r3n[00,03,06,09,12,15,18,21,24,27,30,33,60,63,66,69,72,75,78,81,84,90,93],tu-c0r4n[00,03,06,09,12,15,18,21,24,27,30,33,60,63,66,69,72,75,81,84,87,90,93]\ngpu          up 2-00:00:00     56   resv tu-c0r0n93,tu-c0r4n78,tu-c0r5n[00,03,06,09,12,15,18,21,24,27,30,33,36,39,42,45],tu-c0r6n[00,03,06,09,12,15,18,21,24,27,30,33,36,39,42,45,60,63,66,69],tu-c0r7n[03,06,09,12,15,18,21,24,27,30,33,36,39,42,45,51,54,57]\ngpu          up 2-00:00:00      1   idle tu-c0r3n87\n
"},{"location":"tursa-user-guide/scheduler/#sbatch-submitting-jobs","title":"sbatch: submitting jobs","text":"

sbatch is used to submit a job script to the job submission system. The script will typically contain one or more mpirun commands to launch parallel tasks.

When you submit the job, the scheduler provides the job ID, which is used to identify this job in other Slurm commands and when looking at resource usage in SAFE.

sbatch test-job.slurm\nSubmitted batch job 12345\n
"},{"location":"tursa-user-guide/scheduler/#squeue-monitoring-jobs","title":"squeue: monitoring jobs","text":"

squeue without any options or arguments shows the current status of all jobs known to the scheduler. For example:

squeue\n

will list all jobs on Tursa.

The output of this is often large. You can restrict the output to just your jobs by adding the --me option:

squeue --me\n
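You can also restrict the output to a single job by passing the job ID reported by sbatch, for example (using the hypothetical job ID 12345 from the sbatch example above):

squeue --job 12345\n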
"},{"location":"tursa-user-guide/scheduler/#scancel-deleting-jobs","title":"scancel: deleting jobs","text":"

scancel is used to delete a job from the scheduler. If the job is waiting to run it is simply cancelled; if it is a running job then it is stopped immediately. You need to provide the job ID of the job you wish to cancel/stop. For example:

scancel 12345\n

will cancel (if waiting) or stop (if running) the job with ID 12345.

"},{"location":"tursa-user-guide/scheduler/#resource-limits","title":"Resource Limits","text":"

The Tursa resource limits for any given job are covered by three separate attributes.

"},{"location":"tursa-user-guide/scheduler/#primary-resource","title":"Primary resource","text":"

The primary resource you can request for your job is the compute node.

Information

The --exclusive option is enforced on Tursa which means you will always have access to all of the memory on the compute node regardless of how many processes are actually running on the node.

Note

You will not generally have access to the full amount of memory resource on the node as some is retained for running the operating system and other system processes.

"},{"location":"tursa-user-guide/scheduler/#partitions","title":"Partitions","text":"

On Tursa, compute nodes are grouped into partitions. You will have to specify a partition using the --partition option in your Slurm submission script. The following table has a list of active partitions on Tursa.

Partition | Description | Max nodes available
cpu | CPU nodes with AMD EPYC 48-core processor \u00d7 2 | 6
gpu | GPU nodes with AMD EPYC 48-core processor and NVIDIA A100 GPU \u00d7 4 (this includes both A100-40 and A100-80 GPU) | 181
gpu-a100-40 | GPU nodes with 2 AMD EPYC 16-core processors and NVIDIA A100-40 GPU \u00d7 4 | 114
gpu-a100-80 | GPU nodes with 2 AMD EPYC 24-core processors (3 nodes have 2 AMD EPYC 16-core processors) and NVIDIA A100-80 GPU \u00d7 4 | 67

You can list the active partitions by running sinfo.

Tip

You may not have access to all the available partitions.

"},{"location":"tursa-user-guide/scheduler/#quality-of-service-qos","title":"Quality of Service (QoS)","text":"

On Tursa, job limits are defined by the requested Quality of Service (QoS), as specified by the --qos Slurm directive. The following table lists the active QoS on Tursa.

QoS | Max Nodes Per Job | Max Walltime | Jobs Queued | Jobs Running | Partition(s) | Notes
standard | 64 | 48 hrs | 32 | 16 | gpu, gpu-a100-40, gpu-a100-80, cpu | Only job sizes that are powers of 2 nodes are allowed (i.e. 1, 2, 4, 8, 16, 32, 64 nodes), only available when your budget is positive.
low | 64 | 24 hrs | 4 | 4 | gpu, gpu-a100-40, gpu-a100-80, cpu | Only job sizes that are powers of 2 nodes are allowed (i.e. 1, 2, 4, 8, 16, 32, 64 nodes), only available when your budget is zero or negative.
dev | 2 | 4 hrs | 2 | 1 | gpu | For faster turnaround for development jobs and interactive sessions, only available when your budget is positive. The dev QoS must be used with the gpu-a100-40 (1-node maximum) or gpu-a100-80 (2-node maximum) partitions.

You can find out the QoS that you can use by running the following command:

sacctmgr show assoc user=$USER cluster=tursa format=cluster,account,user,qos%50\n

As long as you have a positive budget, you should use the standard QoS. Once you have exhausted your budget you can use the low QoS to continue to run jobs at a lower priority than jobs in the standard QoS.

Hint

If you have needs which do not fit within the current QoS, please contact the Service Desk and we can discuss how to accommodate your requirements.

Important

Only job sizes that are powers of 2 nodes are allowed, i.e. 1, 2, 4, 8, 16, 32, 64 nodes on the gpu partition and 1, 2, 4 nodes on the cpu partition.

"},{"location":"tursa-user-guide/scheduler/#priority","title":"Priority","text":"

Job priority on Tursa depends on a number of different factors: the QoS the job is submitted to, the length of time the job has been waiting in the queue (age) and the recent usage of the job's account (fairshare):

Each of these factors is normalised to a value between 0 and 1, multiplied by a weight, and the resulting values are combined to produce a priority for the job. The current job priority formula on Tursa is:

Priority = [10000 * P(QoS)] + [500 * P(Age)] + [300 * P(Fairshare)]\n

The priority factors are:

You can view the priorities for current queued jobs on the system with the sprio command:

[dc-user1@tursa-login1 ~]$ sprio \n          JOBID PARTITION   PRIORITY       SITE        AGE  FAIRSHARE        QOS\n          43963 gpu             5055          0         51          5       5000\n          43975 gpu             5061          0         41         20       5000\n          43976 gpu             5061          0         41         20       5000\n          43982 gpu             5046          0         26         20       5000\n          43986 gpu             5011          0          6          5       5000\n          43996 gpu             5020          0          0         20       5000\n          43997 gpu             5020          0          0         20       5000\n
"},{"location":"tursa-user-guide/scheduler/#troubleshooting","title":"Troubleshooting","text":""},{"location":"tursa-user-guide/scheduler/#slurm-error-messages","title":"Slurm error messages","text":"

An incorrect submission will cause Slurm to return an error. Some common problems are listed below, with a suggestion about the likely cause:

"},{"location":"tursa-user-guide/scheduler/#slurm-queued-reasons","title":"Slurm queued reasons","text":"

The squeue command allows users to view information for jobs managed by Slurm. Jobs typically go through the following states: PENDING, RUNNING, COMPLETING, and COMPLETED. The first table provides a description of some job state codes. The second table provides a description of the reasons that cause a job to be in a state.

Status | Code | Description
PENDING | PD | Job is awaiting resource allocation.
RUNNING | R | Job currently has an allocation.
SUSPENDED | S | Job currently has an allocation.
COMPLETING | CG | Job is in the process of completing. Some processes on some nodes may still be active.
COMPLETED | CD | Job has terminated all processes on all nodes with an exit code of zero.
TIMEOUT | TO | Job terminated upon reaching its time limit.
STOPPED | ST | Job has an allocation, but execution has been stopped with SIGSTOP signal. CPUS have been retained by this job.
OUT_OF_MEMORY | OOM | Job experienced out of memory error.
FAILED | F | Job terminated with non-zero exit code or other failure condition.
NODE_FAIL | NF | Job terminated due to failure of one or more allocated nodes.
CANCELLED | CA | Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.

For a full list, see Job State Codes.

Reason | Description
Priority | One or more higher priority jobs exist for this partition or advanced reservation.
Resources | The job is waiting for resources to become available.
BadConstraints | The job's constraints can not be satisfied.
BeginTime | The job's earliest start time has not yet been reached.
Dependency | This job is waiting for a dependent job to complete.
Licenses | The job is waiting for a license.
WaitingForScheduling | No reason has been set for this job yet. Waiting for the scheduler to determine the appropriate reason.
Prolog | Its PrologSlurmctld program is still running.
JobHeldAdmin | The job is held by a system administrator.
JobHeldUser | The job is held by the user.
JobLaunchFailure | The job could not be launched. This may be due to a file system problem, invalid program name, etc.
NonZeroExitCode | The job terminated with a non-zero exit code.
InvalidAccount | The job's account is invalid.
InvalidQOS | The job's QOS is invalid.
QOSUsageThreshold | Required QOS threshold has been breached.
QOSJobLimit | The job's QOS has reached its maximum job count.
QOSResourceLimit | The job's QOS has reached some resource limit.
QOSTimeLimit | The job's QOS has reached its time limit.
NodeDown | A node required by the job is down.
TimeLimit | The job exhausted its time limit.
ReqNodeNotAvail | Some node specifically required by the job is not currently available. The node may currently be in use, reserved for another job, in an advanced reservation, DOWN, DRAINED, or not responding. Nodes which are DOWN, DRAINED, or not responding will be identified as part of the job's \"reason\" field as \"UnavailableNodes\". Such nodes will typically require the intervention of a system administrator to make available.

For a full list, see Job Reasons.

"},{"location":"tursa-user-guide/scheduler/#output-from-slurm-jobs","title":"Output from Slurm jobs","text":"

Slurm places standard output (STDOUT) and standard error (STDERR) for each job in the file slurm-<JobID>.out. This file appears in the job's working directory once your job starts running.

Hint

Output may be buffered - to enable live output, e.g. for monitoring job status, add --unbuffered to the srun command in your SLURM script.
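For example, based on the launch line used in the example scripts below, the srun command would become:

srun --unbuffered --hint=nomultithread --distribution=block:block gpu_launch.sh ${application} ${options}\n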

"},{"location":"tursa-user-guide/scheduler/#specifying-resources-in-job-scripts","title":"Specifying resources in job scripts","text":"

You specify the resources you require for your job using #SBATCH directives at the top of your job submission script.

Hint

Most options provided using #SBATCH directives can also be specified as command line options to srun.

If you do not specify any options, then the default for each option will be applied. As a minimum, all job submissions must specify the budget that they wish to charge the job to using the option --account=[budget code].

Other common options that are used are:

In addition, parallel jobs will also need to specify how many nodes, parallel processes and threads they require; a minimal set of directives is sketched below.
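As an illustration, a minimal sketch of the directives for a 4-node GPU job is shown below. The values are taken from the example scripts later in this section; the job name is purely illustrative and [budget code] should be replaced with your own budget:

#SBATCH --account=[budget code]\n#SBATCH --job-name=My_Job\n#SBATCH --time=12:0:0\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=4\n#SBATCH --cpus-per-task=8\n#SBATCH --gres=gpu:4\n#SBATCH --partition=gpu\n#SBATCH --qos=standard\n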

If you are happy to have any GPU type for your job (A100-40 or A100-80) then you select the gpu partition with --partition=gpu.

If you wish to use just the A100-80 GPU nodes, which have higher memory, you add the option --partition=gpu-a100-80.

To use just the A100-40 GPU nodes, add the option --partition=gpu-a100-40.

If you do not specify a partition, the scheduler may use any available node types for the job (equivalent to --partition=gpu).

Note

For parallel jobs, Tursa operates in a node exclusive way. This means that you are assigned resources in units of full compute nodes for your jobs (i.e. 32 cores and 4 GPUs on A100-40 nodes, 48 cores and 4 GPUs on A100-80 nodes, 128 cores on CPU nodes) and that no other user can share those compute nodes with you. Hence, the minimum amount of resource you can request for a parallel job is 1 node.

To prevent the behaviour of batch scripts being dependent on the user environment at the point of submission, you can add the option --export=none to your submission script.

Using the --export=none means that the behaviour of batch submissions should be repeatable. We strongly recommend its use, although see the following section to enable access to the usual modules.
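A sketch of how the start of such a script might look, using the module names from the examples later in this guide, is shown below. This is an illustration rather than a complete script:

#!/bin/bash\n#SBATCH --export=none\n#SBATCH --account=[budget code]\n\n# Rebuild the module environment inside the job rather than inheriting it\nmodule load /home/y07/shared/tursa-modules/setup-env\nmodule load gcc/9.3.0\nmodule load cuda/12.3\nmodule load openmpi/4.1.5-cuda12.3\n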

"},{"location":"tursa-user-guide/scheduler/#gpu-frequency","title":"GPU frequency","text":"

Important

The default GPU frequency on Tursa compute nodes was changed from 1410 MHz to 1040 MHz on Thursday 15 Dec 2022 to improve the energy efficiency of the service.

Users can control the GPU frequency in their job submission scripts:
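For example, to request the previous default frequency for a job you could add the standard Slurm GPU frequency directive to your submission script. The exact directive supported on Tursa is an assumption here; check the output of your job to confirm the frequency that was applied:

#SBATCH --gpu-freq=1410\n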

Bug

When setting the GPU frequency you will see an error in the output from the job that says control disabled. This is an incorrect message due to an issue with how Slurm sets the GPU frequency and can be safely ignored.

"},{"location":"tursa-user-guide/scheduler/#srun-launching-parallel-jobs","title":"srun: Launching parallel jobs","text":"

If you are running parallel jobs, your job submission script should contain one or more srun commands to launch the parallel executable across the compute nodes. In most cases you will want to add the options --distribution=block:block and --hint=nomultithread to your srun command to ensure you get the correct pinning of processes to cores on a compute node.

A brief explanation of these options:

- --hint=nomultithread - do not use hyperthreads/SMP
- --distribution=block:block - the first block means use a block distribution of processes across nodes (i.e. fill nodes before moving onto the next one) and the second block means use a block distribution of processes across \"sockets\" within a node (i.e. fill a \"socket\" before moving on to the next one).

Important

The Slurm definition of a \"socket\" does not usually correspond to a physical CPU socket. On Tursa GPU nodes it corresponds to half the cores on a socket as the GPU nodes are configured with NPS2.

On the Tursa CPU nodes, the Slurm definition of a socket does correspond to a physical CPU socket (64 cores) as the CPU nodes are configured with NPS1.

"},{"location":"tursa-user-guide/scheduler/#example-job-submission-scripts","title":"Example job submission scripts","text":""},{"location":"tursa-user-guide/scheduler/#example-job-submission-script-for-a-parallel-job-using-cuda","title":"Example: job submission script for a parallel job using CUDA","text":"

A job submission script for a parallel job that uses 4 compute nodes, 4 MPI processes per node and 4 GPUs per node. It does not restrict what type of GPU the job can run on so both A100-40 and A100-80 can be used:

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=Example_Grid_job\n#SBATCH --time=12:0:0\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=4\n#SBATCH --cpus-per-task=8\n#SBATCH --gres=gpu:4\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]             \n\n# Load the correct modules\nmodule load /home/y07/shared/tursa-modules/setup-env\nmodule load gcc/9.3.0\nmodule load cuda/12.3\nmodule load openmpi/4.1.5-cuda12.3 \n\nexport OMP_NUM_THREADS=8\nexport OMP_PLACES=cores\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# These will need to be changed to match the actual application you are running\napplication=\"my_mpi_openmp_app.x\"\noptions=\"arg 1 arg2\"\n\n# We have reserved the full nodes, now distribute the processes as\n# required: 4 MPI processes per node, stride of 12 cores between \n# MPI processes\n# \n# Note use of gpu_launch.sh wrapper script for GPU and NIC pinning \nsrun --nodes=4 --tasks-per-node=4 --cpus-per-task=12 \\\n     --hint=nomultithread --distribution=block:block \\\n     gpu_launch.sh \\\n     ${application} ${options}\n

This will run your executable \"my_mpi_openmp_app.x\" in parallel using 16 MPI processes on 4 nodes. 4 GPUs will be used per node.

Important

You must use the gpu_launch.sh wrapper script to get the correct binding of GPU to MPI processes and of network interface to GPU and MPI process. This script is described in more detail below.

"},{"location":"tursa-user-guide/scheduler/#gpu_launchsh-wrapper-script","title":"gpu_launch.sh wrapper script","text":"

The gpu_launch.sh wrapper script is required to set the correct binding of GPU to MPI processes and the correct binding of interconnect interfaces to MPI process and GPU. We provide this centrally for convenience but its contents are simple:

#!/bin/bash\n\n# Compute the raw process ID for binding to GPU and NIC\nlrank=$((SLURM_PROCID % SLURM_NTASKS_PER_NODE))\n\n# Bind the process to the correct GPU and NIC\nexport CUDA_VISIBLE_DEVICES=${lrank}\nexport UCX_NET_DEVICES=mlx5_${lrank}:1\n\n$@\n
"},{"location":"tursa-user-guide/scheduler/#using-the-dev-qos","title":"Using the dev QoS","text":"

The dev QoS is designed for faster turnaround of short jobs than is usually available through the production QoS. It is subject to a number of restrictions:

In addition, you must specify either the gpu-a100-80 or gpu-a100-40 partition when using the dev QoS.

Tip

The generic gpu partition will not work consistently when using the dev QoS.

Here is an example job submission script for a 2-node job in the dev QoS using the gpu-a100-80 partition. Note the use of the gpu_launch.sh wrapper script to get correct GPU and NIC binding.

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=Example_Dev_Job\n#SBATCH --time=4:0:0\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=48\n#SBATCH --cpus-per-task=1\n#SBATCH --gres=gpu:4\n#SBATCH --partition=gpu-a100-80\n#SBATCH --qos=dev\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n\nexport OMP_NUM_THREADS=1\nexport OMP_PLACES=cores\n\n# Load the correct modules\nmodule load /home/y07/shared/tursa-modules/setup-env\nmodule load gcc/9.3.0\nmodule load cuda/12.3\nmodule load openmpi/4.1.5-cuda12.3 \n\n# These will need to be changed to match the actual application you are running\napplication=\"my_mpi_openmp_app.x\"\noptions=\"arg 1 arg2\"\n\n# We have reserved the full nodes, now distribute the processes as\n# required: 4 MPI processes per node, stride of 12 cores between \n# MPI processes\n# \n# Note use of gpu_launch.sh wrapper script for GPU and NIC pinning \nsrun --nodes=2 --tasks-per-node=4 --cpus-per-task=12 \\\n     --hint=nomultithread --distribution=block:block \\\n     gpu_launch.sh \\\n     ${application} ${options}\n
"},{"location":"tursa-user-guide/sw-environment/","title":"Software environment","text":"

The software environment on Tursa is primarily controlled through the module command. By loading and switching software modules you control which software and versions are available to you.

Information

A module is a self-contained description of a software package -- it contains the settings required to run a software package and, usually, encodes required dependencies on other software packages.

By default, all users on Tursa start with the default software environment loaded.

Software modules on Tursa are provided by both Eviden and by EPCC.

In this section, we provide:

"},{"location":"tursa-user-guide/sw-environment/#using-the-module-command","title":"Using the module command","text":"

We only cover basic usage of the module command here. For full documentation please see the Linux manual page on modules

The module command takes a subcommand to indicate what operation you wish to perform. Common subcommands are list, avail, show, load, rm, swap, save, savelist, saveshow and saverm.

These are described in more detail below.

"},{"location":"tursa-user-guide/sw-environment/#information-on-the-available-modules","title":"Information on the available modules","text":"

The module list command will give the names of the modules and their versions you have presently loaded in your environment. By default, you will have no modules loaded when you first log into Tursa

Finding out which software modules are available on the system is performed using the module avail command. To list all software modules available, use:

[dc-user1@tursa-login1 ~]$ module avail\n------------------------------------------ /mnt/lustre/tursafs1/apps/cuda-11.0.2-modulefiles -------------------------------------------\ncuda/11.0.2  openmpi/4.1.1-cuda11.0.2  ucx/1.10.1-cuda11.0.2  \n\n------------------------------------------- /mnt/lustre/tursafs1/apps/cuda-11.4-modulefiles --------------------------------------------\ncuda/11.4.1  openmpi/4.1.1-cuda11.4  ucx/1.12.0-cuda11.4  \n\n------------------------------------------ /mnt/lustre/tursafs1/apps/cuda-11.4.1-modulefiles -------------------------------------------\ncuda/11.4.1  openmpi/4.1.1-cuda11.4.1  ucx/1.12.0-cuda11.4.1  \n\n------------------------------------------------ /mnt/lustre/tursafs1/apps/modulefiles -------------------------------------------------\ncuda/11.0.3  dot  gcc/9.3.0  module-git  module-info  modules  null  openmpi/4.1.1  ucx/1.10.1  use.own  xpmem/2.6.5   \n

This will list all the names and versions of the modules available on the service. Not all of them may work in your account though due to, for example, licensing restrictions. You will notice that for many modules we have more than one version, each of which is identified by a version number. One of these versions is the default. As the service develops, the default version will change and old versions of software may be deleted.

You can list all the modules of a particular type by providing an argument to the module avail command. For example, to list all available versions of the OpenMPI library, use:

[dc-user1@tursa-login1 ~]$ module avail openmpi\n------------------------------------------ /mnt/lustre/tursafs1/apps/cuda-11.0.2-modulefiles -------------------------------------------\nopenmpi/4.1.1-cuda11.0.2  \n\n------------------------------------------- /mnt/lustre/tursafs1/apps/cuda-11.4-modulefiles --------------------------------------------\nopenmpi/4.1.1-cuda11.4  \n\n------------------------------------------ /mnt/lustre/tursafs1/apps/cuda-11.4.1-modulefiles -------------------------------------------\nopenmpi/4.1.1-cuda11.4.1  \n\n----------------\n

The module show command reveals what operations the module actually performs to change your environment when it is loaded. We provide a brief overview of what the significance of these different settings mean below. For example, for the default openmpi module:

[dc-user1@tursa-login1 ~]$ module show openmpi\n-------------------------------------------------------------------\n/mnt/lustre/tursafs1/apps/cuda-11.0.2-modulefiles/openmpi/4.1.1-cuda11.0.2:\n\nmodule-whatis   Sets up OpenMPI on your environment\nsetenv          MPI_ROOT        /mnt/lustre/tursafs1/apps/basestack/cuda-11.0.2/openmpi/4.1.1/\nprepend-path    PATH /mnt/lustre/tursafs1/apps/basestack/cuda-11.0.2/openmpi/4.1.1/bin/\nprepend-path    LD_LIBRARY_PATH /mnt/lustre/tursafs1/apps/basestack/cuda-11.0.2/openmpi/4.1.1/lib\nprepend-path    MANPATH /opt/mpi/openmpi/4.0.4.1/share/man\nmodule load     ucx/1.10.1\nsetenv          OMPI_CC cc\nsetenv          OMPI_CXX        g++\nsetenv          OMPI_CFLAGS     -g -m64\nsetenv          OMPI_CXXFLAGS   -g -m64\n-------------------------------------------------------------------\n
"},{"location":"tursa-user-guide/sw-environment/#loading-removing-and-swapping-modules","title":"Loading, removing and swapping modules","text":"

To load a module, use the module load command. For example, to load the default version of OpenMPI into your environment, use:

[dc-user1@tursa-login1 ~]$ module load openmpi\n\n        UCX 1.10 loaded\n\n\n        OpenMPI 4.1.1 loaded\n

Once you have done this, your environment will be set up to use the OpenMPI library. The above command will load the default version of OpenMPI. If you need a specific version of the software, you can add more information:

[dc-user1@tursa-login1 ~]$ module load openmpi/4.1.1-cuda11.4.1\n\n        UCX 1.12.0 compiled with cuda 11.4.1 loaded\n\n\n        OpenMPI 4.1.1 with cuda-11.4.1 and UCX 1.12.0  loaded\n

will load OpenMPI version 4.1.1 with CUDA 11.4.1 into your environment, regardless of the default.

If you want to remove software from your environment, module rm will remove a loaded module:

[dc-user1@tursa-login1 ~]$ module rm openmpi\n

will unload whatever version of openmpi (even if it is not the default) you might have loaded.

There are many situations in which you might want to change the presently loaded version to a different one, such as trying the latest version which is not yet the default or using a legacy version to keep compatibility with old data. This can be achieved most easily by using module swap oldmodule newmodule.

Suppose you have loaded version 4.1.1 of openmpi; the following command will swap to version 4.1.1-cuda11.4.1:

[dc-user1@tursa-login1 ~]$ module swap openmpi openmpi/4.1.1-cuda11.4.1\n\n        UCX 1.12.0 compiled with cuda 11.4.1 loaded\n\n\n        OpenMPI 4.1.1 with cuda-11.4.1 and UCX 1.12.0  loaded\n

You do not need to specify the version of the currently loaded module as it can be inferred: it is the only version of openmpi you have loaded.

"},{"location":"tursa-user-guide/sw-environment/#capturing-your-environment-for-reuse","title":"Capturing your environment for reuse","text":"

Sometimes it is useful to save the module environment that you are using to compile a piece of code or execute a piece of software. This is saved as a module collection. You can save a collection from your current environment by executing:

[dc-user1@tursa-login1 ~]$ module save [collection_name]\n

Note

If you do not specify the environment name, it is called default.

You can find the list of saved module environments by executing:

[dc-user1@tursa-login1 ~]$ module savelist\nNamed collection list:\n 1) default\n

To list the modules in a collection, you can execute, e.g.,:

[dc-user1@tursa-login1 ~]$ module saveshow default\n-------------------------------------------------------------------\n/home/z01/z01/dc-turn1/.module/default:\n\nmodule use --append /mnt/lustre/tursafs1/apps/cuda-11.0.2-modulefiles\nmodule use --append /mnt/lustre/tursafs1/apps/cuda-11.4-modulefiles\nmodule use --append /mnt/lustre/tursafs1/apps/cuda-11.4.1-modulefiles\nmodule use --append /mnt/lustre/tursafs1/apps/modulefilesintel\nmodule use --append /mnt/lustre/tursafs1/apps/modulefiles\nmodule load ucx/1.12.0-cuda11.4.1\nmodule load openmpi/4.1.1-cuda11.4.1\n\n-------------------------------------------------------------------\n

Note again that the details of the collection have been saved to the home directory (the first line of output above). It is possible to save a module collection with a fully qualified path, e.g.,

[dc-user1@tursa-login1 ~]$ module save /home/t01/z01/auser/my-module-collection\n

if you want to save to a specific file name.

To delete a module environment, you can execute:

[dc-user1@tursa-login1 ~]$ module saverm <environment_name>\n
"},{"location":"tursa-user-guide/sw-environment/#shell-environment-overview","title":"Shell environment overview","text":"

When you log in to Tursa, you are using the bash shell by default. As any other software, the bash shell has loaded a set of environment variables that can be listed by executing printenv or export.

The environment variables listed by these commands define the behaviour of the software you run. For instance, OMP_NUM_THREADS defines the number of OpenMP threads.

To define an environment variable, you need to execute:

export OMP_NUM_THREADS=4\n

Please note there are no blanks between the variable name, the assignment symbol (=), and the value. If the value is a string, enclose the string in double quotation marks.
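For example, to set a hypothetical variable to a string value that contains spaces:

export MY_OPTIONS=\"arg1 arg2\"\n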

You can show the value of a specific environment variable if you print it:

echo $OMP_NUM_THREADS\n

Do not forget the dollar symbol. To remove an environment variable, just execute:

unset OMP_NUM_THREADS\n
"},{"location":"tursa-user-guide/sw-environment/#compiler-environment","title":"Compiler environment","text":"

The system supports two different primary compiler environments: the GCC toolchain and the NVHPC toolchain.

"},{"location":"tursa-user-guide/sw-environment/#gcc-toolchain","title":"GCC toolchain","text":"

To compile on the system for GPU nodes using the GCC toolchain, you would typically load the required modules:

[dc-user1@tursa-login1 ~]$ module load gcc/9.3.0\n[dc-user1@tursa-login1 ~]$ module load cuda/11.4.1 \n[dc-user1@tursa-login1 ~]$ module load openmpi/4.1.1-cuda11.4\n[dc-user1@tursa-login1 ~]$ module list\nCurrently Loaded Modulefiles:\n 1) gcc/9.3.0   2) cuda/11.4.1   3) ucx/1.12.0-cuda11.4   4) openmpi/4.1.1-cuda11.4 \n

Once you have loaded the modules, the standard OpenMPI compiler wrapper scripts are available: mpicc (C), mpicxx (C++) and mpif90 (Fortran).

You can find more information on these scripts in the OpenMPI documentation.
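For example, to compile a simple MPI program in C with the wrappers (the source file name here is purely illustrative):

mpicc -O2 -o my_mpi_app.x my_mpi_app.c\n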

"},{"location":"tursa-user-guide/sw-environment/#nvhpc-toolchain","title":"NVHPC toolchain","text":"

To compile on the system for GPU nodes using the NVHPC toolchain, you would typically load the required modules:

[dc-user1@tursa-login1 ~]$ module load /home/y07/shared/tursa-modules/setup-env\n[dc-user1@tursa-login1 ~]$ module load gcc/9.3.0\n[dc-user1@tursa-login1 ~]$ module load nvhpc/21.7-nompi\n[dc-user1@tursa-login1 ~]$ module load openmpi/4.1.1-cuda11.4\n[dc-user1@tursa-login1 ~]$ module list\nCurrently Loaded Modulefiles:\n 1) /mnt/lustre/tursafs1/home/y07/shared/tursa-modules/setup-env   2) nvhpc/21.7-nompi\n 3) ucx/1.12.0-cuda11.4   4) openmpi/4.1.1-cuda11.4    5) gcc/9.3.0\n

Once you have loaded the modules, the standard OpenMPI compiler wrapper scripts are available: mpicc (C), mpicxx (C++) and mpif90 (Fortran).

The NVIDIA compilers are available as nvc (C), nvc++ (C++) and nvfortran (Fortran).

Tip

Both the NVIDIA compilers and the MPI compiler wrapper scripts will use the GCC compilers directly in the default configuration - this is often what you want. If you want the compiler wrappers to call the NVIDIA compilers themselves rather than GCC directly, you would use:

export OMPI_CC=nvcc\nexport OMPI_CXX=nvc++\nexport OMPI_FC=nvfortran\n
"},{"location":"tursa-user-guide/sw-environment/#other-build-tools","title":"Other build tools","text":""},{"location":"tursa-user-guide/sw-environment/#cmake","title":"cmake","text":"

CMake is available by using the commands:

[dc-user1@tursa-login1 ~]$ module load /home/y07/shared/tursa-modules/setup-env\n[dc-user1@tursa-login1 ~]$ module load cmake\n
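A typical out-of-source build with CMake then looks something like the following sketch, where the project layout and install location are illustrative:

mkdir build\ncd build\ncmake .. -DCMAKE_INSTALL_PREFIX=$HOME/my-app\nmake -j 8\nmake install\n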
"}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index b817ed3..e986db1 100644 Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ diff --git a/tursa-user-guide/scheduler/index.html b/tursa-user-guide/scheduler/index.html index f27a881..c90a229 100644 --- a/tursa-user-guide/scheduler/index.html +++ b/tursa-user-guide/scheduler/index.html @@ -1632,12 +1632,20 @@

Example: jo module load openmpi/4.1.5-cuda12.3 export OMP_NUM_THREADS=8 +export OMP_PLACES=cores +export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK # These will need to be changed to match the actual application you are running application="my_mpi_openmp_app.x" options="arg 1 arg2" -srun --hint=nomultithread --distribution=block:block \ +# We have reserved the full nodes, now distribute the processes as +# required: 4 MPI processes per node, stride of 12 cores between +# MPI processes +# +# Note use of gpu_launch.sh wrapper script for GPU and NIC pinning +srun --nodes=4 --tasks-per-node=4 --cpus-per-task=12 \ + --hint=nomultithread --distribution=block:block \ gpu_launch.sh \ ${application} ${options} @@ -1694,8 +1702,8 @@

Using the dev QoS

#SBATCH --job-name=Example_Dev_Job #SBATCH --time=12:0:0 #SBATCH --nodes=2 -#SBATCH --tasks-per-node=4 -#SBATCH --cpus-per-task=12 +#SBATCH --tasks-per-node=48 +#SBATCH --cpus-per-task= #SBATCH --gres=gpu:4 #SBATCH --partition=gpu-a100-80 #SBATCH --qos=dev @@ -1704,12 +1712,7 @@

Using the dev QoS

#SBATCH --account=[budget code] export OMP_NUM_THREADS=1 - -# Load the correct modules -module load /home/y07/shared/tursa-modules/setup-env -module load gcc/9.3.0 -module load cuda/12.3 -module load openmpi/4.1.5-cuda12.3 +export OMP_PLACES=cores # Load the correct modules module load /home/y07/shared/tursa-modules/setup-env @@ -1721,7 +1724,13 @@

Using the dev QoS

application="my_mpi_openmp_app.x" options="arg 1 arg2" -srun --hint=nomultithread --distribution=block:block \ +# We have reserved the full nodes, now distribute the processes as +# required: 4 MPI processes per node, stride of 12 cores between +# MPI processes +# +# Note use of gpu_launch.sh wrapper script for GPU and NIC pinning +srun --nodes=2 --tasks-per-node=4 --cpus-per-task=12 \ + --hint=nomultithread --distribution=block:block \ gpu_launch.sh \ ${application} ${options}