From 09e007992ecc8a6f371cfa4e7225a829ffb49cd2 Mon Sep 17 00:00:00 2001
From: Yadu Babuji
Date: Fri, 9 Aug 2024 11:43:06 -0500
Subject: [PATCH] Adding notes on `available_accelerators`

* Adding notes on how to specify list of strings to available_accelerators

* Clarify how to bind multiple GPUs to workers
---
 docs/userguide/configuring.rst | 43 ++++++++++++++++++++++++++++++----
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/docs/userguide/configuring.rst b/docs/userguide/configuring.rst
index bb3a3949e3..f3fe5cc407 100644
--- a/docs/userguide/configuring.rst
+++ b/docs/userguide/configuring.rst
@@ -306,9 +306,13 @@ and Work Queue does not require Python to run.

 Accelerators
 ------------
-Many modern clusters provide multiple accelerators per compute note, yet many applications are best suited to using a single accelerator per task.
-Parsl supports pinning each worker to difference accelerators using ``available_accelerators`` option of the :class:`~parsl.executors.HighThroughputExecutor`.
-Provide either the number of executors (Parsl will assume they are named in integers starting from zero) or a list of the names of the accelerators available on the node.
+Many modern clusters provide multiple accelerators per compute node, yet many applications are best suited to using a
+single accelerator per task. Parsl supports pinning each worker to a different accelerator using the
+``available_accelerators`` option of the :class:`~parsl.executors.HighThroughputExecutor`. Provide either the number of
+accelerators (Parsl will assume they are named as integers starting from zero) or a list of the names of the
+accelerators available on the node. Parsl limits the number of workers it launches to the number of accelerators
+specified; in other words, you cannot have more workers per node than there are accelerators. By default, Parsl
+launches as many workers as there are accelerators specified via ``available_accelerators``.

 .. code-block:: python

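A minimal sketch of the two ways to specify ``available_accelerators`` described in the paragraph above (by count or
by a list of names). This is illustrative only: it assumes the default provider, uses hypothetical executor labels,
and omits all other executor options.

.. code-block:: python

    from parsl.config import Config
    from parsl.executors import HighThroughputExecutor

    # By count: Parsl assumes accelerators named "0" through "3" and will launch
    # at most four workers per node, one pinned to each accelerator.
    config_by_count = Config(
        executors=[HighThroughputExecutor(label="htex_by_count", available_accelerators=4)],
    )

    # By name: list the accelerator names available on the node explicitly.
    config_by_name = Config(
        executors=[HighThroughputExecutor(label="htex_by_name", available_accelerators=["0", "1", "2", "3"])],
    )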
@@ -327,7 +331,38 @@ Provide either the number of executors (Parsl will assume they are named in inte
         strategy='none',
     )

-For hardware that uses Nvidia devices, Parsl allows for the oversubscription of workers to GPUS. This is intended to make use of Nvidia's `Multi-Process Service (MPS) `_ available on many of their GPUs that allows users to run multiple concurrent processes on a single GPU. The user needs to set in the ``worker_init`` commands to start MPS on every node in the block (this is machine dependent). The ``available_accelerators`` option should then be set to the total number of GPU partitions run on a single node in the block. For example, for a node with 4 Nvidia GPUs, to create 8 workers per GPU, set ``available_accelerators=32``. GPUs will be assigned to workers in ascending order in contiguous blocks. In the example, workers 0-7 will be placed on GPU 0, workers 8-15 on GPU 1, workers 16-23 on GPU 2, and workers 24-31 on GPU 3.
+It is possible to bind multiple or specific accelerators to each worker by supplying a list of comma-separated
+strings, where each string names the accelerators for one worker. When binding to NVIDIA GPUs, this works by setting
+``CUDA_VISIBLE_DEVICES`` on each worker to the corresponding string in the list supplied to ``available_accelerators``.
+
+Here's an example:
+
+.. code-block:: python
+
+    # The following config is trimmed for clarity
+    local_config = Config(
+        executors=[
+            HighThroughputExecutor(
+                # Start 2 workers per node, each bound to 2 GPUs
+                available_accelerators=["0,1", "2,3"],
+
+                # Or start a single worker bound to all 4 GPUs
+                # available_accelerators=["0,1,2,3"]
+            )
+        ],
+    )
+
+GPU Oversubscription
+""""""""""""""""""""
+
+For hardware that uses Nvidia devices, Parsl allows for the oversubscription of workers to GPUs. This is intended to
+make use of Nvidia's `Multi-Process Service (MPS) <https://docs.nvidia.com/deploy/mps/>`_ available on many of their
+GPUs, which allows users to run multiple concurrent processes on a single GPU. The user needs to include in
+``worker_init`` the commands that start MPS on every node in the block (this is machine dependent). The
+``available_accelerators`` option should then be set to the total number of GPU partitions run on a single node in the
+block. For example, for a node with 4 Nvidia GPUs, to create 8 workers per GPU, set ``available_accelerators=32``.
+GPUs will be assigned to workers in ascending order in contiguous blocks. In the example, workers 0-7 will be placed
+on GPU 0, workers 8-15 on GPU 1, workers 16-23 on GPU 2, and workers 24-31 on GPU 3.

 Multi-Threaded Applications
 ---------------------------
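A minimal sketch of the oversubscription set-up described under ``GPU Oversubscription`` above. It assumes a Slurm
cluster and the standard ``nvidia-cuda-mps-control -d`` daemon start command; both are assumptions, since the exact
MPS start-up is machine dependent, and all other provider options are omitted.

.. code-block:: python

    from parsl.config import Config
    from parsl.executors import HighThroughputExecutor
    from parsl.providers import SlurmProvider

    # 4 physical GPUs x 8 workers per GPU = 32 MPS-backed partitions per node
    oversubscribed_config = Config(
        executors=[
            HighThroughputExecutor(
                available_accelerators=32,
                provider=SlurmProvider(
                    # Machine dependent: start the MPS control daemon on each node
                    worker_init="nvidia-cuda-mps-control -d",
                ),
            )
        ],
    )

With this configuration, workers would be assigned to GPUs in contiguous blocks of eight, as described in the
paragraph above.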