Commit

Merge branch 'master' into add_manager_selector_by_block
matthewc2003 authored Aug 21, 2024
2 parents f4e04d8 + b284dc1 commit 1f16dcb
Showing 55 changed files with 661 additions and 919 deletions.
5 changes: 4 additions & 1 deletion README.rst
@@ -1,6 +1,6 @@
Parsl - Parallel Scripting Library
==================================
|licence| |build-status| |docs| |NSF-1550588| |NSF-1550476| |NSF-1550562| |NSF-1550528|
|licence| |build-status| |docs| |NSF-1550588| |NSF-1550476| |NSF-1550562| |NSF-1550528| |CZI-EOSS|

Parsl extends parallelism in Python beyond a single computer.

@@ -64,6 +64,9 @@ then explore the `parallel computing patterns <https://parsl.readthedocs.io/en/s
.. |NSF-1550475| image:: https://img.shields.io/badge/NSF-1550475-blue.svg
:target: https://nsf.gov/awardsearch/showAward?AWD_ID=1550475
:alt: NSF award info
.. |CZI-EOSS| image:: https://chanzuckerberg.github.io/open-science/badges/CZI-EOSS.svg
:target: https://czi.co/EOSS
:alt: CZI's Essential Open Source Software for Science


Quickstart
8 changes: 4 additions & 4 deletions docs/historical/changelog.rst
@@ -334,7 +334,7 @@ New Functionality
* New launcher: `parsl.launchers.WrappedLauncher` for launching tasks inside containers.

* `parsl.channels.SSHChannel` now supports a ``key_filename`` kwarg `issue#1639 <https://github.com/Parsl/parsl/issues/1639>`_
* ``parsl.channels.SSHChannel`` now supports a ``key_filename`` kwarg `issue#1639 <https://github.com/Parsl/parsl/issues/1639>`_

* Newly added Makefile wraps several frequent developer operations such as:

@@ -442,7 +442,7 @@ New Functionality
module, parsl.data_provider.globus

* `parsl.executors.WorkQueueExecutor`: a new executor that integrates functionality from `Work Queue <http://ccl.cse.nd.edu/software/workqueue/>`_ is now available.
* New provider to support for Ad-Hoc clusters `parsl.providers.AdHocProvider`
* New provider to support for Ad-Hoc clusters ``parsl.providers.AdHocProvider``
* New provider added to support LSF on Summit `parsl.providers.LSFProvider`
* Support for CPU and Memory resource hints to providers `(github) <https://github.com/Parsl/parsl/issues/942>`_.
* The ``logging_level=logging.INFO`` in `parsl.monitoring.MonitoringHub` is replaced with ``monitoring_debug=False``:
@@ -468,7 +468,7 @@ New Functionality
* Several test-suite improvements that have dramatically reduced test duration.
* Several improvements to the Monitoring interface.
* Configurable port on `parsl.channels.SSHChannel`.
* Configurable port on ``parsl.channels.SSHChannel``.
* ``suppress_failure`` now defaults to True.
* `parsl.executors.HighThroughputExecutor` is the recommended executor, and ``IPyParallelExecutor`` is deprecated.
* `parsl.executors.HighThroughputExecutor` will expose worker information via environment variables: ``PARSL_WORKER_RANK`` and ``PARSL_WORKER_COUNT``
@@ -532,7 +532,7 @@ New Functionality
* Cleaner user app file log management.
* Updated configurations using `parsl.executors.HighThroughputExecutor` in the configuration section of the userguide.
* Support for OAuth based SSH with `parsl.channels.OAuthSSHChannel`.
* Support for OAuth based SSH with ``parsl.channels.OAuthSSHChannel``.

Bug Fixes
^^^^^^^^^
13 changes: 3 additions & 10 deletions docs/reference.rst
@@ -38,15 +38,9 @@ Configuration
Channels
========

.. autosummary::
:toctree: stubs
:nosignatures:

parsl.channels.base.Channel
parsl.channels.LocalChannel
parsl.channels.SSHChannel
parsl.channels.OAuthSSHChannel
parsl.channels.SSHInteractiveLoginChannel
Channels are deprecated in Parsl. See
`issue 3515 <https://github.com/Parsl/parsl/issues/3515>`_
for further discussion.

Data management
===============
@@ -109,7 +103,6 @@ Providers
:toctree: stubs
:nosignatures:

parsl.providers.AdHocProvider
parsl.providers.AWSProvider
parsl.providers.CobaltProvider
parsl.providers.CondorProvider
81 changes: 45 additions & 36 deletions docs/userguide/configuring.rst
@@ -15,15 +15,14 @@ queues, durations, and data management options.
The following example shows a basic configuration object (:class:`~parsl.config.Config`) for the Frontera
supercomputer at TACC.
This config uses the `parsl.executors.HighThroughputExecutor` to submit
tasks from a login node (`parsl.channels.LocalChannel`). It requests an allocation of
tasks from a login node. It requests an allocation of
128 nodes, deploying 1 worker for each of the 56 cores per node, from the normal partition.
To limit network connections to just the internal network the config specifies the address
used by the infiniband interface with ``address_by_interface('ib0')``

.. code-block:: python
from parsl.config import Config
from parsl.channels import LocalChannel
from parsl.providers import SlurmProvider
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
@@ -36,7 +35,6 @@ used by the infiniband interface with ``address_by_interface('ib0')``
address=address_by_interface('ib0'),
max_workers_per_node=56,
provider=SlurmProvider(
channel=LocalChannel(),
nodes_per_block=128,
init_blocks=1,
partition='normal',
@@ -197,22 +195,6 @@ Stepping through the following question should help formulate a suitable configu
are on a **native Slurm** system like :ref:`configuring_nersc_cori`


4) Where will the main Parsl program run and how will it communicate with the apps?

+------------------------+--------------------------+---------------------------------------------------+
| Parsl program location | App execution target | Suitable channel |
+========================+==========================+===================================================+
| Laptop/Workstation | Laptop/Workstation | `parsl.channels.LocalChannel` |
+------------------------+--------------------------+---------------------------------------------------+
| Laptop/Workstation | Cloud Resources | No channel is needed |
+------------------------+--------------------------+---------------------------------------------------+
| Laptop/Workstation | Clusters with no 2FA | `parsl.channels.SSHChannel` |
+------------------------+--------------------------+---------------------------------------------------+
| Laptop/Workstation | Clusters with 2FA | `parsl.channels.SSHInteractiveLoginChannel` |
+------------------------+--------------------------+---------------------------------------------------+
| Login node | Cluster/Supercomputer | `parsl.channels.LocalChannel` |
+------------------------+--------------------------+---------------------------------------------------+

Heterogeneous Resources
-----------------------

@@ -324,9 +306,13 @@ and Work Queue does not require Python to run.
Accelerators
------------

Many modern clusters provide multiple accelerators per compute note, yet many applications are best suited to using a single accelerator per task.
Parsl supports pinning each worker to difference accelerators using ``available_accelerators`` option of the :class:`~parsl.executors.HighThroughputExecutor`.
Provide either the number of executors (Parsl will assume they are named in integers starting from zero) or a list of the names of the accelerators available on the node.
Many modern clusters provide multiple accelerators per compute note, yet many applications are best suited to using a
single accelerator per task. Parsl supports pinning each worker to different accelerators using
``available_accelerators`` option of the :class:`~parsl.executors.HighThroughputExecutor`. Provide either the number of
executors (Parsl will assume they are named in integers starting from zero) or a list of the names of the accelerators
available on the node. Parsl will limit the number of workers it launches to the number of accelerators specified,
in other words, you cannot have more workers per node than there are accelerators. By default, Parsl will launch
as many workers as the accelerators specified via ``available_accelerators``.

.. code-block:: python
@@ -337,7 +323,6 @@ Provide either the number of executors (Parsl will assume they are named in inte
worker_debug=True,
available_accelerators=2,
provider=LocalProvider(
channel=LocalChannel(),
init_blocks=1,
max_blocks=1,
),
@@ -346,7 +331,38 @@ Provide either the number of executors (Parsl will assume they are named in inte
strategy='none',
)
For hardware that uses Nvidia devices, Parsl allows for the oversubscription of workers to GPUS. This is intended to make use of Nvidia's `Multi-Process Service (MPS) <https://docs.nvidia.com/deploy/mps/>`_ available on many of their GPUs that allows users to run multiple concurrent processes on a single GPU. The user needs to set in the ``worker_init`` commands to start MPS on every node in the block (this is machine dependent). The ``available_accelerators`` option should then be set to the total number of GPU partitions run on a single node in the block. For example, for a node with 4 Nvidia GPUs, to create 8 workers per GPU, set ``available_accelerators=32``. GPUs will be assigned to workers in ascending order in contiguous blocks. In the example, workers 0-7 will be placed on GPU 0, workers 8-15 on GPU 1, workers 16-23 on GPU 2, and workers 24-31 on GPU 3.
It is possible to bind multiple/specific accelerators to each worker by specifying a list of comma separated strings
each specifying accelerators. In the context of binding to NVIDIA GPUs, this works by setting ``CUDA_VISIBLE_DEVICES``
on each worker to a specific string in the list supplied to ``available_accelerators``.

Here's an example:

.. code-block:: python
# The following config is trimmed for clarity
local_config = Config(
executors=[
HighThroughputExecutor(
# Starts 2 workers per node, each bound to 2 GPUs
available_accelerators=["0,1", "2,3"],
# Start a single worker bound to all 4 GPUs
# available_accelerators=["0,1,2,3"]
)
],
)
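
The binding described above can be pictured with a small standalone sketch. It is not Parsl's implementation: ``accelerator_env_for_worker`` is a hypothetical helper, and the assumption that worker *i* on a node receives the *i*-th entry of the list is made here only for illustration.

```python
def accelerator_env_for_worker(available_accelerators, worker_index):
    """Return the environment a worker would see under the assumed
    one-entry-per-worker binding (hypothetical helper, not a Parsl API)."""
    if not 0 <= worker_index < len(available_accelerators):
        raise ValueError("no accelerator entry for this worker index")
    # Each list entry is a comma-separated string of device IDs.
    return {"CUDA_VISIBLE_DEVICES": available_accelerators[worker_index]}


# Two workers per node, each bound to two GPUs, as in the config above.
accelerators = ["0,1", "2,3"]
for worker in range(len(accelerators)):
    print(worker, accelerator_env_for_worker(accelerators, worker))
```

Under this assumption the length of the list is also the number of workers per node, which matches the documented limit that there cannot be more workers than accelerator entries.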
GPU Oversubscription
""""""""""""""""""""

For hardware that uses Nvidia devices, Parsl allows for the oversubscription of workers to GPUS. This is intended to
make use of Nvidia's `Multi-Process Service (MPS) <https://docs.nvidia.com/deploy/mps/>`_ available on many of their
GPUs that allows users to run multiple concurrent processes on a single GPU. The user needs to set in the
``worker_init`` commands to start MPS on every node in the block (this is machine dependent). The
``available_accelerators`` option should then be set to the total number of GPU partitions run on a single node in the
block. For example, for a node with 4 Nvidia GPUs, to create 8 workers per GPU, set ``available_accelerators=32``.
GPUs will be assigned to workers in ascending order in contiguous blocks. In the example, workers 0-7 will be placed
on GPU 0, workers 8-15 on GPU 1, workers 16-23 on GPU 2, and workers 24-31 on GPU 3.
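
The arithmetic in the oversubscription example can be checked with a short sketch. This only reproduces the contiguous-block assignment stated in the paragraph above; ``gpu_for_worker`` is a hypothetical name and the code is not taken from Parsl.

```python
def gpu_for_worker(worker_index, num_gpus, workers_per_gpu):
    """Map a worker to a GPU, assuming workers are assigned to GPUs in
    ascending order in contiguous blocks (hypothetical helper)."""
    total_workers = num_gpus * workers_per_gpu
    if not 0 <= worker_index < total_workers:
        raise ValueError("worker index outside the oversubscribed range")
    return worker_index // workers_per_gpu


num_gpus, workers_per_gpu = 4, 8
# Value to pass as available_accelerators for this node layout.
available_accelerators = num_gpus * workers_per_gpu

for worker in (0, 7, 8, 15, 16, 23, 24, 31):
    print(f"worker {worker} -> GPU {gpu_for_worker(worker, num_gpus, workers_per_gpu)}")
```

With 4 GPUs and 8 workers per GPU, integer division by 8 yields the ranges quoted in the text: 0-7, 8-15, 16-23, and 24-31.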

Multi-Threaded Applications
---------------------------
@@ -372,7 +388,6 @@ Select the best blocking strategy for processor's cache hierarchy (choose ``alte
worker_debug=True,
cpu_affinity='alternating',
provider=LocalProvider(
channel=LocalChannel(),
init_blocks=1,
max_blocks=1,
),
@@ -412,18 +427,12 @@ These include ``OMP_NUM_THREADS``, ``GOMP_COMP_AFFINITY``, and ``KMP_THREAD_AFFI
Ad-Hoc Clusters
---------------

Any collection of compute nodes without a scheduler can be considered an
ad-hoc cluster. Often these machines have a shared file system such as NFS or Lustre.
In order to use these resources with Parsl, they need to set-up for password-less SSH access.

To use these ssh-accessible collection of nodes as an ad-hoc cluster, we use
the `parsl.providers.AdHocProvider` with an `parsl.channels.SSHChannel` to each node. An example
configuration follows.
Parsl's support of ad-hoc clusters of compute nodes without a scheduler
is deprecated.

.. literalinclude:: ../../parsl/configs/ad_hoc.py

.. note::
Multiple blocks should not be assigned to each node when using the `parsl.executors.HighThroughputExecutor`
See
`issue #3515 <https://github.com/Parsl/parsl/issues/3515>`_
for further discussion.

Amazon Web Services
-------------------
5 changes: 1 addition & 4 deletions docs/userguide/examples/config.py
@@ -1,4 +1,3 @@
from parsl.channels import LocalChannel
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider
@@ -8,9 +7,7 @@
HighThroughputExecutor(
label="htex_local",
cores_per_worker=1,
provider=LocalProvider(
channel=LocalChannel(),
),
provider=LocalProvider(),
)
],
)
3 changes: 1 addition & 2 deletions docs/userguide/execution.rst
@@ -47,8 +47,7 @@ Parsl currently supports the following providers:
7. `parsl.providers.AWSProvider`: This provider allows you to provision and manage cloud nodes from Amazon Web Services.
8. `parsl.providers.GoogleCloudProvider`: This provider allows you to provision and manage cloud nodes from Google Cloud.
9. `parsl.providers.KubernetesProvider`: This provider allows you to provision and manage containers on a Kubernetes cluster.
10. `parsl.providers.AdHocProvider`: This provider allows you manage execution over a collection of nodes to form an ad-hoc cluster.
11. `parsl.providers.LSFProvider`: This provider allows you to schedule resources via IBM's LSF scheduler.
10. `parsl.providers.LSFProvider`: This provider allows you to schedule resources via IBM's LSF scheduler.



11 changes: 5 additions & 6 deletions docs/userguide/plugins.rst
@@ -16,8 +16,8 @@ executor to run code on the local submitting host, while another executor can
run the same code on a large supercomputer.


Providers, Launchers and Channels
---------------------------------
Providers and Launchers
-----------------------
Some executors are based on blocks of workers (for example the
`parsl.executors.HighThroughputExecutor`: the submit side requires a
batch system (eg slurm, kubernetes) to start worker processes, which then
@@ -34,10 +34,9 @@ add on any wrappers that are needed to launch the command (eg srun inside
slurm). Providers and launchers are usually paired together for a particular
system type.

A `Channel` allows the commands used to interact with an `ExecutionProvider` to be
executed on a remote system. The default channel executes commands on the
local system, but a few variants of an `parsl.channels.SSHChannel` are provided.

Parsl also has a deprecated ``Channel`` abstraction. See
`issue 3515 <https://github.com/Parsl/parsl/issues/3515>`_
for further discussion.

File staging
------------
5 changes: 1 addition & 4 deletions parsl/channels/__init__.py
@@ -1,7 +1,4 @@
from parsl.channels.base import Channel
from parsl.channels.local.local import LocalChannel
from parsl.channels.oauth_ssh.oauth_ssh import OAuthSSHChannel
from parsl.channels.ssh.ssh import SSHChannel
from parsl.channels.ssh_il.ssh_il import SSHInteractiveLoginChannel

__all__ = ['Channel', 'SSHChannel', 'LocalChannel', 'SSHInteractiveLoginChannel', 'OAuthSSHChannel']
__all__ = ['Channel', 'LocalChannel']
16 changes: 12 additions & 4 deletions parsl/channels/oauth_ssh/oauth_ssh.py
@@ -1,11 +1,15 @@
import logging
import socket

import paramiko

from parsl.channels.ssh.ssh import SSHChannel
from parsl.channels.ssh.ssh import DeprecatedSSHChannel
from parsl.errors import OptionalModuleMissing

try:
import paramiko
_ssh_enabled = True
except (ImportError, NameError, FileNotFoundError):
_ssh_enabled = False

try:
from oauth_ssh.oauth_ssh_token import find_access_token
from oauth_ssh.ssh_service import SSHService
@@ -17,7 +21,7 @@
logger = logging.getLogger(__name__)


class OAuthSSHChannel(SSHChannel):
class DeprecatedOAuthSSHChannel(DeprecatedSSHChannel):
"""SSH persistent channel. This enables remote execution on sites
accessible via ssh. This channel uses Globus based OAuth tokens for authentication.
"""
@@ -38,6 +42,10 @@ def __init__(self, hostname, username=None, script_dir=None, envs=None, port=22)
Raises:
'''
if not _ssh_enabled:
raise OptionalModuleMissing(['ssh'],
"OauthSSHChannel requires the ssh module and config.")

if not _oauth_ssh_enabled:
raise OptionalModuleMissing(['oauth_ssh'],
"OauthSSHChannel requires oauth_ssh module and config.")
24 changes: 17 additions & 7 deletions parsl/channels/ssh/ssh.py
@@ -2,8 +2,6 @@
import logging
import os

import paramiko

from parsl.channels.base import Channel
from parsl.channels.errors import (
AuthException,
@@ -13,18 +11,27 @@
FileCopyException,
SSHException,
)
from parsl.errors import OptionalModuleMissing
from parsl.utils import RepresentationMixin

try:
import paramiko
_ssh_enabled = True
except (ImportError, NameError, FileNotFoundError):
_ssh_enabled = False


logger = logging.getLogger(__name__)


class NoAuthSSHClient(paramiko.SSHClient):
def _auth(self, username, *args):
self._transport.auth_none(username)
return
if _ssh_enabled:
class NoAuthSSHClient(paramiko.SSHClient):
def _auth(self, username, *args):
self._transport.auth_none(username)
return


class SSHChannel(Channel, RepresentationMixin):
class DeprecatedSSHChannel(Channel, RepresentationMixin):
''' SSH persistent channel. This enables remote execution on sites
accessible via ssh. It is assumed that the user has setup host keys
so as to ssh to the remote host. Which goes to say that the following
Expand Down Expand Up @@ -53,6 +60,9 @@ def __init__(self, hostname, username=None, password=None, script_dir=None, envs
Raises:
'''
if not _ssh_enabled:
raise OptionalModuleMissing(['ssh'],
"SSHChannel requires the ssh module and config.")

self.hostname = hostname
self.username = username