Commit

Merge branch 'master' into use_sacct_or_squeue
benclifford authored Aug 19, 2024
2 parents fe3cd20 + 123df51 commit bbc169b
Showing 107 changed files with 1,744 additions and 1,317 deletions.
42 changes: 42 additions & 0 deletions .github/workflows/parsl+flux.yaml
@@ -0,0 +1,42 @@
name: Test Flux Scheduler
on:
pull_request: []

jobs:
build:
runs-on: ubuntu-22.04
permissions:
packages: read
strategy:
fail-fast: false
matrix:
container: ['fluxrm/flux-sched:jammy']
timeout-minutes: 30

container:
image: ${{ matrix.container }}
options: "--platform=linux/amd64 --user root -it --init"

name: ${{ matrix.container }}
steps:
- name: Checkout
uses: actions/checkout@v3

- name: Install Dependencies and Parsl
run: |
apt-get update && apt-get install -y python3-pip curl
pip3 install . -r test-requirements.txt
- name: Verify Parsl Installation
run: |
pytest parsl/tests/ -k "not cleannet and not unix_filesystem_permissions_required" --config parsl/tests/configs/local_threads.py --random-order --durations 10
- name: Test Parsl with Flux
run: |
pytest parsl/tests/test_flux.py --config local --random-order
- name: Test Parsl with Flux Config
run: |
pytest parsl/tests/ -k "not cleannet and not unix_filesystem_permissions_required" --config parsl/tests/configs/flux_local.py --random-order --durations 10
5 changes: 5 additions & 0 deletions Makefile
@@ -127,3 +127,8 @@ coverage: ## show the coverage report
.PHONY: clean
clean: ## clean up the environment by deleting the .venv, dist, eggs, mypy caches, coverage info, etc
rm -rf .venv $(DEPS) dist *.egg-info .mypy_cache build .pytest_cache .coverage runinfo $(WORKQUEUE_INSTALL)

.PHONY: flux_local_test
flux_local_test: ## Test Parsl with Flux Executor
pip3 install .
pytest parsl/tests/ -k "not cleannet" --config parsl/tests/configs/flux_local.py --random-order --durations 10
2 changes: 1 addition & 1 deletion README.rst
@@ -109,7 +109,7 @@ For Developers

3. Install::

$ cd parsl
$ cd parsl # only if you didn't enter the top-level directory in step 2 above
$ python3 setup.py install

4. Use Parsl!
1 change: 1 addition & 0 deletions docs/faq.rst
@@ -13,6 +13,7 @@ Alternatively, you can configure the file logger to write to an output file.

.. code-block:: python
import logging
import parsl
# Emit log lines to the screen
8 changes: 4 additions & 4 deletions docs/historical/changelog.rst
@@ -334,7 +334,7 @@ New Functionality
* New launcher: `parsl.launchers.WrappedLauncher` for launching tasks inside containers.

* `parsl.channels.SSHChannel` now supports a ``key_filename`` kwarg `issue#1639 <https://github.com/Parsl/parsl/issues/1639>`_
* ``parsl.channels.SSHChannel`` now supports a ``key_filename`` kwarg `issue#1639 <https://github.com/Parsl/parsl/issues/1639>`_

* Newly added Makefile wraps several frequent developer operations such as:

@@ -442,7 +442,7 @@ New Functionality
module, parsl.data_provider.globus

* `parsl.executors.WorkQueueExecutor`: a new executor that integrates functionality from `Work Queue <http://ccl.cse.nd.edu/software/workqueue/>`_ is now available.
* New provider to support for Ad-Hoc clusters `parsl.providers.AdHocProvider`
* New provider to support for Ad-Hoc clusters ``parsl.providers.AdHocProvider``
* New provider added to support LSF on Summit `parsl.providers.LSFProvider`
* Support for CPU and Memory resource hints to providers `(github) <https://github.com/Parsl/parsl/issues/942>`_.
* The ``logging_level=logging.INFO`` in `parsl.monitoring.MonitoringHub` is replaced with ``monitoring_debug=False``:
@@ -468,7 +468,7 @@ New Functionality
* Several test-suite improvements that have dramatically reduced test duration.
* Several improvements to the Monitoring interface.
* Configurable port on `parsl.channels.SSHChannel`.
* Configurable port on ``parsl.channels.SSHChannel``.
* ``suppress_failure`` now defaults to True.
* `parsl.executors.HighThroughputExecutor` is the recommended executor, and ``IPyParallelExecutor`` is deprecated.
* `parsl.executors.HighThroughputExecutor` will expose worker information via environment variables: ``PARSL_WORKER_RANK`` and ``PARSL_WORKER_COUNT``
@@ -532,7 +532,7 @@ New Functionality
* Cleaner user app file log management.
* Updated configurations using `parsl.executors.HighThroughputExecutor` in the configuration section of the userguide.
* Support for OAuth based SSH with `parsl.channels.OAuthSSHChannel`.
* Support for OAuth based SSH with ``parsl.channels.OAuthSSHChannel``.

Bug Fixes
^^^^^^^^^
4 changes: 2 additions & 2 deletions docs/index.rst
@@ -23,8 +23,8 @@ Parsl lets you chain functions together and will launch each function as inputs
return x + 1
@python_app
def g(x):
return x * 2
def g(x, y):
return x + y
# These functions now return Futures, and can be chained
future = f(1)
13 changes: 3 additions & 10 deletions docs/reference.rst
@@ -38,15 +38,9 @@ Configuration
Channels
========

.. autosummary::
:toctree: stubs
:nosignatures:

parsl.channels.base.Channel
parsl.channels.LocalChannel
parsl.channels.SSHChannel
parsl.channels.OAuthSSHChannel
parsl.channels.SSHInteractiveLoginChannel
Channels are deprecated in Parsl. See
`issue 3515 <https://github.com/Parsl/parsl/issues/3515>`_
for further discussion.

Data management
===============
@@ -109,7 +103,6 @@ Providers
:toctree: stubs
:nosignatures:

parsl.providers.AdHocProvider
parsl.providers.AWSProvider
parsl.providers.CobaltProvider
parsl.providers.CondorProvider
16 changes: 9 additions & 7 deletions docs/userguide/checkpoints.rst
@@ -49,15 +49,17 @@ during development. Using app caching will ensure that only modified apps are re
App equivalence
^^^^^^^^^^^^^^^

Parsl determines app equivalence by storing the hash
of the app function. Thus, any changes to the app code (e.g.,
its signature, its body, or even the docstring within the body)
will invalidate cached values.
Parsl determines app equivalence using the name of the app function:
if two apps have the same name, then they are equivalent under this
relation.

However, Parsl does not traverse the call graph of the app function,
so changes inside functions called by an app will not invalidate
Changes inside the app, or in functions called by an app, will not invalidate
cached values.

There are lots of other ways functions might be compared for equivalence,
and `parsl.dataflow.memoization.id_for_memo` provides a hook to plug in
alternate application-specific implementations.


Invocation equivalence
^^^^^^^^^^^^^^^^^^^^^^
@@ -92,7 +94,7 @@ Attempting to cache apps invoked with other, non-hashable, data types will
lead to an exception at invocation.

In that case, mechanisms to hash new types can be registered by a program by
implementing the ``parsl.dataflow.memoization.id_for_memo`` function for
implementing the `parsl.dataflow.memoization.id_for_memo` function for
the new type.
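
For illustration only (this sketch is not part of the commit's changed files), registering a hash rule for a
hypothetical ``Point`` type might look roughly like the following, assuming ``id_for_memo`` is a
``functools.singledispatch`` function that returns ``bytes`` (check ``parsl/dataflow/memoization.py`` for the exact
signature):

.. code-block:: python

import pickle

from parsl.dataflow.memoization import id_for_memo


class Point:
    """Hypothetical user-defined argument type."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


# Assumption: id_for_memo is a singledispatch function, so a new type is
# registered with .register(); output_ref mirrors the base function's keyword.
@id_for_memo.register(Point)
def id_for_memo_point(value: Point, output_ref: bool = False) -> bytes:
    # Reduce the object to a stable byte string so that two Points with the
    # same coordinates produce the same memoization key.
    return pickle.dumps(("Point", value.x, value.y))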

Ignoring arguments
80 changes: 45 additions & 35 deletions docs/userguide/configuring.rst
@@ -15,15 +15,14 @@ queues, durations, and data management options.
The following example shows a basic configuration object (:class:`~parsl.config.Config`) for the Frontera
supercomputer at TACC.
This config uses the `parsl.executors.HighThroughputExecutor` to submit
tasks from a login node (`parsl.channels.LocalChannel`). It requests an allocation of
tasks from a login node. It requests an allocation of
128 nodes, deploying 1 worker for each of the 56 cores per node, from the normal partition.
To limit network connections to just the internal network the config specifies the address
used by the infiniband interface with ``address_by_interface('ib0')``

.. code-block:: python
from parsl.config import Config
from parsl.channels import LocalChannel
from parsl.providers import SlurmProvider
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
@@ -36,7 +35,6 @@ used by the infiniband interface with ``address_by_interface('ib0')``
address=address_by_interface('ib0'),
max_workers_per_node=56,
provider=SlurmProvider(
channel=LocalChannel(),
nodes_per_block=128,
init_blocks=1,
partition='normal',
@@ -197,22 +195,6 @@ Stepping through the following question should help formulate a suitable configu
are on a **native Slurm** system like :ref:`configuring_nersc_cori`


4) Where will the main Parsl program run and how will it communicate with the apps?

+------------------------+--------------------------+---------------------------------------------------+
| Parsl program location | App execution target | Suitable channel |
+========================+==========================+===================================================+
| Laptop/Workstation | Laptop/Workstation | `parsl.channels.LocalChannel` |
+------------------------+--------------------------+---------------------------------------------------+
| Laptop/Workstation | Cloud Resources | No channel is needed |
+------------------------+--------------------------+---------------------------------------------------+
| Laptop/Workstation | Clusters with no 2FA | `parsl.channels.SSHChannel` |
+------------------------+--------------------------+---------------------------------------------------+
| Laptop/Workstation | Clusters with 2FA | `parsl.channels.SSHInteractiveLoginChannel` |
+------------------------+--------------------------+---------------------------------------------------+
| Login node | Cluster/Supercomputer | `parsl.channels.LocalChannel` |
+------------------------+--------------------------+---------------------------------------------------+

Heterogeneous Resources
-----------------------

@@ -324,9 +306,13 @@ and Work Queue does not require Python to run.
Accelerators
------------

Many modern clusters provide multiple accelerators per compute note, yet many applications are best suited to using a single accelerator per task.
Parsl supports pinning each worker to difference accelerators using ``available_accelerators`` option of the :class:`~parsl.executors.HighThroughputExecutor`.
Provide either the number of executors (Parsl will assume they are named in integers starting from zero) or a list of the names of the accelerators available on the node.
Many modern clusters provide multiple accelerators per compute node, yet many applications are best suited to using a
single accelerator per task. Parsl supports pinning each worker to a different accelerator using the
``available_accelerators`` option of the :class:`~parsl.executors.HighThroughputExecutor`. Provide either the number of
accelerators (Parsl will assume they are numbered starting from zero) or a list of the names of the accelerators
available on the node. Parsl will limit the number of workers it launches to the number of accelerators specified;
in other words, you cannot have more workers per node than there are accelerators. By default, Parsl will launch
as many workers as there are accelerators specified via ``available_accelerators``.

.. code-block:: python
@@ -337,7 +323,6 @@ Provide either the number of executors (Parsl will assume they are named in inte
worker_debug=True,
available_accelerators=2,
provider=LocalProvider(
channel=LocalChannel(),
init_blocks=1,
max_blocks=1,
),
@@ -346,7 +331,39 @@ Provide either the number of executors (Parsl will assume they are named in inte
strategy='none',
)
It is possible to bind multiple or specific accelerators to each worker by supplying a list of comma-separated strings,
each naming a group of accelerators. In the context of binding to NVIDIA GPUs, this works by setting ``CUDA_VISIBLE_DEVICES``
on each worker to a specific string in the list supplied to ``available_accelerators``.

Here's an example:

.. code-block:: python
# The following config is trimmed for clarity
local_config = Config(
executors=[
HighThroughputExecutor(
# Starts 2 workers per node, each bound to 2 GPUs
available_accelerators=["0,1", "2,3"],
# Start a single worker bound to all 4 GPUs
# available_accelerators=["0,1,2,3"]
)
],
)
GPU Oversubscription
""""""""""""""""""""

For hardware that uses Nvidia devices, Parsl allows for the oversubscription of workers to GPUs. This is intended to
make use of Nvidia's `Multi-Process Service (MPS) <https://docs.nvidia.com/deploy/mps/>`_, available on many of their
GPUs, which allows users to run multiple concurrent processes on a single GPU. The user needs to include in
``worker_init`` the commands that start MPS on every node in the block (this is machine dependent). The
``available_accelerators`` option should then be set to the total number of GPU partitions run on a single node in the
block. For example, for a node with 4 Nvidia GPUs, to create 8 workers per GPU, set ``available_accelerators=32``.
GPUs will be assigned to workers in ascending order in contiguous blocks. In the example, workers 0-7 will be placed
on GPU 0, workers 8-15 on GPU 1, workers 16-23 on GPU 2, and workers 24-31 on GPU 3.
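
As a rough, illustrative sketch (not part of the commit's changed files), that example could be expressed as the
following configuration; the MPS start-up command in ``worker_init`` is a common Nvidia default and, like the
scheduler options, is machine dependent:

.. code-block:: python

from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import SlurmProvider

config = Config(
    executors=[
        HighThroughputExecutor(
            # 4 physical GPUs x 8 processes per GPU = 32 workers per node
            available_accelerators=32,
            provider=SlurmProvider(
                nodes_per_block=1,
                # Machine dependent: start the Nvidia MPS control daemon on
                # each node before the workers start.
                worker_init="nvidia-cuda-mps-control -d",
            ),
        )
    ],
)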

Multi-Threaded Applications
---------------------------

Expand All @@ -371,7 +388,6 @@ Select the best blocking strategy for processor's cache hierarchy (choose ``alte
worker_debug=True,
cpu_affinity='alternating',
provider=LocalProvider(
channel=LocalChannel(),
init_blocks=1,
max_blocks=1,
),
@@ -411,18 +427,12 @@ These include ``OMP_NUM_THREADS``, ``GOMP_COMP_AFFINITY``, and ``KMP_THREAD_AFFI
Ad-Hoc Clusters
---------------

Any collection of compute nodes without a scheduler can be considered an
ad-hoc cluster. Often these machines have a shared file system such as NFS or Lustre.
In order to use these resources with Parsl, they need to set-up for password-less SSH access.

To use these ssh-accessible collection of nodes as an ad-hoc cluster, we use
the `parsl.providers.AdHocProvider` with an `parsl.channels.SSHChannel` to each node. An example
configuration follows.
Parsl's support for ad-hoc clusters of compute nodes without a scheduler
is deprecated.

.. literalinclude:: ../../parsl/configs/ad_hoc.py

.. note::
Multiple blocks should not be assigned to each node when using the `parsl.executors.HighThroughputExecutor`
See
`issue #3515 <https://github.com/Parsl/parsl/issues/3515>`_
for further discussion.

Amazon Web Services
-------------------
5 changes: 1 addition & 4 deletions docs/userguide/examples/config.py
@@ -1,4 +1,3 @@
from parsl.channels import LocalChannel
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider
@@ -8,9 +7,7 @@
HighThroughputExecutor(
label="htex_local",
cores_per_worker=1,
provider=LocalProvider(
channel=LocalChannel(),
),
provider=LocalProvider(),
)
],
)
3 changes: 1 addition & 2 deletions docs/userguide/execution.rst
@@ -47,8 +47,7 @@ Parsl currently supports the following providers:
7. `parsl.providers.AWSProvider`: This provider allows you to provision and manage cloud nodes from Amazon Web Services.
8. `parsl.providers.GoogleCloudProvider`: This provider allows you to provision and manage cloud nodes from Google Cloud.
9. `parsl.providers.KubernetesProvider`: This provider allows you to provision and manage containers on a Kubernetes cluster.
10. `parsl.providers.AdHocProvider`: This provider allows you manage execution over a collection of nodes to form an ad-hoc cluster.
11. `parsl.providers.LSFProvider`: This provider allows you to schedule resources via IBM's LSF scheduler.
10. `parsl.providers.LSFProvider`: This provider allows you to schedule resources via IBM's LSF scheduler.



7 changes: 7 additions & 0 deletions docs/userguide/mpi_apps.rst
@@ -60,6 +60,13 @@ An example for ALCF's Polaris supercomputer that will run 3 MPI tasks of 2 nodes
)
.. warning::
Please note that ``Provider`` options that specify per-task or per-node resources, for example,
``SlurmProvider(cores_per_node=N, ...)``, should not be used with :class:`~parsl.executors.high_throughput.MPIExecutor`.
Parsl primarily uses a pilot job model, and assumptions from that context do not translate to the MPI context. For
more information, refer to
`github issue #3006 <https://github.com/Parsl/parsl/issues/3006>`_.
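
As a hedged sketch only (not part of the commit's changed files), the intent is to size blocks in whole nodes on the
provider and leave per-task resources to the executor; parameter names such as ``max_workers_per_block`` are
assumptions to verify against the ``MPIExecutor`` documentation:

.. code-block:: python

from parsl.executors import MPIExecutor
from parsl.launchers import SimpleLauncher
from parsl.providers import SlurmProvider

executor = MPIExecutor(
    # Upper bound on concurrently running MPI tasks per block.
    max_workers_per_block=3,
    provider=SlurmProvider(
        # Size the block in whole nodes...
        nodes_per_block=6,
        launcher=SimpleLauncher(),
        # ...and avoid per-task options such as cores_per_node here.
    ),
)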

Writing an MPI App
------------------

11 changes: 5 additions & 6 deletions docs/userguide/plugins.rst
@@ -16,8 +16,8 @@ executor to run code on the local submitting host, while another executor can
run the same code on a large supercomputer.


Providers, Launchers and Channels
---------------------------------
Providers and Launchers
-----------------------
Some executors are based on blocks of workers (for example the
`parsl.executors.HighThroughputExecutor`: the submit side requires a
batch system (eg slurm, kubernetes) to start worker processes, which then
@@ -34,10 +34,9 @@ add on any wrappers that are needed to launch the command (eg srun inside
slurm). Providers and launchers are usually paired together for a particular
system type.

A `Channel` allows the commands used to interact with an `ExecutionProvider` to be
executed on a remote system. The default channel executes commands on the
local system, but a few variants of an `parsl.channels.SSHChannel` are provided.

Parsl also has a deprecated ``Channel`` abstraction. See
`issue 3515 <https://github.com/Parsl/parsl/issues/3515>`_
for further discussion.
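
As an illustrative sketch (not part of the commit's changed files), a site-specific launcher plugin might look roughly
like the following; the import path and the ``__call__(command, tasks_per_node, nodes_per_block)`` interface are
assumptions to check against the ``parsl.launchers`` source:

.. code-block:: python

from parsl.launchers.base import Launcher  # assumption: exact module path may differ


class MpiexecWrapLauncher(Launcher):
    """Hypothetical launcher that wraps the block command with mpiexec."""

    def __call__(self, command: str, tasks_per_node: int, nodes_per_block: int) -> str:
        total_ranks = tasks_per_node * nodes_per_block
        # Return the shell command that the provider's batch script will execute.
        return f"mpiexec -n {total_ranks} /bin/bash -c '{command}'"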

File staging
------------
6 changes: 4 additions & 2 deletions parsl/app/app.py
@@ -66,8 +66,10 @@ def __init__(self, func: Callable,
self.kwargs['walltime'] = params['walltime'].default
if 'parsl_resource_specification' in params:
self.kwargs['parsl_resource_specification'] = params['parsl_resource_specification'].default
self.outputs = params['outputs'].default if 'outputs' in params else []
self.inputs = params['inputs'].default if 'inputs' in params else []
if 'outputs' in params:
self.kwargs['outputs'] = params['outputs'].default
if 'inputs' in params:
self.kwargs['inputs'] = params['inputs'].default

@abstractmethod
def __call__(self, *args: Any, **kwargs: Any) -> AppFuture: