Merge recent master and desc dev
benclifford committed Oct 13, 2023
2 parents f646540 + 6b963ba commit c851cf0
Showing 63 changed files with 1,532 additions and 1,343 deletions.
6 changes: 6 additions & 0 deletions .github/workflows/python-publish-to-testpypi.yml
@@ -32,6 +32,12 @@ jobs:

steps:
- uses: actions/checkout@v3

- name: Check if this commit is already released
id: already_released
run: |
if git tag --contains HEAD | grep -e '^[0-9]\{4\}\.[0-9]\{2\}\.[0-9]\{2\}$' ; then exit 1 ; fi
- name: Set up Python
uses: actions/setup-python@v3
with:
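The new workflow step fails the publish job when HEAD already carries a date-stamped release tag. The same pattern check can be exercised locally; this is an illustrative sketch with a made-up tag value, not the exact workflow command:

```shell
# Hypothetical stand-in for the workflow step: Parsl release tags look like
# YYYY.MM.DD, so a commit already carrying such a tag counts as released.
tag="2023.10.13"   # example value, not a real query of git tags
if echo "$tag" | grep -q -e '^[0-9]\{4\}\.[0-9]\{2\}\.[0-9]\{2\}$'; then
    echo "release tag pattern matched"
fi
```

In the workflow itself, `git tag --contains HEAD` supplies the candidate tags and a match triggers `exit 1`, aborting the publish.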
1 change: 1 addition & 0 deletions .wci.yml
Expand Up @@ -34,6 +34,7 @@ execution_environment:
- LSF
- PBS
- Cobalt
- Flux
- GridEngine
- HTCondor
- AWS
174 changes: 45 additions & 129 deletions docs/devguide/roadmap.rst
@@ -1,135 +1,51 @@
Historical: Roadmap
===================
Roadmap
=======

.. note::
   This roadmap has not been current since version 0.9.0 in 2019, and does
   not reflect changes in project direction since then. For this reason, it
   is marked as historical.
**OVERVIEW**

Before diving into the roadmap, a quick retrospective look at the evolution of workflow
solutions that came before Parsl from the workflows group at UChicago and Argonne National Laboratory.
While we follow best practices in software development processes (e.g., CI, flake8, code review), there are opportunities to make our code more maintainable and accessible. This roadmap, written in the fall of 2023, covers our major activities planned through 2025 to improve efficiency, productivity, and user experience, and to grow the community.

.. image:: ../images/swift-e-timeline_trimmed.png


Sufficient capabilities to use Parsl in many common situations already exist. This document indicates where Parsl is going;
it contains a list of features that Parsl has or will have. Features that exist today are marked in bold, with the release
in which they were added marked for releases since 0.3.0. Help in providing any of the yet-to-be-developed capabilities is welcome.

Features in preparation are documented via Github
Features and improvements are documented via GitHub
`issues <https://github.com/Parsl/parsl/issues>`_ and `pull requests <https://github.com/Parsl/parsl/pulls>`_.


Core Functionality
---------------------

* **Parsl has the ability to execute standard python code and to asynchronously execute tasks, called Apps.**
* **Any Python function annotated with "@App" is an App.**
* **Apps can be Python functions or bash scripts that wrap external applications.**
* **Asynchronous tasks return futures, which other tasks can use as inputs.**
* **This builds an implicit data flow graph.**
* **Asynchronous tasks can execute locally on threads or as separate processes.**
* **Asynchronous tasks can execute on a remote resource.**
* **libsubmit (to be renamed) provides this functionality.**
* **A shared filesystem is assumed; data staging (of files) is not yet supported.**
* **The Data Flow Kernel (DFK) schedules Parsl task execution (based on dataflow).**
* **Class-based config definition (v0.6.0)**
* **Singleton config, and separate DFK from app definitions (v0.6.0)**
* Class-based app definition
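The futures-based dataflow listed above can be illustrated without Parsl itself; this is a minimal stdlib sketch of the same idea (futures returned by one task feeding another, forming an implicit dependency graph), not Parsl's ``@App`` API:

```python
# Sketch using stdlib futures (not Parsl apps) of futures-as-inputs dataflow.
from concurrent.futures import ThreadPoolExecutor

def double(x):
    return 2 * x

def add_results(f1, f2):
    # Waiting on the upstream futures expresses the dependency edges.
    return f1.result() + f2.result()

with ThreadPoolExecutor() as pool:
    a = pool.submit(double, 3)       # future for 6
    b = pool.submit(double, 4)       # future for 8
    total = pool.submit(add_results, a, b)
    print(total.result())            # 14
```

In Parsl, the Data Flow Kernel performs the equivalent bookkeeping automatically: apps invoked with futures as arguments are scheduled only once those futures resolve.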

Data management
---------------

* **File abstraction to support representation of local and remote files.**
* **Support for a variety of common data access protocols (e.g., FTP, HTTP, Globus) (v0.6.0)**.
* **Input/output staging models that support transparent movement of data from source to a location on which it is accessible for compute. This includes staging to/from the client (script execution location) and worker node (v0.6.0)**.
* Support for creation of a sandbox and execution within the sandbox.
* Multi-site support including transparent movement between sites.
* **Support for systems without a shared file system (point-to-point staging). (Partial support in v0.9.0)**
* Support for data caching at multiple levels and across sites.

TODO: Add diagram for staging


Execution core and parallelism (DFK)
------------------------------------

* **Support for application and data futures within scripts.**
* **Internal (dynamically created/updated) task/data dependency graph that enables asynchronous execution ordered by data dependencies and throttled by resource limits.**
* **Well-defined state transition model for task lifecycle. (v0.5.0)**
* Add data staging to task state transition model.
* **More efficient algorithms for managing dependency resolution. (v0.7.0)**
* Scheduling and allocation algorithms that determine job placement based on job and data requirements (including deadlines) as well as site capabilities.
* **Directing jobs to a specific set of sites. (v0.4.0)**
* **Logic to manage (provision, resize) execution resource block based on job requirements, and running multiple tasks per resource block (v0.4.0).**
* **Retry logic to support recovery and fault tolerance**
* **Workflow level checkpointing and restart (v0.4.0)**
* **Transition away from IPP to in-house executors (HighThroughputExecutor and ExtremeScaleExecutor v0.7.0)**

Resource provisioning and execution
-----------------------------------

* **Uniform abstraction for execution resources (to support resource provisioning, job submission, allocation management) on cluster, cloud, and supercomputing resources**
* **Support for different execution models on any execution provider (e.g., pilot jobs using Ipython parallel on clusters and extreme-scale execution using Swift/T on supercomputers)**
* **Slurm**
* **HTCondor**
* **Cobalt**
* **GridEngine**
* **PBS/Torque**
* **AWS**
* **GoogleCloud**
* **Azure**
* **Nova/OpenStack/Jetstream (partial support)**
* **Kubernetes (v0.6.0)**
* **Support for launcher mechanisms**
* **srun**
* **aprun (Complete support 0.6.0)**
* **Various MPI launch mechanisms (Mpiexec, mpirun..)**
* **Support for remote execution using SSH (from v0.3.0) and OAuth-based authentication (from v0.9.0)**
* **Utilizing multiple sites for a single script’s execution (v0.4.0)**
* Cloud-hosted site configuration repository that stores configurations for resource authentication, data staging, and job submission endpoints
* **IPP workers to support multiple threads of execution per node. (v0.7.0 adds support via replacement executors)**
* Smarter serialization with caching frequently used objects.
* **Support for user-defined containers as Parsl apps and orchestration of workflows comprised of containers (v0.5.0)**
* **Docker (locally)**
* Shifter (NERSC, Blue Waters)
* Singularity (ALCF)

Visualization, debugging, fault tolerance
-----------------------------------------

* **Support for exception handling**.
* **Interface for accessing real-time state (v0.6.0)**.
* **Visualization library that enables users to introspect graph, task, and data dependencies, as well as observe state of executed/executing tasks (from v0.9.0)**
* Integration of visualization into jupyter
* Support for visualizing dead/dying parts of the task graph and retrying with updates to the task.
* **Retry model to selectively re-execute only the failed branches of a workflow graph**
* **Fault tolerance support for individual task execution**
* **Support for saving monitoring information to local DB (sqlite) and remote DB (elasticsearch) (v0.6.0 and v0.7.0)**

Authentication and authorization
--------------------------------

* **Seamless authentication using OAuth-based methods within Parsl scripts (e.g., native app grants) (v0.6.0)**
* Support for arbitrary identity providers and pass through to execution resources
* Support for transparent/scoped access to external services **(e.g., Globus transfer) (v0.6.0)**

Ecosystem
---------

* Support for CWL, ability to execute CWL workflows and use CWL app descriptions
* Creation of library of Parsl apps and workflows
* Provenance capture/export in standard formats
* Automatic metrics capture and reporting to understand Parsl usage
* **Anonymous Usage Tracking (v0.4.0)**

Documentation / Tutorials:
--------------------------

* **Documentation about Parsl and its features**
* **Documentation about supported sites (v0.6.0)**
* **Self-guided Jupyter notebook tutorials on Parsl features**
* **Hands-on tutorial suitable for webinars and meetings**



Code Maintenance
----------------

* **Type Annotations and Static Type Checking**: Add static type annotations throughout the codebase and add typeguard checks.
* **Release Process**: `Improve the overall release process <https://github.com/Parsl/parsl/issues?q=is%3Aopen+is%3Aissue+label%3Arelease_process>`_ to synchronize docs and code releases, automatically produce changelog documentation.
* **Components Maturity Model**: Defines the `component maturity model <https://github.com/Parsl/parsl/issues/2554>`_ and tags components with their appropriate maturity level.
* **Define and Document Interfaces**: Identify and document interfaces via which `external components <https://parsl.readthedocs.io/en/stable/userguide/plugins.html>`_ can augment the Parsl ecosystem.
* **Distributed Testing Process**: All tests should be run against all possible schedulers, using different executors, on a variety of remote systems. Explore the use of containerized schedulers and remote testing on real systems.

New Features and Integrations
-----------------------------

* **Enhanced MPI Support**: Extend Parsl’s MPI model with MPI apps and runtime support capable of running MPI apps in different environments (MPI flavor and launcher).
* **Serialization Configuration**: Enable users to select what serialization methods are used and enable users to supply their own serializer.
* **PSI/J integration**: Integrate PSI/J as a common interface for schedulers.
* **Internal Concurrency Model**: Revisit and rearchitect the concurrency model to reduce areas that are not well understood and reduce the likelihood of errors.
* **Common Model for Errors**: Make Parsl errors self-describing and understandable by users.
* **Plug-in Model for External Components**: Extend Parsl to implement interfaces defined above.
* **User Configuration Validation Tool**: Provide tooling to help users configure Parsl and diagnose and resolve errors.
* **Anonymized Usage Tracking**: Usage tracking is crucial for our data-oriented approach to understand the adoption of Parsl, which components are used, and where errors occur. This allows us to prioritize investment in components, progress components through the maturity levels, and identify bugs. Revisit prior usage tracking and develop a service that enables users to control tracking information.
* **Support for Globus Compute**: Enable execution of Parsl tasks using Globus Compute as an executor.
* **Update Globus Data Management**: Update Globus integration to use the new Globus Connect v5 model (i.e., needing specific scopes for individual endpoints).
* **Performance Measurement**: Improve ability to measure performance metrics and report to users.
* **Enhanced Debugging**: Application-level `logging <https://github.com/Parsl/parsl/issues/1984>`_ to understand app execution.

Tutorials, Training, and User Support
-------------------------------------

* **Configuration and Debugging**: Tutorials showing how to configure Parsl for different resources and debug execution.
* **Functional Serialization 101**: Tutorial describing how serialization works and how you can integrate custom serializers.
* **ProxyStore Data Management**: Tutorial showing how you can use ProxyStore to manage data for both inter and intra-site scenarios.
* **Open Dev Calls on Zoom**: The internal core team holds an open dev call/office hours every other Thursday to help users troubleshoot issues, present and share their work, connect with each other, and provide community updates.
* **Project Documentation**: Maintained and updated in `Read the Docs <https://parsl.readthedocs.io/en/stable/index.html>`_.

Longer-term Objectives
----------------------

* **Globus Compute Integration**: Once Globus Compute supports multi-tenancy, Parsl will be able to use it to run remote tasks, initially on one resource and later on several.
* **Multi-System Optimization**: Once Globus Compute integration is complete, workflows will be able to make the best use of multiple systems for the tasks of a single workflow.
* **HPC Checkpointing and Job Migration**: As new resources become available, HPC tasks will be able to be checkpointed and migrated to a system with more resources.
8 changes: 4 additions & 4 deletions docs/userguide/configuring.rst
@@ -17,8 +17,8 @@ supercomputer at TACC.
This config uses the `parsl.executors.HighThroughputExecutor` to submit
tasks from a login node (`parsl.channels.LocalChannel`). It requests an allocation of
128 nodes, deploying 1 worker for each of the 56 cores per node, from the normal partition.
The config uses the `address_by_hostname()` helper function to determine
the login node's IP address.
To limit network connections to just the internal network, the config specifies the address
used by the Infiniband interface with ``address_by_interface('ib0')``.

.. code-block:: python
@@ -27,13 +27,13 @@ the login node's IP address.
from parsl.providers import SlurmProvider
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
from parsl.addresses import address_by_hostname
from parsl.addresses import address_by_interface
config = Config(
executors=[
HighThroughputExecutor(
label="frontera_htex",
address=address_by_hostname(),
address=address_by_interface('ib0'),
max_workers=56,
provider=SlurmProvider(
channel=LocalChannel(),
1 change: 1 addition & 0 deletions docs/userguide/index.rst
@@ -19,4 +19,5 @@ User guide
joins
usage_tracking
plugins
parsl_perf
performance
53 changes: 53 additions & 0 deletions docs/userguide/parsl_perf.rst
@@ -0,0 +1,53 @@
.. _label-parsl-perf:

Measuring performance with parsl-perf
=====================================

``parsl-perf`` is a tool for making basic performance measurements of Parsl
configurations.

It runs increasingly large numbers of no-op apps until a batch takes
(by default) 120 seconds, giving a measurement of tasks per second.

This can give a basic measurement of some of the overheads in task
execution.

``parsl-perf`` must be invoked with a configuration file: a Python file that
defines either a variable ``config`` holding a `Config` object, or a
function ``fresh_config`` that returns a `Config` object. The
``fresh_config`` format is the same as that used by the pytest test suite.
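A minimal configuration file in the documented shape might look like this (the filename and executor choice are illustrative, not prescribed by ``parsl-perf``):

```python
# my_perf_config.py -- hypothetical example file for parsl-perf.
# Defining either `config` or `fresh_config` satisfies the documented contract.
from parsl.config import Config
from parsl.executors.threads import ThreadPoolExecutor

def fresh_config() -> Config:
    # A small local configuration; real measurements would usually target
    # an HPC-oriented executor such as HighThroughputExecutor.
    return Config(executors=[ThreadPoolExecutor(max_threads=4)])

config = fresh_config()
```

It would then be passed as ``--config my_perf_config.py`` on the command line.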

To specify a ``parsl_resource_specification`` for tasks, add a ``--resources``
argument.

To change the target runtime from the default of 120 seconds, add a
``--time`` parameter.

For example:

.. code-block:: bash

   $ python -m parsl.benchmark.perf --config parsl/tests/configs/workqueue_ex.py --resources '{"cores":1, "memory":0, "disk":0}'
   ==== Iteration 1 ====
   Will run 10 tasks to target 120 seconds runtime
   Submitting tasks / invoking apps
   warning: using plain-text when communicating with workers.
   warning: use encryption with a key and cert when creating the manager.
   All 10 tasks submitted ... waiting for completion
   Submission took 0.008 seconds = 1248.676 tasks/second
   Runtime: actual 3.668s vs target 120s
   Tasks per second: 2.726
   [...]
   ==== Iteration 4 ====
   Will run 57640 tasks to target 120 seconds runtime
   Submitting tasks / invoking apps
   All 57640 tasks submitted ... waiting for completion
   Submission took 34.839 seconds = 1654.487 tasks/second
   Runtime: actual 364.387s vs target 120s
   Tasks per second: 158.184
   Cleaning up DFK
   The end
6 changes: 1 addition & 5 deletions mypy.ini
@@ -6,6 +6,7 @@ plugins = sqlalchemy.ext.mypy.plugin
# which is commonly done with manager IDs in the parsl
# codebase.
disable_error_code = str-bytes-safe
enable_error_code = ignore-without-code
no_implicit_reexport = True
warn_redundant_casts = True

@@ -32,11 +33,6 @@ disallow_untyped_defs = True
disallow_any_expr = True
disallow_any_decorated = True

[mypy-parsl.dataflow.executor_status.*]
disallow_untyped_defs = True
disallow_any_expr = True
disallow_any_decorated = True

[mypy-parsl.dataflow.futures.*]
disallow_untyped_defs = True
disallow_any_decorated = True
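The ``enable_error_code = ignore-without-code`` line added above makes bare ``# type: ignore`` comments themselves an error under mypy, so suppressions must name the specific error code, as in the ``parsl/__init__.py`` change in this commit. A hedged illustration (the class here is invented for demonstration):

```python
# Under mypy with `enable_error_code = ignore-without-code`, a bare
# `# type: ignore` on the reassignment below would itself be flagged;
# naming the error code keeps the suppression narrow and self-documenting.
class Widget:
    def ping(self) -> str:
        return "pong"

w = Widget()
w.ping = lambda: "patched"  # type: ignore[method-assign]
print(w.ping())  # runtime is unaffected; only the type checker objects
```

The narrower form also rots less: if the underlying error ever disappears, ``warn_unused_ignores`` can report the stale suppression.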
2 changes: 1 addition & 1 deletion parsl/__init__.py
@@ -63,7 +63,7 @@ def lazy_loader(name):


# parsl/__init__.py:61: error: Cannot assign to a method
parsl.__getattr__ = lazy_loader # type: ignore
parsl.__getattr__ = lazy_loader # type: ignore[method-assign]

import multiprocessing as _multiprocessing
if platform.system() == 'Darwin':
10 changes: 5 additions & 5 deletions parsl/addresses.py
@@ -13,7 +13,7 @@
try:
import fcntl
except ImportError:
fcntl = None # type: ignore
fcntl = None # type: ignore[assignment]
import struct
import typeguard
import psutil
@@ -110,15 +110,15 @@ def get_all_addresses() -> Set[str]:
try:
s_addresses.add(address_by_interface(interface))
except Exception:
logger.exception("Ignoring failure to fetch address from interface {}".format(interface))
logger.info("Ignoring failure to fetch address from interface {}".format(interface))

resolution_functions: List[Callable[[], str]]
resolution_functions = [address_by_hostname, address_by_route, address_by_query]
for f in resolution_functions:
try:
s_addresses.add(f())
except Exception:
logger.exception("Ignoring an address finder exception")
logger.info("Ignoring an address finder exception")

return s_addresses

@@ -137,7 +137,7 @@ def get_any_address() -> str:
addr = address_by_interface(interface)
return addr
except Exception:
logger.exception("Ignoring failure to fetch address from interface {}".format(interface))
logger.info("Ignoring failure to fetch address from interface {}".format(interface))

resolution_functions: List[Callable[[], str]]
resolution_functions = [address_by_hostname, address_by_route, address_by_query]
@@ -146,7 +146,7 @@ addr = f()
addr = f()
return addr
except Exception:
logger.exception("Ignoring an address finder exception")
logger.info("Ignoring an address finder exception")

if addr == '':
raise Exception('Cannot find address of the local machine.')
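The pattern visible in these hunks — try each address resolver in turn, log failures, and fall through to the next — can be sketched in isolation (the function names here are illustrative, not Parsl's API):

```python
# Illustrative reduction of the fallback logic in parsl.addresses:
# attempt each resolver in order and return the first address found.
from typing import Callable, List

def first_working_address(resolvers: List[Callable[[], str]]) -> str:
    for resolve in resolvers:
        try:
            return resolve()
        except Exception:
            # In Parsl this is now logged at info level rather than with a
            # full traceback, since failures on some interfaces are routine.
            continue
    raise Exception('Cannot find address of the local machine.')

def failing_resolver() -> str:
    raise RuntimeError("interface not available")

print(first_working_address([failing_resolver, lambda: "10.0.0.1"]))
```

The commit's switch from ``logger.exception`` to ``logger.info`` reflects this: a resolver failing is an expected branch of the search, not an error worth a stack trace.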
