Merge recent master and desc dev

Parsl · Oct 13, 2023 · c851cf0 · c851cf0
2 parents f646540 + 6b963ba
commit c851cf0
Show file tree

Hide file tree

Showing 63 changed files with 1,532 additions and 1,343 deletions.
diff --git a/.github/workflows/python-publish-to-testpypi.yml b/.github/workflows/python-publish-to-testpypi.yml
@@ -32,6 +32,12 @@ jobs:
 
     steps:
     - uses: actions/checkout@v3
+
+    - name: Check if this commit is already released
+      id: already_released
+      run: |
+        if git tag --contains HEAD | grep -e '^[0-9]\{4\}\.[0-9]\{2\}\.[0-9]\{2\}$' ; then exit 1 ; fi
+
     - name: Set up Python
       uses: actions/setup-python@v3
       with:

diff --git a/.wci.yml b/.wci.yml
@@ -34,6 +34,7 @@ execution_environment:
     - LSF
     - PBS
     - Cobalt
+    - Flux
     - GridEngine
     - HTCondor
     - AWS

diff --git a/docs/devguide/roadmap.rst b/docs/devguide/roadmap.rst
@@ -1,135 +1,51 @@
-Historical: Roadmap
-===================
+Roadmap
+=======
 
-.. note::
-   This roadmap has not been current since version 0.9.0 in 2019, and does
-   not reflect changes in project direction then. For this reason, this
-   roadmap is marked as historical.
+**OVERVIEW**
 
-Before diving into the roadmap, a quick retrospective look at the evolution of workflow
-solutions that came before Parsl from the workflows group at UChicago and Argonne National Laboratory.
+While we follow best practices in software development processes (e.g., CI, flake8, code review), there are opportunities to make our code more maintainable and   accessible. This roadmap, written in the fall of 2023, covers our major activities planned through 2025 to increase efficiency, productivity, user experience, and community building.
 
-.. image:: ../images/swift-e-timeline_trimmed.png
-
-
-Sufficient capabilities to use Parsl in many common situations already exist.  This document indicates where Parsl is going;
-it contains a list of features that Parsl has or will have.  Features that exist today are marked in bold, with the release
-in which they were added marked for releases since 0.3.0. Help in providing any of the yet-to-be-developed capabilities is welcome.
-
-Features in preparation are documented via Github
+Features and improvements are documented via GitHub
 `issues <https://github.com/Parsl/parsl/issues>`_ and `pull requests <https://github.com/Parsl/parsl/pulls>`_.
 
 
-Core Functionality
----------------------
-
-* **Parsl has the ability to execute standard python code and to asynchronously execute tasks, called Apps.**
-    * **Any Python function annotated with "@App" is an App.**
-    * **Apps can be Python functions or bash scripts that wrap external applications.**
-* **Asynchronous tasks return futures, which other tasks can use as inputs.**
-    * **This builds an implicit data flow graph.**
-* **Asynchronous tasks can execute locally on threads or as separate processes.**
-* **Asynchronous tasks can execute on a remote resource.**
-    * **libsubmit (to be renamed) provides this functionality.**
-    * **A shared filesystem is assumed; data staging (of files) is not yet supported.**
-* **The Data Flow Kernel (DFK) schedules Parsl task execution (based on dataflow).**
-* **Class-based config definition (v0.6.0)**
-* **Singleton config, and separate DFK from app definitions (v0.6.0)**
-* Class-based app definition
-
-Data management
----------------
-
-* **File abstraction to support representation of local and remote files.**
-* **Support for a variety of common data access protocols (e.g., FTP, HTTP, Globus) (v0.6.0)**.
-* **Input/output staging models that support transparent movement of data from source to a location on which it is accessible for compute. This includes staging to/from the client (script execution location) and worker node (v0.6.0)**.
-* Support for creation of a sandbox and execution within the sandbox.
-* Multi-site support including transparent movement between sites.
-* **Support for systems without a shared file system (point-to-point staging). (Partial support in v0.9.0)**
-* Support for data caching at multiple levels and across sites.
-
-TODO: Add diagram for staging
-
-
-Execution core and parallelism (DFK)
-------------------------------------
-
-* **Support for application and data futures within scripts.**
-* **Internal (dynamically created/updated) task/data dependency graph that enables asynchronous execution ordered by data dependencies and throttled by resource limits.**
-* **Well-defined state transition model for task lifecycle. (v0.5.0)**
-* Add data staging to task state transition model.
-* **More efficient algorithms for managing dependency resolution. (v0.7.0)**
-* Scheduling and allocation algorithms that determine job placement based on job and data requirements (including deadlines) as well as site capabilities.
-* **Directing jobs to a specific set of sites.(v0.4.0)**
-* **Logic to manage (provision, resize) execution resource block based on job requirements, and running multiple tasks per resource block (v0.4.0).**
-* **Retry logic to support recovery and fault tolerance**
-* **Workflow level checkpointing and restart (v0.4.0)**
-* **Transition away from IPP to in-house executors (HighThroughputExecutor and ExtremeScaleExecutor v0.7.0)**
-
-Resource provisioning and execution
------------------------------------
-
-* **Uniform abstraction for execution resources (to support resource provisioning, job submission, allocation management) on cluster, cloud, and supercomputing resources**
-* **Support for different execution models on any execution provider (e.g., pilot jobs using Ipython parallel on clusters and extreme-scale execution using Swift/T on supercomputers)**
-    * **Slurm**
-    * **HTCondor**
-    * **Cobalt**
-    * **GridEngine**
-    * **PBS/Torque**
-    * **AWS**
-    * **GoogleCloud**
-    * **Azure**
-    * **Nova/OpenStack/Jetstream (partial support)**
-    * **Kubernetes (v0.6.0)**
-* **Support for launcher mechanisms**
-    * **srun**
-    * **aprun (Complete support 0.6.0)**
-    * **Various MPI launch mechanisms (Mpiexec, mpirun..)**
-* **Support for remote execution using SSH (from v0.3.0)and OAuth-based authentication (from v0.9.0)**
-* **Utilizing multiple sites for a single script’s execution (v0.4.0)**
-* Cloud-hosted site configuration repository that stores configurations for resource authentication, data staging, and job submission endpoints
-* **IPP workers to support multiple threads of execution per node. (v0.7.0 adds support via replacement executors)**
-* Smarter serialization with caching frequently used objects.
-* **Support for user-defined containers as Parsl apps and orchestration of workflows comprised of containers (v0.5.0)**
-    * **Docker (locally)**
-    * Shifter (NERSC, Blue Waters)
-    * Singularity (ALCF)
-
-Visualization, debugging, fault tolerance
------------------------------------------
-
-* **Support for exception handling**.
-* **Interface for accessing real-time state (v0.6.0)**.
-* **Visualization library that enables users to introspect graph, task, and data dependencies, as well as observe state of executed/executing tasks (from v0.9.0)**
-* Integration of visualization into jupyter
-* Support for visualizing dead/dying parts of the task graph and retrying with updates to the task.
-* **Retry model to selectively re-execute only the failed branches of a workflow graph**
-* **Fault tolerance support for individual task execution**
-* **Support for saving monitoring information to local DB (sqlite) and remote DB (elasticsearch) (v0.6.0 and v0.7.0)**
-
-Authentication and authorization
---------------------------------
-
-* **Seamless authentication using OAuth-based methods within Parsl scripts (e.g., native app grants) (v0.6.0)**
-* Support for arbitrary identity providers and pass through to execution resources
-* Support for transparent/scoped access to external services **(e.g., Globus transfer) (v0.6.0)**
-
-Ecosystem
----------
-
-* Support for CWL, ability to execute CWL workflows and use CWL app descriptions
-* Creation of library of Parsl apps and workflows
-* Provenance capture/export in standard formats
-* Automatic metrics capture and reporting to understand Parsl usage
-* **Anonymous Usage Tracking (v0.4.0)**
-
-Documentation / Tutorials:
---------------------------
-
-* **Documentation about Parsl and its features**
-* **Documentation about supported sites (v0.6.0)**
-* **Self-guided Jupyter notebook tutorials on Parsl features**
-* **Hands-on tutorial suitable for webinars and meetings**
-
-
-
+Code Maintenance
+----------------
+
+* **Type Annotations and Static Type Checking**: Add static type annotations throughout the codebase and add typeguard checks.
+* **Release Process**: `Improve the overall release process <https://github.com/Parsl/parsl/issues?q=is%3Aopen+is%3Aissue+label%3Arelease_process>`_ to synchronize docs and code releases, automatically produce changelog documentation.
+* **Components Maturity Model**: Defines the `component maturity model <https://github.com/Parsl/parsl/issues/2554>`_ and tags components with their appropriate maturity level.
+* **Define and Document Interfaces**: Identify and document interfaces via which `external components <https://parsl.readthedocs.io/en/stable/userguide/plugins.html>`_ can augment the Parsl ecosystem.
+* **Distributed Testing Process**: All tests should be run against all possible schedulers, using different executors, on a variety of remote systems. Explore the use of containerized schedulers and remote testing on real systems.
+
+New Features and Integrations
+-----------------------------
+
+* **Enhanced MPI Support**: Extend Parsl’s MPI model with MPI apps and runtime support capable of running MPI apps in different environments (MPI flavor and launcher).
+* **Serialization Configuration**: Enable users to select what serialization methods are used and enable users to supply their own serializer.
+* **PSI/J integration**: Integrate PSI/J as a common interface for schedulers.
+* **Internal Concurrency Model**: Revisit and rearchitect the concurrency model to reduce areas that are not well understood and reduce the likelihood of errors.
+* **Common Model for Errors**: Make Parsl errors self-describing and understandable by users.
+* **Plug-in Model for External Components**: Extend Parsl to implement interfaces defined above. 
+* **User Configuration Validation Tool**: Provide tooling to help users configure Parsl and diagnose and resolve errors.
+* **Anonymized Usage Tracking**: Usage tracking is crucial for our data-oriented approach to understand the adoption of Parsl, which components are used, and where errors occur. This allows us to prioritize investment in components, progress components through the maturity levels, and identify bugs. Revisit prior usage tracking and develop a service that enables users to control tracking information.
+* **Support for Globus Compute**: Enable execution of Parsl tasks using Globus Compute as an executor.
+* **Update Globus Data Management**: Update Globus integration to use the new Globus Connect v5 model (i.e., needing specific scopes for individual endpoints).
+* **Performance Measurement**: Improve ability to measure performance metrics and report to users.
+* **Enhanced Debugging**: Application-level `logging <https://github.com/Parsl/parsl/issues/1984>`_ to understand app execution. 
+
+Tutorials, Training, and User Support
+-------------------------------------
+
+* **Configuration and Debugging**: Tutorials showing how to configure Parsl for different resources and debug execution. 
+* **Functional Serialization 101**: Tutorial describing how serialization works and how you can integrate custom serializers. 
+* **ProxyStore Data Management**: Tutorial showing how you can use ProxyStore to manage data for both inter and intra-site scenarios.
+* **Open Dev Calls on Zoom**: The internal core team holds an open dev call/office hours every other Thursday to help users troubleshoot issues, present and share their work, connect with each other, and provide community updates.
+* **Project Documentation**: is maintained and updated in `Read the Docs <https://parsl.readthedocs.io/en/stable/index.html>`_.
+
+Longer-term Objectives
+----------------------
+
+* **Globus Compute Integration**: Once Globus Compute supports multi-tenancy, Parsl will be able to use it to run remote tasks on initially one and then later multiple resources.
+* **Multi-System Optimization**: Once Globus Compute integration is complete, it is best to use multiple systems for multiple tasks as part of a single workflow.
+* **HPC Checkpointing and Job Migration**: As new resources become available, HPC tasks will be able to be checkpointed and moved to the system with more resources.
diff --git a/docs/userguide/configuring.rst b/docs/userguide/configuring.rst
@@ -17,8 +17,8 @@ supercomputer at TACC.
 This config uses the `parsl.executors.HighThroughputExecutor` to submit
 tasks from a login node (`parsl.channels.LocalChannel`). It requests an allocation of
 128 nodes, deploying 1 worker for each of the 56 cores per node, from the normal partition.
-The config uses the `address_by_hostname()` helper function to determine
-the login node's IP address.
+To limit network connections to just the internal network the config specifies the address
+used by the infiniband interface with ``address_by_interface('ib0')``
 
 .. code-block:: python
 
@@ -27,13 +27,13 @@ the login node's IP address.
     from parsl.providers import SlurmProvider
     from parsl.executors import HighThroughputExecutor
     from parsl.launchers import SrunLauncher
-    from parsl.addresses import address_by_hostname
+    from parsl.addresses import address_by_interface
 
     config = Config(
         executors=[
             HighThroughputExecutor(
                 label="frontera_htex",
-                address=address_by_hostname(),
+                address=address_by_interface('ib0'),
                 max_workers=56,
                 provider=SlurmProvider(
                     channel=LocalChannel(),

diff --git a/docs/userguide/index.rst b/docs/userguide/index.rst
@@ -19,4 +19,5 @@ User guide
    joins
    usage_tracking
    plugins
+   parsl_perf
    performance
diff --git a/docs/userguide/parsl_perf.rst b/docs/userguide/parsl_perf.rst
@@ -0,0 +1,53 @@
+.. _label-parsl-perf:
+
+Measuring performance with parsl-perf
+=====================================
+
+``parsl-perf`` is tool for making basic performance measurements of Parsl
+configurations.
+
+It runs increasingly large numbers of no-op apps until a batch takes
+(by default) 120 seconds, giving a measurement of tasks per second.
+
+This can give a basic measurement of some of the overheads in task
+execution.
+
+``parsl-perf`` must be invoked with a configuration file, which is a Python
+file containing a variable ``config`` which contains a `Config` object, or
+a function ``fresh_config`` which returns a `Config` object. The
+``fresh_config`` format is the same as used with the pytest test suite.
+
+To specify a ``parsl_resource_specification`` for tasks, add a ``--resources``
+argument.
+
+To change the target runtime from the default of 120 seconds, add a
+``--time`` parameter.
+
+For example:
+
+.. code-block:: bash
+
+
+    $ python -m parsl.benchmark.perf --config parsl/tests/configs/workqueue_ex.py --resources '{"cores":1, "memory":0, "disk":0}'
+    ==== Iteration 1 ====
+    Will run 10 tasks to target 120 seconds runtime
+    Submitting tasks / invoking apps
+    warning: using plain-text when communicating with workers.
+    warning: use encryption with a key and cert when creating the manager.
+    All 10 tasks submitted ... waiting for completion
+    Submission took 0.008 seconds = 1248.676 tasks/second
+    Runtime: actual 3.668s vs target 120s
+    Tasks per second: 2.726
+
+    [...]
+
+    ==== Iteration 4 ====
+    Will run 57640 tasks to target 120 seconds runtime
+    Submitting tasks / invoking apps
+    All 57640 tasks submitted ... waiting for completion
+    Submission took 34.839 seconds = 1654.487 tasks/second
+    Runtime: actual 364.387s vs target 120s
+    Tasks per second: 158.184
+    Cleaning up DFK
+    The end
+    
diff --git a/mypy.ini b/mypy.ini
@@ -6,6 +6,7 @@ plugins = sqlalchemy.ext.mypy.plugin
 #                  which is commonly done with manager IDs in the parsl
 #                  codebase.
 disable_error_code = str-bytes-safe
+enable_error_code = ignore-without-code
 no_implicit_reexport = True
 warn_redundant_casts = True
 
@@ -32,11 +33,6 @@ disallow_untyped_defs = True
 disallow_any_expr = True
 disallow_any_decorated = True
 
-[mypy-parsl.dataflow.executor_status.*]
-disallow_untyped_defs = True
-disallow_any_expr = True
-disallow_any_decorated = True
-
 [mypy-parsl.dataflow.futures.*]
 disallow_untyped_defs = True
 disallow_any_decorated = True

diff --git a/parsl/__init__.py b/parsl/__init__.py
@@ -63,7 +63,7 @@ def lazy_loader(name):
 
 
 # parsl/__init__.py:61: error: Cannot assign to a method
-parsl.__getattr__ = lazy_loader  # type: ignore
+parsl.__getattr__ = lazy_loader  # type: ignore[method-assign]
 
 import multiprocessing as _multiprocessing
 if platform.system() == 'Darwin':

diff --git a/parsl/addresses.py b/parsl/addresses.py
@@ -13,7 +13,7 @@
 try:
     import fcntl
 except ImportError:
-    fcntl = None  # type: ignore
+    fcntl = None  # type: ignore[assignment]
 import struct
 import typeguard
 import psutil
@@ -110,15 +110,15 @@ def get_all_addresses() -> Set[str]:
         try:
             s_addresses.add(address_by_interface(interface))
         except Exception:
-            logger.exception("Ignoring failure to fetch address from interface {}".format(interface))
+            logger.info("Ignoring failure to fetch address from interface {}".format(interface))
 
     resolution_functions: List[Callable[[], str]]
     resolution_functions = [address_by_hostname, address_by_route, address_by_query]
     for f in resolution_functions:
         try:
             s_addresses.add(f())
         except Exception:
-            logger.exception("Ignoring an address finder exception")
+            logger.info("Ignoring an address finder exception")
 
     return s_addresses
 
@@ -137,7 +137,7 @@ def get_any_address() -> str:
             addr = address_by_interface(interface)
             return addr
         except Exception:
-            logger.exception("Ignoring failure to fetch address from interface {}".format(interface))
+            logger.info("Ignoring failure to fetch address from interface {}".format(interface))
 
     resolution_functions: List[Callable[[], str]]
     resolution_functions = [address_by_hostname, address_by_route, address_by_query]
@@ -146,7 +146,7 @@ def get_any_address() -> str:
             addr = f()
             return addr
         except Exception:
-            logger.exception("Ignoring an address finder exception")
+            logger.info("Ignoring an address finder exception")
 
     if addr == '':
         raise Exception('Cannot find address of the local machine.')
-Original file line number
+Diff line change
@@ Expand Up / @@ -34,6 +34,7 @@ execution_environment: @@
         - LSF
         - PBS
         - Cobalt
+        - Flux
         - GridEngine
         - HTCondor
         - AWS
@@ Expand Down @@