Commit

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Remove historical info and update roadmap (#2897)

Updated the roadmap using content/copy from our working Google Doc.

Commit 95231e2 (1 parent: b32147f)

Showing 1 changed file with 45 additions and 129 deletions.

@@ -1,135 +1,51 @@
Historical: Roadmap
===================
Roadmap
=======

.. note::
    This roadmap has not been current since version 0.9.0 in 2019, and does
    not reflect changes in project direction since then. For this reason, this
    roadmap is marked as historical.
**OVERVIEW**

Before diving into the roadmap, here is a quick retrospective look at the evolution of workflow
solutions that came before Parsl from the workflows group at UChicago and Argonne National Laboratory.
While we follow best practices in software development processes (e.g., CI, flake8, code review), there are opportunities to make our code more maintainable and accessible. This roadmap, written in the fall of 2023, covers our major activities planned through 2025 to improve efficiency, productivity, and user experience, and to build our community.

.. image:: ../images/swift-e-timeline_trimmed.png

Sufficient capabilities to use Parsl in many common situations already exist. This document indicates where Parsl is going;
it contains a list of features that Parsl has or will have. Features that exist today are marked in bold, with the release
in which they were added marked for releases since 0.3.0. Help in providing any of the yet-to-be-developed capabilities is welcome.

Features in preparation are documented via Github
Features and improvements are documented via GitHub
`issues <https://github.com/Parsl/parsl/issues>`_ and `pull requests <https://github.com/Parsl/parsl/pulls>`_.

Core Functionality
---------------------

* **Parsl has the ability to execute standard Python code and to asynchronously execute tasks, called Apps.**

  * **Any Python function annotated with "@App" is an App.**
  * **Apps can be Python functions or bash scripts that wrap external applications.**

* **Asynchronous tasks return futures, which other tasks can use as inputs.**

  * **This builds an implicit data flow graph.**

* **Asynchronous tasks can execute locally on threads or as separate processes.**
* **Asynchronous tasks can execute on a remote resource.**

  * **libsubmit (to be renamed) provides this functionality.**
  * **A shared filesystem is assumed; data staging (of files) is not yet supported.**

* **The Data Flow Kernel (DFK) schedules Parsl task execution (based on dataflow).**
* **Class-based config definition (v0.6.0)**
* **Singleton config, and separate DFK from app definitions (v0.6.0)**
* Class-based app definition
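
The future-passing behavior described above works like this: apps return futures, and passing one app's future to another app creates an implicit data flow graph. The following minimal standard-library sketch illustrates the idea. It is not Parsl's implementation or API. The `app` decorator, the `_forward` helper, and the thread pool are hypothetical stand-ins for Parsl's app decorators and Data Flow Kernel. A task is launched only after every future it receives as an argument has completed.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical executor standing in for a Parsl executor.
_pool = ThreadPoolExecutor(max_workers=4)


def _forward(src: Future, dst: Future) -> None:
    """Copy the outcome (result or exception) of src into dst."""
    exc = src.exception()
    if exc is not None:
        dst.set_exception(exc)
    else:
        dst.set_result(src.result())


def app(func):
    """Hypothetical decorator: calling the wrapped function returns a Future at once."""

    def submit(*args):
        out: Future = Future()
        deps = [a for a in args if isinstance(a, Future)]
        remaining = [len(deps)]
        lock = threading.Lock()

        def launch():
            # Every dependency is done: propagate the first failure, if any.
            for d in deps:
                if d.exception() is not None:
                    out.set_exception(d.exception())
                    return
            # Replace each future argument with its result and run the task.
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            task = _pool.submit(func, *resolved)
            task.add_done_callback(lambda f: _forward(f, out))

        def on_dependency_done(_):
            with lock:
                remaining[0] -= 1
                ready = remaining[0] == 0
            if ready:
                launch()

        if not deps:
            launch()
        for d in deps:
            d.add_done_callback(on_dependency_done)
        return out

    return submit


@app
def double(x):
    return 2 * x


@app
def add(a, b):
    return a + b


# add() receives two futures, so it runs only after both double() tasks finish.
total = add(double(3), double(4))
print(total.result())  # 14
```

Completion callbacks are used instead of blocking on dependencies inside worker threads. This keeps a small pool from deadlocking when many tasks wait on each other.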

Data management
---------------

* **File abstraction to support representation of local and remote files.**
* **Support for a variety of common data access protocols (e.g., FTP, HTTP, Globus) (v0.6.0)**.
* **Input/output staging models that support transparent movement of data from its source to a location where it is accessible for compute. This includes staging to/from the client (script execution location) and worker nodes (v0.6.0)**.
* Support for creation of a sandbox and execution within the sandbox.
* Multi-site support, including transparent data movement between sites.
* **Support for systems without a shared file system (point-to-point staging). (Partial support in v0.9.0)**
* Support for data caching at multiple levels and across sites.

TODO: Add diagram for staging
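
The file abstraction and staging items above rest on one idea. A file is named by a URL, and the URL's scheme decides whether the file must be staged before a task can read it. The sketch below shows how that decision could be made with the standard library. It is hypothetical: `File`, `STAGED_SCHEMES`, `needs_staging`, and `local_path` are illustrative names, not Parsl's `File` API.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical set of schemes that need a staging step before compute.
STAGED_SCHEMES = {"http", "https", "ftp", "globus"}


@dataclass(frozen=True)
class File:
    """Hypothetical file reference: a URL plus helpers derived from it."""

    url: str

    @property
    def scheme(self) -> str:
        # Plain paths such as "/scratch/out.csv" have no scheme; treat them as local.
        return urlparse(self.url).scheme or "file"

    @property
    def path(self) -> str:
        return urlparse(self.url).path

    @property
    def filename(self) -> str:
        return os.path.basename(self.path)

    def needs_staging(self) -> bool:
        return self.scheme in STAGED_SCHEMES

    def local_path(self, workdir: str) -> str:
        """Where a task would read the file; staged copies land in workdir."""
        if self.needs_staging():
            return os.path.join(workdir, self.filename)
        return self.path


remote = File("https://example.org/data/input.csv")
local = File("/scratch/run1/output.csv")
print(remote.needs_staging(), remote.local_path("/tmp/stage"))
print(local.needs_staging(), local.local_path("/tmp/stage"))
```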

Execution core and parallelism (DFK)
------------------------------------

* **Support for application and data futures within scripts.**
* **Internal (dynamically created/updated) task/data dependency graph that enables asynchronous execution ordered by data dependencies and throttled by resource limits.**
* **Well-defined state transition model for task lifecycle. (v0.5.0)**
* Add data staging to the task state transition model.
* **More efficient algorithms for managing dependency resolution. (v0.7.0)**
* Scheduling and allocation algorithms that determine job placement based on job and data requirements (including deadlines) as well as site capabilities.
* **Directing jobs to a specific set of sites (v0.4.0)**
* **Logic to manage (provision, resize) execution resource blocks based on job requirements, and running multiple tasks per resource block (v0.4.0).**
* **Retry logic to support recovery and fault tolerance**
* **Workflow-level checkpointing and restart (v0.4.0)**
* **Transition away from IPP to in-house executors (HighThroughputExecutor and ExtremeScaleExecutor, v0.7.0)**
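
Workflow-level checkpointing and restart can be understood as memoization that outlives the process. Each completed task's result is recorded under a key derived from the function and its arguments. A restarted run then reuses recorded results instead of re-executing those tasks. The sketch below assumes JSON-serializable arguments and results, and it uses a hypothetical `checkpointed` decorator that writes to a JSON file. Parsl's own checkpointing has a different interface and storage format.

```python
import functools
import hashlib
import json
import os
import tempfile


def task_key(func, args) -> str:
    """Hash the function's qualified name and its (JSON-serializable) arguments."""
    payload = json.dumps([func.__qualname__, list(args)], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def checkpointed(path):
    """Hypothetical decorator: reuse results recorded in a JSON checkpoint file."""

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            cache = {}
            if os.path.exists(path):  # re-read on every call, as a restart would
                with open(path) as fh:
                    cache = json.load(fh)
            key = task_key(func, args)
            if key in cache:
                return cache[key]  # skip re-execution
            result = func(*args)
            cache[key] = result
            with open(path, "w") as fh:
                json.dump(cache, fh)
            return result

        return wrapper

    return decorate


calls = []
ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")


@checkpointed(ckpt)
def square(x):
    calls.append(x)
    return x * x


first = square(7)   # executes and records the result
second = square(7)  # served from the checkpoint file; square's body does not run
```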

Resource provisioning and execution
-----------------------------------

* **Uniform abstraction for execution resources (to support resource provisioning, job submission, allocation management) on cluster, cloud, and supercomputing resources**
* **Support for different execution models on any execution provider (e.g., pilot jobs using IPython Parallel on clusters and extreme-scale execution using Swift/T on supercomputers)**

  * **Slurm**
  * **HTCondor**
  * **Cobalt**
  * **GridEngine**
  * **PBS/Torque**
  * **AWS**
  * **GoogleCloud**
  * **Azure**
  * **Nova/OpenStack/Jetstream (partial support)**
  * **Kubernetes (v0.6.0)**

* **Support for launcher mechanisms**

  * **srun**
  * **aprun (complete support in v0.6.0)**
  * **Various MPI launch mechanisms (mpiexec, mpirun, ...)**

* **Support for remote execution using SSH (from v0.3.0) and OAuth-based authentication (from v0.9.0)**
* **Utilizing multiple sites for a single script’s execution (v0.4.0)**
* Cloud-hosted site configuration repository that stores configurations for resource authentication, data staging, and job submission endpoints
* **IPP workers to support multiple threads of execution per node. (v0.7.0 adds support via replacement executors)**
* Smarter serialization that caches frequently used objects.
* **Support for user-defined containers as Parsl apps and orchestration of workflows composed of containers (v0.5.0)**

  * **Docker (locally)**
  * Shifter (NERSC, Blue Waters)
  * Singularity (ALCF)
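
The provider/launcher split above separates two concerns: *how a block of nodes is obtained* (Slurm, PBS/Torque, cloud APIs) and *how a worker command is started on those nodes* (srun, aprun, mpiexec). Below is a minimal sketch of the launcher side, assuming hypothetical wrapper classes that only prefix a command. The flags shown (`srun --nodes/--ntasks`, `aprun -n/-N`, `mpiexec -n`) are standard options of those tools. The classes themselves and the `python worker.py` command are illustrative, not Parsl's launcher classes.

```python
from abc import ABC, abstractmethod


class LaunchWrap(ABC):
    """Hypothetical launcher interface: wrap a worker command for a block of nodes."""

    @abstractmethod
    def wrap(self, command: str, nodes: int, tasks_per_node: int) -> str:
        ...


class LocalWrap(LaunchWrap):
    def wrap(self, command, nodes, tasks_per_node):
        return command  # run the command as-is on the current node


class SrunWrap(LaunchWrap):
    def wrap(self, command, nodes, tasks_per_node):
        return f"srun --nodes={nodes} --ntasks={nodes * tasks_per_node} {command}"


class AprunWrap(LaunchWrap):
    def wrap(self, command, nodes, tasks_per_node):
        # aprun: -n is the total number of processes, -N the processes per node.
        return f"aprun -n {nodes * tasks_per_node} -N {tasks_per_node} {command}"


class MpiexecWrap(LaunchWrap):
    def wrap(self, command, nodes, tasks_per_node):
        return f"mpiexec -n {nodes * tasks_per_node} {command}"


worker = "python worker.py"  # hypothetical worker command
for launcher in (LocalWrap(), SrunWrap(), AprunWrap(), MpiexecWrap()):
    print(launcher.wrap(worker, nodes=2, tasks_per_node=4))
```

Because the launcher knows nothing about how nodes were acquired, any provider can be paired with any launch mechanism.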

Visualization, debugging, fault tolerance
-----------------------------------------

* **Support for exception handling**.
* **Interface for accessing real-time state (v0.6.0)**.
* **Visualization library that enables users to introspect graph, task, and data dependencies, as well as observe the state of executed/executing tasks (from v0.9.0)**
* Integration of visualization into Jupyter
* Support for visualizing dead/dying parts of the task graph and retrying with updates to the task.
* **Retry model to selectively re-execute only the failed branches of a workflow graph**
* **Fault tolerance support for individual task execution**
* **Support for saving monitoring information to a local DB (SQLite) and a remote DB (Elasticsearch) (v0.6.0 and v0.7.0)**
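
Per-task fault tolerance can be as simple as re-running only the task that failed, up to a fixed budget, while its already-completed dependencies are left alone. The sketch below assumes a hypothetical `run_with_retries` helper and a fixed retry count. It is not Parsl's retry mechanism, which is configured through Parsl's own configuration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], retries: int) -> T:
    """Run task, re-executing it up to `retries` more times if it raises."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return task()
        except Exception as exc:  # only this task is re-run
            last_error = exc
    raise RuntimeError(f"task failed after {retries + 1} attempts") from last_error


attempts = []


def flaky():
    """Hypothetical task that fails twice with a transient error, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError("transient failure")
    return "ok"


result = run_with_retries(flaky, retries=2)
print(result, len(attempts))  # ok 3
```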

Authentication and authorization
--------------------------------

* **Seamless authentication using OAuth-based methods within Parsl scripts (e.g., native app grants) (v0.6.0)**
* Support for arbitrary identity providers and pass-through to execution resources
* Support for transparent/scoped access to external services **(e.g., Globus transfer) (v0.6.0)**

Ecosystem
---------

* Support for CWL: the ability to execute CWL workflows and use CWL app descriptions
* Creation of a library of Parsl apps and workflows
* Provenance capture/export in standard formats
* Automatic metrics capture and reporting to understand Parsl usage
* **Anonymous Usage Tracking (v0.4.0)**

Documentation / Tutorials:
--------------------------

* **Documentation about Parsl and its features**
* **Documentation about supported sites (v0.6.0)**
* **Self-guided Jupyter notebook tutorials on Parsl features**
* **Hands-on tutorial suitable for webinars and meetings**

Code Maintenance
----------------

* **Type Annotations and Static Type Checking**: Add static type annotations throughout the codebase and add typeguard checks.
* **Release Process**: `Improve the overall release process <https://github.com/Parsl/parsl/issues?q=is%3Aopen+is%3Aissue+label%3Arelease_process>`_ to synchronize docs and code releases and to automatically produce changelog documentation.
* **Components Maturity Model**: Define the `component maturity model <https://github.com/Parsl/parsl/issues/2554>`_ and tag components with their appropriate maturity level.
* **Define and Document Interfaces**: Identify and document interfaces through which `external components <https://parsl.readthedocs.io/en/stable/userguide/plugins.html>`_ can augment the Parsl ecosystem.
* **Distributed Testing Process**: All tests should be run against all possible schedulers, using different executors, on a variety of remote systems. Explore the use of containerized schedulers and remote testing on real systems.
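
The typeguard checks mentioned above verify at call time that arguments match their annotations. typeguard's `@typechecked` decorator covers far more, including generics and return values. As a rough illustration of what such a check does, here is a hypothetical `check_args` decorator built only on `inspect` and `typing.get_type_hints`. It checks plain-class annotations and skips parameterized generics. The `scale_block` function is an illustrative example, not part of Parsl.

```python
import functools
import inspect
from typing import get_origin, get_type_hints


def check_args(func):
    """Hypothetical decorator: raise TypeError when an argument violates a plain-class annotation."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; generics such as list[int] are skipped.
            if (
                isinstance(expected, type)
                and get_origin(expected) is None
                and not isinstance(value, expected)
            ):
                raise TypeError(
                    f"{func.__name__}() argument {name!r} must be "
                    f"{expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_args
def scale_block(nodes: int, label: str) -> str:
    return f"{label}:{nodes}"


print(scale_block(2, "cpu"))  # cpu:2
try:
    scale_block("2", "cpu")
except TypeError as err:
    print(err)  # scale_block() argument 'nodes' must be int, got str
```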

New Features and Integrations
-----------------------------

* **Enhanced MPI Support**: Extend Parsl’s MPI model with MPI apps and runtime support capable of running MPI apps in different environments (MPI flavor and launcher).
* **Serialization Configuration**: Enable users to select which serialization methods are used and to supply their own serializer.
* **PSI/J integration**: Integrate PSI/J as a common interface for schedulers.
* **Internal Concurrency Model**: Revisit and rearchitect the concurrency model to reduce the parts of it that are not well understood and to lower the likelihood of errors.
* **Common Model for Errors**: Make Parsl errors self-describing and understandable by users.
* **Plug-in Model for External Components**: Extend Parsl to implement the interfaces identified under **Define and Document Interfaces**.
* **User Configuration Validation Tool**: Provide tooling to help users configure Parsl and diagnose and resolve errors.
* **Anonymized Usage Tracking**: Usage tracking is crucial for our data-oriented approach to understanding the adoption of Parsl, which components are used, and where errors occur. This allows us to prioritize investment in components, progress components through the maturity levels, and identify bugs. Revisit prior usage tracking and develop a service that enables users to control tracking information.
* **Support for Globus Compute**: Enable execution of Parsl tasks using Globus Compute as an executor.
* **Update Globus Data Management**: Update Globus integration to use the new Globus Connect v5 model (i.e., requiring specific scopes for individual endpoints).
* **Performance Measurement**: Improve the ability to measure performance metrics and report them to users.
* **Enhanced Debugging**: Application-level `logging <https://github.com/Parsl/parsl/issues/1984>`_ to understand app execution.
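
The serialization-configuration item amounts to a registry from which users can pick a serializer by name or into which they can register their own. Below is a minimal sketch, assuming a hypothetical `SerializerRegistry` with `pickle` and `json` defaults. The names and interface are illustrative and do not describe Parsl's serialization API. Each payload is prefixed with the serializer's name so that the receiving side knows how to decode it.

```python
import json
import pickle
from typing import Any, Callable, Dict, Tuple

Dumps = Callable[[Any], bytes]
Loads = Callable[[bytes], Any]


class SerializerRegistry:
    """Hypothetical registry mapping a short name to (serialize, deserialize) functions."""

    def __init__(self) -> None:
        self._methods: Dict[str, Tuple[Dumps, Loads]] = {
            "pickle": (pickle.dumps, pickle.loads),
            "json": (lambda obj: json.dumps(obj).encode(), json.loads),
        }

    def register(self, name: str, dumps: Dumps, loads: Loads) -> None:
        if "\n" in name:
            raise ValueError("serializer names may not contain newlines")
        self._methods[name] = (dumps, loads)

    def serialize(self, obj: Any, method: str = "pickle") -> bytes:
        dumps, _ = self._methods[method]
        # Header line names the method; the payload follows the first newline.
        return method.encode() + b"\n" + dumps(obj)

    def deserialize(self, data: bytes) -> Any:
        header, _, payload = data.partition(b"\n")
        _, loads = self._methods[header.decode()]
        return loads(payload)


registry = SerializerRegistry()
# A user-supplied serializer for plain strings.
registry.register("utf8", lambda s: s.encode("utf-8"), lambda b: b.decode("utf-8"))

for method, value in [("pickle", {"x": (1, 2)}), ("json", [1, 2, 3]), ("utf8", "hello")]:
    assert registry.deserialize(registry.serialize(value, method)) == value
```

Tagging each payload with its method name lets both sides of a transfer agree on decoding without sharing any other configuration.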

Tutorials, Training, and User Support
-------------------------------------

* **Configuration and Debugging**: Tutorials showing how to configure Parsl for different resources and debug execution.
* **Functional Serialization 101**: Tutorial describing how serialization works and how you can integrate custom serializers.
* **ProxyStore Data Management**: Tutorial showing how you can use ProxyStore to manage data for both inter- and intra-site scenarios.
* **Open Dev Calls on Zoom**: The core team holds an open dev call/office hours every other Thursday to help users troubleshoot issues, present and share their work, connect with each other, and provide community updates.
* **Project Documentation**: Documentation is maintained and updated on `Read the Docs <https://parsl.readthedocs.io/en/stable/index.html>`_.

Longer-term Objectives
----------------------

* **Globus Compute Integration**: Once Globus Compute supports multi-tenancy, Parsl will be able to use it to run remote tasks, initially on one resource and later on multiple resources.
* **Multi-System Optimization**: Once Globus Compute integration is complete, Parsl will be able to make the best use of multiple systems for different tasks within a single workflow.
* **HPC Checkpointing and Job Migration**: As new resources become available, HPC tasks will be able to be checkpointed and migrated to systems with more resources.