Add support for the Prefect workflow engine (#875)
Closes #878.

Need #876 merged first.

---------

Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Andrew-S-Rosen and deepsource-autofix[bot] authored Sep 6, 2023
1 parent b07d9fe commit 54848d0
Showing 25 changed files with 991 additions and 155 deletions.
5 changes: 3 additions & 2 deletions .github/workflows/tests.yaml
@@ -340,7 +340,7 @@ jobs:
uses: codecov/codecov-action@v3
if: github.repository == 'Quantum-Accelerators/quacc'

-tests-redun-jobflow:
+tests-redun-jobflow-prefect:
runs-on: ubuntu-latest
strategy:
fail-fast: true
@@ -367,10 +367,11 @@ jobs:
pip install -r tests/requirements.txt
pip install -r tests/requirements-jobflow.txt
pip install -r tests/requirements-redun.txt
pip install -r tests/requirements-prefect.txt
pip install .[dev]
- name: Run tests with pytest
-run: pytest -k 'jobflow or fireworks or redun' --cov=quacc --cov-report=xml
+run: pytest -k 'jobflow or fireworks or redun or prefect' --cov=quacc --cov-report=xml

- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.6]

### Added

- Re-added support for the Prefect workflow engine.

## [0.2.5]

### Added
2 changes: 1 addition & 1 deletion docs/dev/contributing.md
@@ -72,4 +72,4 @@ In general, please try to keep the code style consistent when possible, particul

All changes you make to quacc should be accompanied by unit tests and should not break existing tests. To run the full test suite, run `pytest .` from the `quacc/tests` directory. Each PR will report the coverage once your tests pass, but if you'd like to generate a coverage report locally, you can use [pytest-cov](https://pytest-cov.readthedocs.io/en/latest/), such as by doing `pytest --cov=quacc .` in the `tests` directory.

-If you are adding recipes based on a code that can be readily installed via `pip` or `conda` (e.g. tblite, DFTB+, Psi4), then you can run these codes directly in the test suite. Preferably, you should use a small molecule or solid and cheap method so the unit tests run quickly. If the recipes you're adding are proprietary or not available via `pip` or `conda` (e.g. Gaussian, GULP), then you will need to [monkeypatch](https://docs.pytest.org/en/7.1.x/how-to/monkeypatch.html) certain functions to change their behavior during testing. For instance, we do not want to run VASP directly during unit tests and have mocked the `atoms.get_potential_energy()` function to always return a dummy value of -1.0 during unit tests. Any mocked functions can be found in the `conftest.py` files of the testing directory.
+If you are adding recipes based on a code that can be readily installed via `pip` or `conda` (e.g. tblite, DFTB+, Psi4), then you can run these codes directly in the test suite. Preferably, you should use a small molecule or solid and cheap method so the unit tests run quickly. If the recipes you're adding are proprietary or not available via `pip` or `conda` (e.g. Gaussian, GULP), then you will need to [monkeypatch](https://docs.pytest.org/en/7.1.x/how-to/monkeypatch.html) certain functions to change their behavior during testing.
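The monkeypatching approach described above can be sketched with a small, self-contained pytest example. The `FakeAtoms` class is a stand-in for illustration; the `-1.0` dummy value mirrors the convention mentioned in the contributing guide, but this is not quacc's actual `conftest.py`:

```python
import pytest


class FakeAtoms:
    """Stand-in for an object wired to an expensive external code."""

    def get_potential_energy(self):
        raise RuntimeError("would launch a real calculation")


def mock_get_potential_energy(self):
    return -1.0  # cheap dummy value so tests never call the real code


def test_energy_is_mocked(monkeypatch):
    # The patch applies only for the duration of this test
    monkeypatch.setattr(FakeAtoms, "get_potential_energy", mock_get_potential_energy)
    assert FakeAtoms().get_potential_energy() == -1.0
```

Running `pytest` on this file passes because the patched method is swapped in before the assertion and automatically undone afterward.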
3 changes: 2 additions & 1 deletion docs/install/codes.md
@@ -47,7 +47,8 @@ ASE_GAUSSIAN_COMMAND="/path/to/my/gaussian_executable Gaussian.com > Gaussian.lo
As noted in the [ASE documentation](https://wiki.fysik.dtu.dk/ase/ase/calculators/gulp.html), you must set the environment variables `GULP_LIB` and `ASE_GULP_COMMAND` as follows:

```bash
-GULP_LIB="/path/to/my/gulp-#.#.#/Libraries" ASE_GULP_COMMAND="/path/to/my/gulp-#.#.#/Src/gulp < gulp.gin > gulp.got"
+GULP_LIB="/path/to/my/gulp-#.#.#/Libraries"
+ASE_GULP_COMMAND="/path/to/my/gulp-#.#.#/Src/gulp < gulp.gin > gulp.got"
```

## Lennard Jones
1 change: 1 addition & 0 deletions docs/install/install.md
@@ -59,6 +59,7 @@ where `extra` is one of the following:
- `quacc[covalent]`: Installs dependencies to enable the use of [Covalent](https://www.covalent.xyz).
- `quacc[jobflow]`: Installs dependencies to enable the use of [Jobflow](https://github.com/materialsproject/jobflow) with [FireWorks](https://github.com/materialsproject/fireworks).
- `quacc[parsl]`: Installs dependencies to enable the use of [Parsl](https://github.com/Parsl/parsl).
- `quacc[prefect]`: Installs dependencies to enable the use of [Prefect](https://www.prefect.io/).
- `quacc[redun]`: Installs dependencies to enable the use of [Redun](https://insitro.github.io/redun/).

### Miscellaneous
20 changes: 18 additions & 2 deletions docs/install/wflow_engines.md
@@ -6,7 +6,7 @@ Using a workflow engine is a crucial component for scaling up quacc calculations

If you are just getting started with workflow engines, we recommend first trying Covalent. For a comparison of the different compatible workflow engines, refer to the [Workflow Engines Overview](../user/basics/wflow_overview.md) section.

=== "Covalent"
=== "Covalent"

**Installation**

@@ -26,7 +26,7 @@ Using a workflow engine is a crucial component for scaling up quacc calculations

Once you start scaling up your calculations, we recommend hosting the Covalent server on a dedicated machine or using [Covalent Cloud](https://www.covalent.xyz/cloud/). Refer to the [Covalent Deployment Guide](https://docs.covalent.xyz/docs/user-documentation/server-deployment) for details.

=== "Parsl"
=== "Parsl"

**Installation**

@@ -38,6 +38,22 @@ Using a workflow engine is a crucial component for scaling up quacc calculations

Parsl has [many configuration options](https://parsl.readthedocs.io/en/stable/userguide/configuring.html), which we will cover later in the documentation.

=== "Prefect"

To install Prefect, run the following:

```bash
pip install quacc[prefect]
```

To use quacc with Prefect Cloud (recommended):

1. Make an account on [Prefect Cloud](https://app.prefect.cloud/)
2. Make an [API Key](https://docs.prefect.io/cloud/users/api-keys/) and (optionally) store it in a `PREFECT_API_KEY` environment variable (e.g. in your `~/.bashrc`)
3. Run `prefect cloud login` from the command-line and enter your API key (or use the browser, if possible)

Additional configuration parameters can be modified, as described in the [Prefect documentation](https://docs.prefect.io/concepts/settings/).

=== "Redun"

**Installation**
8 changes: 3 additions & 5 deletions docs/user/advanced/atomate2.md
@@ -1,6 +1,6 @@
# Quacc + Atomate2

-[Atomate2](https://github.com/materialsproject/atomate2) is a fantastic computational materials science workflow program that shares many similarities with quacc. If you wish to combine workflows from quacc with those from Atomate2, that is possible through the use of Jobflow.
+[Atomate2](https://github.com/materialsproject/atomate2) is a computational materials science workflow program that shares many similarities with quacc. If you wish to combine workflows from quacc with those from Atomate2, that is possible through the use of Jobflow.

!!! Tip

@@ -17,13 +17,11 @@ from quacc.recipes.tblite.core import relax_job
atoms = bulk("Cu")

job1 = relax_job(atoms)
-bandstructure_flow = RelaxBandStructureMaker().make_flow(
-    job1.output["structure"]
-) # (1)!
+bandstructure_flow = RelaxBandStructureMaker().make_flow(job1.output["structure"]) # (1)!

flow = Flow([job1]) + bandstructure_flow # (2)!
```

-1. All Atomate2 workflows take a Pymatgen `Structure` or `Molecule` object as input. This is one of the properties in the returned output of a quacc recipe, which is why we can do `job1.output["structure"]`.
+1. All Atomate2 workflows take a Pymatgen `Structure` or `Molecule` object as input. This is one of the properties in the returned output of a quacc recipe, which is why we can do `#!Python job1.output["structure"]`.

2. The `+` operator can be used to combine two flows into one. We converted the first job into its own `Flow` definition to enable this.
2 changes: 1 addition & 1 deletion docs/user/advanced/database.md
@@ -54,7 +54,7 @@ Oftentimes, it is beneficial to store the results in a database for easy queryin
results_to_db(store, results)
```

=== "Covalent"
=== "Covalent"

Covalent automatically stores all the inputs and outputs in an SQLite database, which you can find at the `"db_path"` when you run `covalent config`, and the results can be queried using the `#!Python ct.get_result(<dispatch ID>)` syntax. However, if you want to store the results in a different database of your choosing, you can do so quite easily.

11 changes: 6 additions & 5 deletions docs/user/advanced/file_transfers.md
@@ -6,28 +6,29 @@

Sometimes, you may want to transfer files between jobs. Every recipe within quacc takes an optional keyword argument `copy_files` that is a list of absolute filepaths to files you wish to have copied to the directory where the calculation is ultimately run.

-For instance, if you have a file `WAVECAR` stored in `/path/to/my/file/stage`, then you could ensure that is present in the calculation's working directory as follows:
+For instance, if you have a file `WAVECAR` stored in `/path/to/my/file/stage`, then you could ensure that is present in the calculation's working directory:

```python
from pathlib import Path
from ase.build import bulk
from quacc.recipes.vasp.core import relax_job

atoms = bulk("Cu")
-relax_job(atoms, copy_files=["/path/to/my/file/stage/WAVECAR"])
+relax_job(atoms, copy_files=[Path("path/to/my/file/stage/WAVECAR")])
```

### Transfers Between Jobs

-Sometimes, however, you may not necessarily know _a priori_ where the source file is. For instance, perhaps you want to copy the file `WAVECAR` from a previous job in your workflow that is stored in a unique directory only determined at runtime. In this scenario, you can still use the `copy_files` keyword argument, but you will need to fetch the prior job's directory. This can be done as follows:
+Sometimes, however, you may not necessarily know _a priori_ where the source file is. For instance, perhaps you want to copy the file `WAVECAR` from a previous job in your workflow that is stored in a unique directory only determined at runtime. In this scenario, you can still use the `copy_files` keyword argument, but you will need to fetch the prior job's directory.

```python
-import os
+from pathlib import Path
from ase.build import bulk
from quacc.recipes.vasp.core import relax_job, static_job

atoms = bulk("Cu")
results1 = relax_job(atoms)
-static_job(results1, copy_files=[os.path.join(results1["dir_name"], "WAVECAR")])
+static_job(results1, copy_files=[Path(results1["dir_name"], "WAVECAR")])
```
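The switch from `os.path.join` to `Path` in the snippet above works because `Path` accepts multiple segments and joins them with the correct separator. A quick illustration (the directory name is made up):

```python
from pathlib import Path

# Path(a, b) joins its segments just like os.path.join(a, b)
wavecar = Path("/scratch/my-prior-job", "WAVECAR")
print(wavecar.as_posix())  # /scratch/my-prior-job/WAVECAR
print(wavecar.name)  # WAVECAR
```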

## Non-Local File Transfers
69 changes: 39 additions & 30 deletions docs/user/basics/wflow_overview.md
@@ -2,22 +2,16 @@

Everyone's computing needs are different, so we ensured that quacc is interoperable with a variety of modern workflow management tools. There are [300+ workflow management tools](https://workflows.community/systems) out there, so we can't possibly support them all. Instead, we have focused on a select few that adopt a similar decorator-based approach to defining workflows with substantial support for HPC systems.

-## Choosing a Workflow Engine
-
-### Summary
+## Summary

!!! Tip

If you are new to workflow engines or would like a helpful UI to monitor workflows, try **Covalent**. If you have a need for speed and are savvy with supercomputers, try **Parsl**.

### Pros and Cons

=== "Covalent"
=== "Covalent ⭐"

[Covalent](https://github.com/AgnostiqHQ/covalent/) is a workflow management solution from the company [Agnostiq](https://agnostiq.ai/).

Summary: Use Covalent if you are looking for a nice UI and don't mind relying on a long-running server for production calculations.

Pros:

- Extremely simple to setup and use, even for complex workflows
@@ -32,61 +26,76 @@ Everyone's computing needs are different, so we ensured that quacc is interopera
- Not as widely used as other workflow management solutions
- It requires a centralized server to be running continuously in order to manage the workflows
- High-security HPC environments may be difficult to access via SSH with the centralized server approach
- Not ideal for large numbers of short-duration jobs on remote HPC machines

=== "Parsl"
=== "Parsl"

[Parsl](https://github.com/Parsl/parsl) is a workflow management solution out of Argonne National Laboratory, the University of Chicago, and the University of Illinois. It is well-adapted for running on virtually any HPC environment with a job scheduler.

Summary: Use Parsl if you are looking for the most robust solution for HPC machines and don't mind the lack of a UI.

Pros:

- Extremely configurable for virtually any HPC environment
-- Relatively simple to define the workflows
+- Quite simple to define the workflows
- Active community, particularly across academia
- Well-suited for [pilot jobs](https://en.wikipedia.org/wiki/Pilot_job) and has near-ideal scaling performance
- Thorough documentation
-- Does not rely on maintaining a centralized server of any kind
+- Does not rely on maintaining a centralized server

Cons:

-- Defining the right configuration options for your desired HPC setup can be an initial hurdle
+- Defining the right configuration options for your desired HPC setups can be an initial hurdle
- Monitoring job progress is more challenging and less detailed than other solutions
- The concept of always returning a "future" object can be confusing for new users

=== "Prefect"

!!! Warning

Prefect support is currently unstable until [PR 876](https://github.com/Quantum-Accelerators/quacc/pull/876) is merged.

[Prefect](https://www.prefect.io/) is a workflow management system that is widely adopted in the data science industry.

Pros:

- Very popular in the data science industry with an active community
- Has a nice dashboard to monitor job progress
- Supports a variety of job schedulers via `dask-jobqueue`
- Does not require a directed acyclic graph, allowing for more flexible workflow definitions

Cons:

- Lacks documentation for HPC environments, although it supports them
- Not practical to use if the compute nodes do not support network connections
- The dashboard retains only a 7-day history by default and does not display the full output of each task
- Sorting out the details of agents, workers, and queues can be challenging
- The concept of always returning a "future" object can be confusing for new users

=== "Redun"

[Redun](https://insitro.github.io/redun/) is a flexible workflow management program developed by [Insitro](https://insitro.com/).

Summary: Use Redun if you are specifically interested in running on AWS or K8s and like a terminal-based monitoring approach.

Pros:

-- Extremely simple syntax for defining workflows.
-- Has strong support for task/result caching.
-- Supports a variety of compute backends.
-- Useful console-based monitoring system.
+- Extremely simple syntax for defining workflows
+- Has strong support for task/result caching
+- Useful console-based monitoring system

Cons:

-- Currently lacks support for typical HPC job schedulers.
-- No user-friendly UI for job monitoring.
-- Less active user community than some other options.
+- Currently lacks support for typical HPC job schedulers and platforms other than AWS
+- No user-friendly GUI for job monitoring
+- Less active user community than some other options

=== "Jobflow"

[Jobflow](https://github.com/materialsproject/jobflow) is developed and maintained by the Materials Project team at Lawrence Berkeley National Laboratory and serves as a seamless interface to [FireWorks](https://github.com/materialsproject/fireworks) for dispatching and monitoring compute jobs.

Summary: Use Jobflow if you want to use compatible software with that used by the Materials Project stack.

**Jobflow**

Pros:

- Simple interface for defining individual jobs and workflows
-- Native support for databases
+- Native support for a variety of databases
- Directly compatible with Atomate2
- Designed with materials science in mind
- Actively supported by the Materials Project team

Cons:
@@ -99,12 +108,12 @@ Everyone's computing needs are different, so we ensured that quacc is interopera

Pros:

-- FireWorks is well-suited for a variety of job management approaches
+- Well-suited for a variety of job management approaches
- Helpful dashboard for monitoring job progress

Cons:

-- FireWorks documentation can be difficult to navigate without prior experience
+- FireWorks can have a steep learning curve due to its many configuration options
- The reliance on MongoDB can be challenging for new users and certain HPC environments
-- New features are not currently planned
+- New features are not planned