Add support for the Prefect workflow engine (#875)
Closes #878.

Need #876 merged first.

---------

Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Andrew-S-Rosen and deepsource-autofix[bot] authored Sep 6, 2023
1 parent b07d9fe commit 54848d0
Showing 25 changed files with 991 additions and 155 deletions.
5 changes: 3 additions & 2 deletions .github/workflows/tests.yaml
@@ -340,7 +340,7 @@ jobs:
uses: codecov/codecov-action@v3
if: github.repository == 'Quantum-Accelerators/quacc'

-tests-redun-jobflow:
+tests-redun-jobflow-prefect:
runs-on: ubuntu-latest
strategy:
fail-fast: true
@@ -367,10 +367,11 @@ jobs:
pip install -r tests/requirements.txt
pip install -r tests/requirements-jobflow.txt
pip install -r tests/requirements-redun.txt
pip install -r tests/requirements-prefect.txt
pip install .[dev]
- name: Run tests with pytest
-run: pytest -k 'jobflow or fireworks or redun' --cov=quacc --cov-report=xml
+run: pytest -k 'jobflow or fireworks or redun or prefect' --cov=quacc --cov-report=xml

- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.6]

### Added

- Re-added support for the Prefect workflow engine.

## [0.2.5]

### Added
2 changes: 1 addition & 1 deletion docs/dev/contributing.md
@@ -72,4 +72,4 @@ In general, please try to keep the code style consistent when possible, particul

All changes you make to quacc should be accompanied by unit tests and should not break existing tests. To run the full test suite, run `pytest .` from the `quacc/tests` directory. Each PR will report the coverage once your tests pass, but if you'd like to generate a coverage report locally, you can use [pytest-cov](https://pytest-cov.readthedocs.io/en/latest/), such as by doing `pytest --cov=quacc .` in the `tests` directory.

-If you are adding recipes based on a code that can be readily installed via `pip` or `conda` (e.g. tblite, DFTB+, Psi4), then you can run these codes directly in the test suite. Preferably, you should use a small molecule or solid and cheap method so the unit tests run quickly. If the recipes you're adding are proprietary or not available via `pip` or `conda` (e.g. Gaussian, GULP), then you will need to [monkeypatch](https://docs.pytest.org/en/7.1.x/how-to/monkeypatch.html) certain functions to change their behavior during testing. For instance, we do not want to run VASP directly during unit tests and have mocked the `atoms.get_potential_energy()` function to always return a dummy value of -1.0 during unit tests. Any mocked functions can be found in the `conftest.py` files of the testing directory.
+If you are adding recipes based on a code that can be readily installed via `pip` or `conda` (e.g. tblite, DFTB+, Psi4), then you can run these codes directly in the test suite. Preferably, you should use a small molecule or solid and cheap method so the unit tests run quickly. If the recipes you're adding are proprietary or not available via `pip` or `conda` (e.g. Gaussian, GULP), then you will need to [monkeypatch](https://docs.pytest.org/en/7.1.x/how-to/monkeypatch.html) certain functions to change their behavior during testing.
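The monkeypatching approach described above can be sketched with a small, self-contained pytest example. The `FakeAtoms` class is a stand-in for illustration; the `-1.0` dummy value mirrors the convention mentioned in the contributing guide, but this is not quacc's actual `conftest.py`:

```python
import pytest


class FakeAtoms:
    """Stand-in for an object wired to an expensive external code."""

    def get_potential_energy(self):
        raise RuntimeError("would launch a real calculation")


def mock_get_potential_energy(self):
    return -1.0  # cheap dummy value so tests never call the real code


def test_energy_is_mocked(monkeypatch):
    # The patch applies only for the duration of this test
    monkeypatch.setattr(FakeAtoms, "get_potential_energy", mock_get_potential_energy)
    assert FakeAtoms().get_potential_energy() == -1.0
```

Running `pytest` on this file passes because the patched method is swapped in before the assertion and automatically undone afterward.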
3 changes: 2 additions & 1 deletion docs/install/codes.md
@@ -47,7 +47,8 @@ ASE_GAUSSIAN_COMMAND="/path/to/my/gaussian_executable Gaussian.com > Gaussian.lo
As noted in the [ASE documentation](https://wiki.fysik.dtu.dk/ase/ase/calculators/gulp.html), you must set the environment variables `GULP_LIB` and `ASE_GULP_COMMAND` as follows:

```bash
-GULP_LIB="/path/to/my/gulp-#.#.#/Libraries" ASE_GULP_COMMAND="/path/to/my/gulp-#.#.#/Src/gulp < gulp.gin > gulp.got"
+GULP_LIB="/path/to/my/gulp-#.#.#/Libraries"
+ASE_GULP_COMMAND="/path/to/my/gulp-#.#.#/Src/gulp < gulp.gin > gulp.got"
```

## Lennard Jones
1 change: 1 addition & 0 deletions docs/install/install.md
@@ -59,6 +59,7 @@ where `extra` is one of the following:
- `quacc[covalent]`: Installs dependencies to enable the use of [Covalent](https://www.covalent.xyz).
- `quacc[jobflow]`: Installs dependencies to enable the use of [Jobflow](https://github.com/materialsproject/jobflow) with [FireWorks](https://github.com/materialsproject/fireworks).
- `quacc[parsl]`: Installs dependencies to enable the use of [Parsl](https://github.com/Parsl/parsl).
- `quacc[prefect]`: Installs dependencies to enable the use of [Prefect](https://www.prefect.io/).
- `quacc[redun]`: Installs dependencies to enable the use of [Redun](https://insitro.github.io/redun/).

### Miscellaneous
20 changes: 18 additions & 2 deletions docs/install/wflow_engines.md
@@ -6,7 +6,7 @@ Using a workflow engine is a crucial component for scaling up quacc calculations

If you are just getting started with workflow engines, we recommend first trying Covalent. For a comparison of the different compatible workflow engines, refer to the [Workflow Engines Overview](../user/basics/wflow_overview.md) section.

=== "Covalent"
=== "Covalent"

**Installation**

@@ -26,7 +26,7 @@ Using a workflow engine is a crucial component for scaling up quacc calculations

Once you start scaling up your calculations, we recommend hosting the Covalent server on a dedicated machine or using [Covalent Cloud](https://www.covalent.xyz/cloud/). Refer to the [Covalent Deployment Guide](https://docs.covalent.xyz/docs/user-documentation/server-deployment) for details.

=== "Parsl"
=== "Parsl"

**Installation**

@@ -38,6 +38,22 @@ Using a workflow engine is a crucial component for scaling up quacc calculations

Parsl has [many configuration options](https://parsl.readthedocs.io/en/stable/userguide/configuring.html), which we will cover later in the documentation.

=== "Prefect"

To install Prefect, run the following:

```bash
pip install quacc[prefect]
```

To use quacc with Prefect Cloud (recommended):

1. Make an account on [Prefect Cloud](https://app.prefect.cloud/)
2. Make an [API Key](https://docs.prefect.io/cloud/users/api-keys/) and (optionally) store it in a `PREFECT_API_KEY` environment variable (e.g. in your `~/.bashrc`)
3. Run `prefect cloud login` from the command-line and enter your API key (or use the browser, if possible)

Additional configuration parameters can be modified, as described in the [Prefect documentation](https://docs.prefect.io/concepts/settings/).

=== "Redun"

**Installation**
8 changes: 3 additions & 5 deletions docs/user/advanced/atomate2.md
@@ -1,6 +1,6 @@
# Quacc + Atomate2

-[Atomate2](https://github.com/materialsproject/atomate2) is a fantastic computational materials science workflow program that shares many similarities with quacc. If you wish to combine workflows from quacc with those from Atomate2, that is possible through the use of Jobflow.
+[Atomate2](https://github.com/materialsproject/atomate2) is a computational materials science workflow program that shares many similarities with quacc. If you wish to combine workflows from quacc with those from Atomate2, that is possible through the use of Jobflow.

!!! Tip

@@ -17,13 +17,11 @@ from quacc.recipes.tblite.core import relax_job
atoms = bulk("Cu")

job1 = relax_job(atoms)
-bandstructure_flow = RelaxBandStructureMaker().make_flow(
-    job1.output["structure"]
-) # (1)!
+bandstructure_flow = RelaxBandStructureMaker().make_flow(job1.output["structure"]) # (1)!

flow = Flow([job1]) + bandstructure_flow # (2)!
```

-1. All Atomate2 workflows take a Pymatgen `Structure` or `Molecule` object as input. This is one of the properties in the returned output of a quacc recipe, which is why we can do `job1.output["structure"]`.
+1. All Atomate2 workflows take a Pymatgen `Structure` or `Molecule` object as input. This is one of the properties in the returned output of a quacc recipe, which is why we can do `#!Python job1.output["structure"]`.

2. The `+` operator can be used to combine two flows into one. We converted the first job into its own `Flow` definition to enable this.
2 changes: 1 addition & 1 deletion docs/user/advanced/database.md
@@ -54,7 +54,7 @@ Oftentimes, it is beneficial to store the results in a database for easy queryin
results_to_db(store, results)
```

=== "Covalent"
=== "Covalent"

Covalent automatically stores all the inputs and outputs in an SQLite database, which you can find at the `"db_path"` when you run `covalent config`, and the results can be queried using the `#!Python ct.get_result(<dispatch ID>)` syntax. However, if you want to store the results in a different database of your choosing, you can do so quite easily.

11 changes: 6 additions & 5 deletions docs/user/advanced/file_transfers.md
@@ -6,28 +6,29 @@

Sometimes, you may want to transfer files between jobs. Every recipe within quacc takes an optional keyword argument `copy_files` that is a list of absolute filepaths to files you wish to have copied to the directory where the calculation is ultimately run.

-For instance, if you have a file `WAVECAR` stored in `/path/to/my/file/stage`, then you could ensure that is present in the calculation's working directory as follows:
+For instance, if you have a file `WAVECAR` stored in `/path/to/my/file/stage`, then you could ensure that is present in the calculation's working directory:

```python
from pathlib import Path
from ase.build import bulk
from quacc.recipes.vasp.core import relax_job

atoms = bulk("Cu")
-relax_job(atoms, copy_files=["/path/to/my/file/stage/WAVECAR"])
+relax_job(atoms, copy_files=[Path("path/to/my/file/stage/WAVECAR")])
```

### Transfers Between Jobs

-Sometimes, however, you may not necessarily know _a priori_ where the source file is. For instance, perhaps you want to copy the file `WAVECAR` from a previous job in your workflow that is stored in a unique directory only determined at runtime. In this scenario, you can still use the `copy_files` keyword argument, but you will need to fetch the prior job's directory. This can be done as follows:
+Sometimes, however, you may not necessarily know _a priori_ where the source file is. For instance, perhaps you want to copy the file `WAVECAR` from a previous job in your workflow that is stored in a unique directory only determined at runtime. In this scenario, you can still use the `copy_files` keyword argument, but you will need to fetch the prior job's directory.

```python
-import os
+from pathlib import Path
from ase.build import bulk
from quacc.recipes.vasp.core import relax_job, static_job

atoms = bulk("Cu")
results1 = relax_job(atoms)
-static_job(results1, copy_files=[os.path.join(results1["dir_name"], "WAVECAR")])
+static_job(results1, copy_files=[Path(results1["dir_name"], "WAVECAR")])
```
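The switch from `os.path.join` to `Path` in the snippet above works because `Path` accepts multiple segments and joins them with the correct separator. A quick illustration (the directory name is made up):

```python
from pathlib import Path

# Path(a, b) joins its segments just like os.path.join(a, b)
wavecar = Path("/scratch/my-prior-job", "WAVECAR")
print(wavecar.as_posix())  # /scratch/my-prior-job/WAVECAR
print(wavecar.name)  # WAVECAR
```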

## Non-Local File Transfers
69 changes: 39 additions & 30 deletions docs/user/basics/wflow_overview.md
@@ -2,22 +2,16 @@

Everyone's computing needs are different, so we ensured that quacc is interoperable with a variety of modern workflow management tools. There are [300+ workflow management tools](https://workflows.community/systems) out there, so we can't possibly support them all. Instead, we have focused on a select few that adopt a similar decorator-based approach to defining workflows with substantial support for HPC systems.

-## Choosing a Workflow Engine
-
-### Summary
+## Summary

!!! Tip

If you are new to workflow engines or would like a helpful UI to monitor workflows, try **Covalent**. If you have a need for speed and are savvy with supercomputers, try **Parsl**.

### Pros and Cons

=== "Covalent"
=== "Covalent ⭐"

[Covalent](https://github.com/AgnostiqHQ/covalent/) is a workflow management solution from the company [Agnostiq](https://agnostiq.ai/).

Summary: Use Covalent if you are looking for a nice UI and don't mind relying on a long-running server for production calculations.

Pros:

- Extremely simple to setup and use, even for complex workflows
@@ -32,61 +26,76 @@ Everyone's computing needs are different, so we ensured that quacc is interopera
- Not as widely used as other workflow management solutions
- It requires a centralized server to be running continuously in order to manage the workflows
- High-security HPC environments may be difficult to access via SSH with the centralized server approach
- Not ideal for large numbers of short-duration jobs on remote HPC machines

=== "Parsl"
=== "Parsl"

[Parsl](https://github.com/Parsl/parsl) is a workflow management solution out of Argonne National Laboratory, the University of Chicago, and the University of Illinois. It is well-adapted for running on virtually any HPC environment with a job scheduler.

Summary: Use Parsl if you are looking for the most robust solution for HPC machines and don't mind the lack of a UI.

Pros:

- Extremely configurable for virtually any HPC environment
-- Relatively simple to define the workflows
+- Quite simple to define the workflows
- Active community, particularly across academia
- Well-suited for [pilot jobs](https://en.wikipedia.org/wiki/Pilot_job) and has near-ideal scaling performance
- Thorough documentation
-- Does not rely on maintaining a centralized server of any kind
+- Does not rely on maintaining a centralized server

Cons:

-- Defining the right configuration options for your desired HPC setup can be an initial hurdle
+- Defining the right configuration options for your desired HPC setups can be an initial hurdle
- Monitoring job progress is more challenging and less detailed than other solutions
- The concept of always returning a "future" object can be confusing for new users

=== "Prefect"

!!! Warning

Prefect support is currently unstable until [PR 876](https://github.com/Quantum-Accelerators/quacc/pull/876) is merged.

[Prefect](https://www.prefect.io/) is a workflow management system that is widely adopted in the data science industry.

Pros:

- Very popular in the data science industry with an active community
- Has a nice dashboard to monitor job progress
- Supports a variety of job schedulers via `dask-jobqueue`
- Does not require a directed acyclic graph, allowing for more flexible workflow definitions

Cons:

- Lacks documentation for HPC environments, although it supports them
- Not practical to use if the compute nodes do not support network connections
- The dashboard retains only a 7-day history by default and does not display the full output of each task
- Sorting out the details of agents, workers, and queues can be challenging
- The concept of always returning a "future" object can be confusing for new users

=== "Redun"

[Redun](https://insitro.github.io/redun/) is a flexible workflow management program developed by [Insitro](https://insitro.com/).

Summary: Use Redun if you are specifically interested in running on AWS or K8s and like a terminal-based monitoring approach.

Pros:

-- Extremely simple syntax for defining workflows.
-- Has strong support for task/result caching.
-- Supports a variety of compute backends.
-- Useful console-based monitoring system.
+- Extremely simple syntax for defining workflows
+- Has strong support for task/result caching
+- Useful console-based monitoring system

Cons:

-- Currently lacks support for typical HPC job schedulers.
-- No user-friendly UI for job monitoring.
-- Less active user community than some other options.
+- Currently lacks support for typical HPC job schedulers and platforms other than AWS
+- No user-friendly GUI for job monitoring
+- Less active user community than some other options

=== "Jobflow"

[Jobflow](https://github.com/materialsproject/jobflow) is developed and maintained by the Materials Project team at Lawrence Berkeley National Laboratory and serves as a seamless interface to [FireWorks](https://github.com/materialsproject/fireworks) for dispatching and monitoring compute jobs.

Summary: Use Jobflow if you want to use compatible software with that used by the Materials Project stack.

**Jobflow**

Pros:

- Simple interface for defining individual jobs and workflows
-- Native support for databases
+- Native support for a variety of databases
- Directly compatible with Atomate2
- Designed with materials science in mind
- Actively supported by the Materials Project team

Cons:
@@ -99,12 +108,12 @@ Everyone's computing needs are different, so we ensured that quacc is interopera

Pros:

-- FireWorks is well-suited for a variety of job management approaches
+- Well-suited for a variety of job management approaches
- Helpful dashboard for monitoring job progress

Cons:

-- FireWorks documentation can be difficult to navigate without prior experience
+- FireWorks can have a steep learning curve due to its many configuration options
- The reliance on MongoDB can be challenging for new users and certain HPC environments
-- New features are not currently planned
+- New features are not planned