Add support for the Prefect workflow engine #875

Merged
merged 33 commits into from
Sep 6, 2023
33 commits
da3f1a2
re-add support for prefect
Andrew-S-Rosen Sep 5, 2023
8a4fc6c
style: Format code with black, prettier and isort
deepsource-autofix[bot] Sep 5, 2023
d9a0f62
patch
Andrew-S-Rosen Sep 5, 2023
8fd2a15
Merge remote-tracking branch 'origin/prefect' into prefect
Andrew-S-Rosen Sep 5, 2023
252ffd1
docs update
Andrew-S-Rosen Sep 5, 2023
6a8ceba
fix
Andrew-S-Rosen Sep 5, 2023
3b44e4f
style: Format code with black, prettier and isort
deepsource-autofix[bot] Sep 5, 2023
56abe15
fix
Andrew-S-Rosen Sep 5, 2023
486eca1
Merge remote-tracking branch 'origin/prefect' into prefect
Andrew-S-Rosen Sep 5, 2023
35782fc
Update requirements-prefect.txt
Andrew-S-Rosen Sep 5, 2023
2b82625
Merge branch 'main' into prefect
Andrew-S-Rosen Sep 5, 2023
fcc73a4
Merge branch 'main' into prefect
Andrew-S-Rosen Sep 5, 2023
0867f20
style: Format code with black, prettier and isort
deepsource-autofix[bot] Sep 5, 2023
8bcff58
Merge branch 'main' into prefect
Andrew-S-Rosen Sep 5, 2023
514ace1
Merge branch 'main' into prefect
Andrew-S-Rosen Sep 6, 2023
5d701d1
update docs
Andrew-S-Rosen Sep 6, 2023
e04ced9
update parsl docs
Andrew-S-Rosen Sep 6, 2023
486d463
style: Format code with black, prettier and isort
deepsource-autofix[bot] Sep 6, 2023
9d3ba6f
fix docs
Andrew-S-Rosen Sep 6, 2023
dd23778
Merge remote-tracking branch 'origin/prefect' into prefect
Andrew-S-Rosen Sep 6, 2023
9663ec8
bump
Andrew-S-Rosen Sep 6, 2023
dfcf3f6
update
Andrew-S-Rosen Sep 6, 2023
5bbbf65
upload
Andrew-S-Rosen Sep 6, 2023
29e9837
fix test
Andrew-S-Rosen Sep 6, 2023
4fe323e
fix
Andrew-S-Rosen Sep 6, 2023
00d99ac
Merge branch 'main' into prefect
Andrew-S-Rosen Sep 6, 2023
76b0c2b
fix
Andrew-S-Rosen Sep 6, 2023
5a863f8
Merge remote-tracking branch 'origin/prefect' into prefect
Andrew-S-Rosen Sep 6, 2023
170e073
update
Andrew-S-Rosen Sep 6, 2023
5b480f5
style: Format code with black, prettier and isort
deepsource-autofix[bot] Sep 6, 2023
0b928b1
remove file
Andrew-S-Rosen Sep 6, 2023
6757661
Merge remote-tracking branch 'origin/prefect' into prefect
Andrew-S-Rosen Sep 6, 2023
eb51cac
fix tests
Andrew-S-Rosen Sep 6, 2023
5 changes: 3 additions & 2 deletions .github/workflows/tests.yaml
@@ -340,7 +340,7 @@ jobs:
uses: codecov/codecov-action@v3
if: github.repository == 'Quantum-Accelerators/quacc'

- tests-redun-jobflow:
+ tests-redun-jobflow-prefect:
runs-on: ubuntu-latest
strategy:
fail-fast: true
@@ -367,10 +367,11 @@ jobs:
pip install -r tests/requirements.txt
pip install -r tests/requirements-jobflow.txt
pip install -r tests/requirements-redun.txt
+ pip install -r tests/requirements-prefect.txt
pip install .[dev]

- name: Run tests with pytest
- run: pytest -k 'jobflow or fireworks or redun' --cov=quacc --cov-report=xml
+ run: pytest -k 'jobflow or fireworks or redun or prefect' --cov=quacc --cov-report=xml

- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [0.2.6]
+
+ ### Added
+
+ - Re-added support for the Prefect workflow engine.

## [0.2.5]

### Added
2 changes: 1 addition & 1 deletion docs/dev/contributing.md
@@ -72,4 +72,4 @@ In general, please try to keep the code style consistent when possible, particul

All changes you make to quacc should be accompanied by unit tests and should not break existing tests. To run the full test suite, run `pytest .` from the `quacc/tests` directory. Each PR will report the coverage once your tests pass, but if you'd like to generate a coverage report locally, you can use [pytest-cov](https://pytest-cov.readthedocs.io/en/latest/), such as by doing `pytest --cov=quacc .` in the `tests` directory.

- If you are adding recipes based on a code that can be readily installed via `pip` or `conda` (e.g. tblite, DFTB+, Psi4), then you can run these codes directly in the test suite. Preferably, you should use a small molecule or solid and cheap method so the unit tests run quickly. If the recipes you're adding are proprietary or not available via `pip` or `conda` (e.g. Gaussian, GULP), then you will need to [monkeypatch](https://docs.pytest.org/en/7.1.x/how-to/monkeypatch.html) certain functions to change their behavior during testing. For instance, we do not want to run VASP directly during unit tests and have mocked the `atoms.get_potential_energy()` function to always return a dummy value of -1.0 during unit tests. Any mocked functions can be found in the `conftest.py` files of the testing directory.
+ If you are adding recipes based on a code that can be readily installed via `pip` or `conda` (e.g. tblite, DFTB+, Psi4), then you can run these codes directly in the test suite. Preferably, you should use a small molecule or solid and cheap method so the unit tests run quickly. If the recipes you're adding are proprietary or not available via `pip` or `conda` (e.g. Gaussian, GULP), then you will need to [monkeypatch](https://docs.pytest.org/en/7.1.x/how-to/monkeypatch.html) certain functions to change their behavior during testing.
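The monkeypatching approach described in the contributing guide can be sketched as follows. This is an illustration only, not code from quacc's test suite: the guide points to pytest's `monkeypatch` fixture, whereas this sketch uses the standard library's `unittest.mock.patch.object` so that it runs without pytest, and `Calculator` and `relax` are hypothetical stand-ins.

```python
from unittest import mock


class Calculator:
    """Hypothetical stand-in for a code that cannot run in CI (e.g. Gaussian or GULP)."""

    def get_potential_energy(self) -> float:
        # In real use, this would launch an external executable
        raise RuntimeError("external executable is not available during tests")


def relax(calc: Calculator) -> float:
    """Hypothetical recipe-like function that calls the expensive method."""
    return calc.get_potential_energy()


# Patch the method only inside the with-block; the original is restored on exit.
# With pytest, a comparable call inside a test would be:
#     monkeypatch.setattr(Calculator, "get_potential_energy", lambda self: -1.0)
with mock.patch.object(Calculator, "get_potential_energy", return_value=-1.0):
    energy = relax(Calculator())

print(energy)  # -1.0
```

Scoping the patch to a block (or, in pytest, to a single test) keeps the dummy value from leaking into other tests.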
3 changes: 2 additions & 1 deletion docs/install/codes.md
@@ -47,7 +47,8 @@ ASE_GAUSSIAN_COMMAND="/path/to/my/gaussian_executable Gaussian.com > Gaussian.lo
As noted in the [ASE documentation](https://wiki.fysik.dtu.dk/ase/ase/calculators/gulp.html), you must set the environment variables `GULP_LIB` and `ASE_GULP_COMMAND` as follows:

```bash
- GULP_LIB="/path/to/my/gulp-#.#.#/Libraries" ASE_GULP_COMMAND="/path/to/my/gulp-#.#.#/Src/gulp < gulp.gin > gulp.got"
+ GULP_LIB="/path/to/my/gulp-#.#.#/Libraries"
+ ASE_GULP_COMMAND="/path/to/my/gulp-#.#.#/Src/gulp < gulp.gin > gulp.got"
```

## Lennard Jones
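Whether written on one line or two, a bare `NAME=value` assignment only sets a shell variable. A Python process running ASE inherits it only if it is exported. Below is a minimal sketch for a bash-like shell (e.g. placed in `~/.bashrc`). It keeps the documentation's placeholder paths, where `#.#.#` stands for your GULP version.

```shell
# Export both variables so that child processes (e.g. Python running ASE) inherit them
export GULP_LIB="/path/to/my/gulp-#.#.#/Libraries"
export ASE_GULP_COMMAND="/path/to/my/gulp-#.#.#/Src/gulp < gulp.gin > gulp.got"

# printenv only reports exported variables, so this doubles as a quick check
printenv GULP_LIB
```

If `printenv` prints nothing, the variable was set without `export` and will not be visible to ASE.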
1 change: 1 addition & 0 deletions docs/install/install.md
@@ -59,6 +59,7 @@ where `extra` is one of the following:
- `quacc[covalent]`: Installs dependencies to enable the use of [Covalent](https://www.covalent.xyz).
- `quacc[jobflow]`: Installs dependencies to enable the use of [Jobflow](https://github.com/materialsproject/jobflow) with [FireWorks](https://github.com/materialsproject/fireworks).
- `quacc[parsl]`: Installs dependencies to enable the use of [Parsl](https://github.com/Parsl/parsl).
+ - `quacc[prefect]`: Installs dependencies to enable the use of [Prefect](https://www.prefect.io/).
- `quacc[redun]`: Installs dependencies to enable the use of [Redun](https://insitro.github.io/redun/)

### Miscellaneous
20 changes: 18 additions & 2 deletions docs/install/wflow_engines.md
@@ -6,7 +6,7 @@ Using a workflow engine is a crucial component for scaling up quacc calculations

If you are just getting started with workflow engines, we recommend first trying Covalent. For a comparison of the different compatible workflow engines, refer to the [Workflow Engines Overview](../user/basics/wflow_overview.md) section.

- === "Covalent"
+ === "Covalent"

**Installation**

@@ -26,7 +26,7 @@ Using a workflow engine is a crucial component for scaling up quacc calculations

Once you start scaling up your calculations, we recommend hosting the Covalent server on a dedicated machine or using [Covalent Cloud](https://www.covalent.xyz/cloud/). Refer to the [Covalent Deployment Guide](https://docs.covalent.xyz/docs/user-documentation/server-deployment) for details.

- === "Parsl"
+ === "Parsl"

**Installation**

@@ -38,6 +38,22 @@ Using a workflow engine is a crucial component for scaling up quacc calculations

Parsl has [many configuration options](https://parsl.readthedocs.io/en/stable/userguide/configuring.html), which we will cover later in the documentation.

+ === "Prefect"
+
+ To install Prefect, run the following:
+
+ ```bash
+ pip install quacc[prefect]
+ ```
+
+ To use quacc with Prefect Cloud (recommended):
+
+ 1. Make an account on [Prefect Cloud](https://app.prefect.cloud/)
+ 2. Make an [API Key](https://docs.prefect.io/cloud/users/api-keys/) and (optionally) store it in a `PREFECT_API_KEY` environment variable (e.g. in your `~/.bashrc`)
+ 3. Run `prefect cloud login` from the command-line and enter your API key (or use the browser, if possible)
+
+ Additional configuration parameters can be modified, as described in the [Prefect documentation](https://docs.prefect.io/concepts/settings/).

=== "Redun"

**Installation**
8 changes: 3 additions & 5 deletions docs/user/advanced/atomate2.md
@@ -1,6 +1,6 @@
# Quacc + Atomate2

- [Atomate2](https://github.com/materialsproject/atomate2) is a fantastic computational materials science workflow program that shares many similarities with quacc. If you wish to combine workflows from quacc with those from Atomate2, that is possible through the use of Jobflow.
+ [Atomate2](https://github.com/materialsproject/atomate2) is a computational materials science workflow program that shares many similarities with quacc. If you wish to combine workflows from quacc with those from Atomate2, that is possible through the use of Jobflow.

!!! Tip

@@ -17,13 +17,11 @@ from quacc.recipes.tblite.core import relax_job
atoms = bulk("Cu")

job1 = relax_job(atoms)
- bandstructure_flow = RelaxBandStructureMaker().make_flow(
- job1.output["structure"]
- ) # (1)!
+ bandstructure_flow = RelaxBandStructureMaker().make_flow(job1.output["structure"]) # (1)!

flow = Flow([job1]) + bandstructure_flow # (2)!
```

- 1. All Atomate2 workflows take a Pymatgen `Structure` or `Molecule` object as input. This is one of the properties in the returned output of a quacc recipe, which is why we can do `job1.output["structure"]`.
+ 1. All Atomate2 workflows take a Pymatgen `Structure` or `Molecule` object as input. This is one of the properties in the returned output of a quacc recipe, which is why we can do `#!Python job1.output["structure"]`.

2. The `+` operator can be used to combine two flows into one. We converted the first job into its own `Flow` definition to enable this.
2 changes: 1 addition & 1 deletion docs/user/advanced/database.md
@@ -54,7 +54,7 @@ Oftentimes, it is beneficial to store the results in a database for easy queryin
results_to_db(store, results)
```

- === "Covalent"
+ === "Covalent"

Covalent automatically stores all the inputs and outputs in an SQLite database, which you can find at the `"db_path"` when you run `covalent config`, and the results can be queried using the `#!Python ct.get_result(<dispatch ID>)` syntax. However, if you want to store the results in a different database of your choosing, you can do so quite easily.

11 changes: 6 additions & 5 deletions docs/user/advanced/file_transfers.md
@@ -6,28 +6,29 @@

Sometimes, you may want to transfer files between jobs. Every recipe within quacc takes an optional keyword argument `copy_files` that is a list of absolute filepaths to files you wish to have copied to the directory where the calculation is ultimately run.

- For instance, if you have a file `WAVECAR` stored in `/path/to/my/file/stage`, then you could ensure that is present in the calculation's working directory as follows:
+ For instance, if you have a file `WAVECAR` stored in `/path/to/my/file/stage`, then you could ensure that is present in the calculation's working directory:

```python
+ from pathlib import Path
from ase.build import bulk
from quacc.recipes.vasp.core import relax_job

atoms = bulk("Cu")
- relax_job(atoms, copy_files=["/path/to/my/file/stage/WAVECAR"])
+ relax_job(atoms, copy_files=[Path("path/to/my/file/stage/WAVECAR")])
```

### Transfers Between Jobs

- Sometimes, however, you may not necessarily know _a priori_ where the source file is. For instance, perhaps you want to copy the file `WAVECAR` from a previous job in your workflow that is stored in a unique directory only determined at runtime. In this scenario, you can still use the `copy_files` keyword argument, but you will need to fetch the prior job's directory. This can be done as follows:
+ Sometimes, however, you may not necessarily know _a priori_ where the source file is. For instance, perhaps you want to copy the file `WAVECAR` from a previous job in your workflow that is stored in a unique directory only determined at runtime. In this scenario, you can still use the `copy_files` keyword argument, but you will need to fetch the prior job's directory.

```python
- import os
+ from pathlib import Path
from ase.build import bulk
from quacc.recipes.vasp.core import relax_job, static_job

atoms = bulk("Cu")
results1 = relax_job(atoms)
- static_job(results1, copy_files=[os.path.join(results1["dir_name"], "WAVECAR")])
+ static_job(results1, copy_files=[Path(results1["dir_name"], "WAVECAR")])
```

## Non-Local File Transfers
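The updated examples build paths with `pathlib.Path` instead of `os.path.join`. The sketch below shows why the two are interchangeable here. It uses a hypothetical `dir_name` value in place of a real recipe result and assumes POSIX-style paths. It also shows that `Path("path/to/my/file/stage/WAVECAR")` in the first updated snippet is a relative path, because the leading `/` of the original was dropped, whereas the text asks for absolute filepaths.

```python
import os
from pathlib import Path

# Hypothetical stand-in for the dict returned by a quacc recipe;
# only the "dir_name" key is used here.
results1 = {"dir_name": "/scratch/quacc-job-1"}

# Passing several segments to Path joins them, like os.path.join
wavecar = Path(results1["dir_name"], "WAVECAR")
print(wavecar)  # /scratch/quacc-job-1/WAVECAR
print(str(wavecar) == os.path.join(results1["dir_name"], "WAVECAR"))  # True (on POSIX)

# copy_files expects absolute paths, so fixed locations need the leading "/"
staged = Path("/path/to/my/file/stage/WAVECAR")
relative = Path("path/to/my/file/stage/WAVECAR")
print(staged.is_absolute(), relative.is_absolute())  # True False
```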
69 changes: 39 additions & 30 deletions docs/user/basics/wflow_overview.md
@@ -2,22 +2,16 @@

Everyone's computing needs are different, so we ensured that quacc is interoperable with a variety of modern workflow management tools. There are [300+ workflow management tools](https://workflows.community/systems) out there, so we can't possibly support them all. Instead, we have focused on a select few that adopt a similar decorator-based approach to defining workflows with substantial support for HPC systems.

## Choosing a Workflow Engine

- ### Summary
+ ## Summary

!!! Tip

If you are new to workflow engines or would like a helpful UI to monitor workflows, try **Covalent**. If you have a need for speed and are savvy with supercomputers, try **Parsl**.

### Pros and Cons

- === "Covalent"
+ === "Covalent ⭐"

[Covalent](https://github.com/AgnostiqHQ/covalent/) is a workflow management solution from the company [Agnostiq](https://agnostiq.ai/).

Summary: Use Covalent if you are looking for a nice UI and don't mind relying on a long-running server for production calculations.

Pros:

- Extremely simple to setup and use, even for complex workflows
@@ -32,61 +26,76 @@ Everyone's computing needs are different, so we ensured that quacc is interopera
- Not as widely used as other workflow management solutions
- It requires a centralized server to be running continuously in order to manage the workflows
- High-security HPC environments may be difficult to access via SSH with the centralized server approach
- Not ideal for large numbers of short-duration jobs on remote HPC machines

- === "Parsl"
+ === "Parsl"

[Parsl](https://github.com/Parsl/parsl) is a workflow management solution out of Argonne National Laboratory, the University of Chicago, and the University of Illinois. It is well-adapted for running on virtually any HPC environment with a job scheduler.

Summary: Use Parsl if you are looking for the most robust solution for HPC machines and don't mind the lack of a UI.

Pros:

- Extremely configurable for virtually any HPC environment
- - Relatively simple to define the workflows
+ - Quite simple to define the workflows
- Active community, particularly across academia
- Well-suited for [pilot jobs](https://en.wikipedia.org/wiki/Pilot_job) and has near-ideal scaling performance
- Thorough documentation
- - Does not rely on maintaining a centralized server of any kind
+ - Does not rely on maintaining a centralized server

Cons:

- - Defining the right configuration options for your desired HPC setup can be an initial hurdle
+ - Defining the right configuration options for your desired HPC setups can be an initial hurdle
- Monitoring job progress is more challenging and less detailed than other solutions
- The concept of always returning a "future" object can be confusing for new users

+ === "Prefect"
+
+ !!! Warning
+
+ Prefect support is currently unstable until [PR 876](https://github.com/Quantum-Accelerators/quacc/pull/876) is merged.
+
+ [Prefect](https://www.prefect.io/) is a workflow management system that is widely adopted in the data science industry.
+
+ Pros:
+
+ - Very popular in the data science industry with an active community
+ - Has a nice dashboard to monitor job progress
+ - Supports a variety of job schedulers via `dask-jobqueue`
+ - Uses a directed acyclic graph-free model for increased flexibility in workflow definitions
+
+ Cons:
+
+ - Lacks documentation for HPC environments, although it supports them
+ - Not practical to use if the compute nodes do not support network connections
+ - The dashboard stores data for only a 7 day history by default and does not display the full output of each task
+ - Sorting out the details of agents, workers, and queues can be challenging
+ - The concept of always returning a "future" object can be confusing for new users

=== "Redun"

[Redun](https://insitro.github.io/redun/) is a flexible workflow management program developed by [Insitro](https://insitro.com/).

Summary: Use Redun if you are specifically interested in running on AWS or K8s and like a terminal-based monitoring approach.

Pros:

- - Extremely simple syntax for defining workflows.
- - Has strong support for task/result caching.
- - Supports a variety of compute backends.
- - Useful console-based monitoring system.
+ - Extremely simple syntax for defining workflows
+ - Has strong support for task/result caching
+ - Useful console-based monitoring system

Cons:

- - Currently lacks support for typical HPC job schedulers.
- - No user-friendly UI for job monitoring.
- - Less active user community than some other options.
+ - Currently lacks support for typical HPC job schedulers and platforms other than AWS
+ - No user-friendly GUI for job monitoring
+ - Less active user community than some other options

=== "Jobflow"

[Jobflow](https://github.com/materialsproject/jobflow) is developed and maintained by the Materials Project team at Lawrence Berkeley National Laboratory and serves as a seamless interface to [FireWorks](https://github.com/materialsproject/fireworks) for dispatching and monitoring compute jobs.

Summary: Use Jobflow if you want to use compatible software with that used by the Materials Project stack.

**Jobflow**

Pros:

- Simple interface for defining individual jobs and workflows
- - Native support for databases
+ - Native support for a variety of databases
- Directly compatible with Atomate2
- Designed with materials science in mind
- Actively supported by the Materials Project team

Cons:
@@ -99,12 +108,12 @@ Everyone's computing needs are different, so we ensured that quacc is interopera

Pros:

- - FireWorks is well-suited for a variety of job management approaches
+ - Well-suited for a variety of job management approaches
- Helpful dashboard for monitoring job progress

Cons:

- FireWorks documentation can be difficult to navigate without prior experience
- FireWorks can have a steep learning curve due to its many configuration options
- The reliance on MongoDB can be challenging for new users and certain HPC environments
- - New features are not currently planned
+ - New features are not planned
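Both the Parsl and Prefect entries list the "future" concept as a hurdle for new users. The idea can be sketched with the standard library's `concurrent.futures`: submitting work returns a placeholder immediately, and `.result()` blocks until the value is ready. This is a stdlib stand-in, not Parsl or Prefect code, and `relax` is a hypothetical function. Parsl's `AppFuture` and Prefect's `PrefectFuture` also expose a `.result()` method, though their other details differ.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def relax(energy: float) -> float:
    """Hypothetical stand-in for a slow calculation."""
    time.sleep(0.1)  # pretend to do work
    return energy - 1.0


with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns at once with a Future, not with the float result
    future = executor.submit(relax, 0.5)
    print(isinstance(future, Future))  # True

    # A Future is not the value itself, so arithmetic on it fails
    try:
        future + 1.0
    except TypeError:
        print("use .result() to get the value")

    # result() blocks until the task has finished, then returns its value
    value = future.result()

print(value)  # -0.5
```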