diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index fdfe0c23..cd64de90 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -27,8 +27,30 @@ repos:
- id: detect-private-key
- id: detect-aws-credentials
args: [ "--allow-missing-credentials" ]
+
- repo: https://github.com/psf/black
rev: 24.10.0
hooks:
- id: black
args: [ "--config", "./pyproject.toml" ]
+
+ - repo: https://github.com/codespell-project/codespell
+ rev: v2.2.4
+ hooks:
+ - id: codespell
+ exclude: tests/fixtures/mydocfile.md
+
+ - repo: https://github.com/igorshubovych/markdownlint-cli
+ rev: v0.41.0
+ hooks:
+ - id: markdownlint
+ args:
+ - "--disable=MD013" # disable line length
+ - "--disable=MD024" # disable multiple headings with the same content (CHANGELOG)
+ - "--disable=MD033" # disable no inline html (needed for analytics dead pixel)
+
+ - repo: https://github.com/tcort/markdown-link-check
+ rev: v3.13.6
+ hooks:
+ - id: markdown-link-check
+ args: [-q]
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9fbf700c..f054f1ac 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,5 @@
# Changelog
+
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
@@ -41,6 +42,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [0.20.0] - 2024-10-22
### Added
+
- Support using envvar in config YAML by @tatiana in #236
- **Callback improvements**
- Support installed code via python callable string by @john-drews in #221
@@ -53,9 +55,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Support telemetry during DAG parsing emitting data to Scarf by @tatiana in #250.
### Fixed
+
- Build DAGs when there is an invalid YAML in the DAGs folder by @quydx and @tatiana in #184
### Others
+
- Development tools
- Fix make docker-run by @pankajkoti in #249
- Add vim dot files to .gitignore by @tatiana in #228
@@ -96,193 +100,288 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Remove duplicated operator in `README.md` by @brair in #263
### Breaking changes
+
- Removed support for Python 3.7
- The license was changed from MIT to Apache 2.0
-
## [0.19.0] - 2023-07-19
+
### Added
+
- Support for Airflow Datasets (data-aware scheduling)
- Support for inherited Operators
## [0.18.1] - 2023-04-28
+
### Fixed
+
- Set default value for `render_template_as_native_obj` to False
## [0.18.0] - 2023-04-11
+
### Added
+
- Support for dynamic task mapping
- Support for `render_template_as_native_obj`
- Support for `template_searchpath`
## [0.17.3] - 2023-03-27
+
### Added
+
- dag-factory specific `Exceptions` with more meaningful names
+
### Fixed
+
- Reverts allowing inheritance of `KubernetesPodOperator`
- Now passing CI for lint check
## [0.17.2] - 2023-03-26
+
### Changed
+
- Allow inheritance of `KubernetesPodOperator`
## [0.17.1] - 2023-01-21
+
### Changed
+
- Changed imports to support Kubernetes Provider > 5.0.0
## [0.17.0] - 2022-12-06
+
### Added
+
- Adds `sla_secs` in `default_args` to convert seconds to `timedelta` obj
+
### Changed
+
- Changed deprecated imports to support Airflow 2.5
- Removed support for Python 3.6
## [0.16.0] - 2022-11-13
+
### Added
+
- Function to scan recursively for YAML DAGs
+
### Changed
+
- Changed deprecated imports to support Airflow 2.4+
## [0.15.0] - 2022-09-09
+
### Added
+
- Support for string concatenation of variables in YAML
## [0.14.0] - 2022-08-22
+
### Added
+
- Cast `on_retry_callback` from `default_args` to `Callable`
## [0.13.0] - 2022-05-27
+
### Added
+
- Add support for custom `timetable`
- Add support for `python_callable_file` for `PythonSensor`
## [0.12.0] - 2022-02-07
+
### Added
+
- Allow `params` to be specified in YAML
- Add environment variables support for `python_callable_file`
- Adds support for `is_paused_upon_creation` flag
- Allow `python_callable` to be specified in YAML
## [0.11.1] - 2021-12-07
+
### Added
+
- Add support for `access_control` in DAG params
+
### Fixed
+
- Fixed tests for Airflow 1.10 by pinning `wtforms`
## [0.11.0] - 2021-10-16
+
### Added
+
- Add support success/failure callables in `SqlSensor`
- Add `sla_secs` option in task param to convert seconds to timedelta object
+
### Fixed
+
- Support Airflow 2.2
## [0.10.1] - 2021-08-24
+
### Added
+
- Add support for `response_check_lambda` option in `HttpSensor`
## [0.10.0] - 2021-08-20
+
### Added
+
- Add support for `HttpSensor`
## [0.9.1] - 2021-07-27
+
### Added
+
- Add support for `python_callable_file` for `BranchPythonOperator`
+
### Fixed
+
- Only try to use `import_string` for callbacks if they are strings
## [0.9.0] - 2021-07-25
+
### Added
+
- Allow callbacks from Python modules
## [0.8.0] - 2021-06-09
+
### Added
+
- Support for `TaskGroups` if using Airflow 2.0
- Separate DAG building and registering logic
## [0.7.2] - 2021-01-21
+
### Fixed
+
- Correctly set `dag.description` depending on Airflow version
## [0.7.1] - 2020-12-19
+
### Added
+
- Examples for using Custom Operator
+
### Fixed
+
- Handle `"None"` as `schedule_interval`
## [0.7.0] - 2020-12-19
+
### Added
+
- Support Airflow 2.0!
## [0.6.0] - 2020-11-16
+
### Added
+
- `catchup` added to DAG parameters
- Support for `ExternalTaskSensor`
- Run test suite against Python 3.8
## [0.5.0] - 2020-08-20
+
### Added
+
- Support for `KubernetesPodOperator`
- `doc_md` parameter at DAG level
- Import `doc_md` from a file or python callable
+
### Fixed
+
- `get_datetime` no longer removes time component
## [0.4.5] - 2020-06-17
+
### Fixed
+
- Do not include DAG `tags` parameter in Airflow versions that do not support it.
## [0.4.4] - 2020-06-12
+
### Fixed
+
- Use correct default for `tags` parameter
## [0.4.3] - 2020-05-24
+
### Added
+
- `execution_timeout` parse at task level
- `tags` parameter at DAG level
## [0.4.2] - 2020-03-28
+
### Added
+
- Method `clean_dags` to clean old dags that might not exist anymore
+
### Changed
+
- `airflow` version
## [0.4.1] - 2020-02-18
+
### Fixed
+
- Default `default_view` parameter to value from `airflow.cfg`
## [0.4.0] - 2020-02-12
+
### Added
+
- Support for additional DAG parameters
+
### Fixed
+
- Define Loader when loading YAML file
## [0.3.0] - 2019-10-11
+
### Added
+
- Support for PythonOperator tasks
+
### Changed
+
- Cleaned up testing suite and added pylint to builds
## [0.2.2] - 2019-09-08
+
### Changed
+
- `airflow` version
+
### Removed
+
- `piplock` and `pipfile` files
## [0.2.1] - 2019-02-26
+
### Added
+
- Python 3+ type-annotations
## [0.2.0] - 2018-11-28
+
### Added
+
- Added badges to README
- Support for timezone aware DAGs
- This CHANGELOG!
## [0.1.1] - 2018-11-20
+
### Removed
+
- Removed `logme` dependency
## [0.1.0] - 2018-11-20
+
- Initial release
[Unreleased]: https://github.com/ajbosco/dag-factory/compare/v0.19.0...HEAD
diff --git a/PRIVACY_NOTICE.md b/PRIVACY_NOTICE.md
index 7aa5f8a1..475a008f 100644
--- a/PRIVACY_NOTICE.md
+++ b/PRIVACY_NOTICE.md
@@ -11,15 +11,16 @@ security fixes. Additionally, this information supports key decisions related to
Deployments and individual users can opt-out of analytics by setting the configuration:
-```
-[dag_factory] enable_telemetry False
+```ini
+[dag_factory]
+enable_telemetry = False
```
As described in the [official documentation](https://docs.scarf.sh/gateway/#do-not-track), it is also possible to opt out by setting one of the following environment variables:
```commandline
DO_NOT_TRACK=True
SCARF_NO_ANALYTICS=True
```
In addition to Scarf's default data collection, DAG Factory collects the following information:
diff --git a/README.md b/README.md
index 86d7204a..9ee9e5d7 100644
--- a/README.md
+++ b/README.md
@@ -6,15 +6,18 @@
[![Code Style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)
[![Downloads](https://img.shields.io/pypi/dm/dag-factory.svg)](https://img.shields.io/pypi/dm/dag-factory)
-
+
-Welcome to *dag-factory*! *dag-factory* is a library for [Apache Airflow®](https://airflow.apache.org) to construct DAGs declaratively via configuration files.
+Welcome to *dag-factory*! *dag-factory* is a library for [Apache Airflow®](https://airflow.apache.org) to construct DAGs
+declaratively via configuration files.
The minimum requirements for **dag-factory** are:
+
- Python 3.8.0+
- [Apache Airflow®](https://airflow.apache.org) 2.0+
-For a gentle introduction, please take a look at our [Quickstart Guide](#quickstart). For more examples, please see the [examples](/examples) folder.
+For a gentle introduction, please take a look at our [Quickstart Guide](#quickstart). For more examples, please see the
+[examples](/examples) folder.
- [Quickstart](#quickstart)
- [Features](#features)
@@ -34,12 +37,14 @@ These tasks will be leveraging the `BashOperator` to execute simple bash command
![screenshot](/img/quickstart_dag.png)
-1. To install *dag-factory*, run the following pip command in your [Apache Airflow®](https://airflow.apache.org) environment:
+(1) To install *dag-factory*, run the following pip command in your [Apache Airflow®](https://airflow.apache.org) environment:
+
```bash
pip install dag-factory
```
-2. Create a YAML configuration file called `config_file.yml` and save it within your dags folder:
+(2) Create a YAML configuration file called `config_file.yml` and save it within your dags folder:
+
```yaml
example_dag1:
default_args:
@@ -62,9 +67,10 @@ example_dag1:
bash_command: 'echo 3'
dependencies: [task_1]
```
+
We are setting the execution order of the tasks by specifying the `dependencies` key.
-3. In the same folder, create a python file called `generate_dags.py`. This file is responsible for generating the DAGs from the configuration file and is a one-time setup.
+(3) In the same folder, create a Python file called `generate_dags.py`. This file is responsible for generating the DAGs from the configuration file and is a one-time setup.
You won't need to modify this file unless you want to add more configuration files or change the configuration file name.
```python
@@ -88,15 +94,17 @@ Please look at the [examples](/examples) folder for more examples.
## Features
### Multiple Configuration Files
+
If you want to split your DAG configuration into multiple files, you can do so by leveraging a suffix in the configuration file name.
+
```python
-# 'airflow' word is required for the dagbag to parse this file
-from dagfactory import load_yaml_dags
+from dagfactory import load_yaml_dags  # load relevant YAML files as airflow DAGs
-load_yaml_dags(globals_dict=globals(), suffix=['dag.yaml'])
+load_yaml_dags(globals_dict=globals(), suffix=['dag.yaml'])
```
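+
+For example, to load configurations from a directory other than the default DAGs folder, the loader can be pointed at
+it explicitly. A minimal sketch, assuming `load_yaml_dags` accepts a `dags_folder` argument and that the directory
+below exists:
+
+```python
+# generate_dags.py (hypothetical layout with configs outside the DAGs folder)
+from dagfactory import load_yaml_dags
+
+load_yaml_dags(
+    globals_dict=globals(),               # register the generated DAGs in this module
+    dags_folder="/opt/airflow/configs",   # assumed directory, scanned for YAML files
+    suffix=["dag.yaml", "dag.yml"],       # only files ending in these suffixes are loaded
+)
+```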
### Dynamically Mapped Tasks
+
If you want to create a dynamic number of tasks, you can use the `mapped_tasks` key in the configuration file. The `mapped_tasks` key is a list of dictionaries, where each dictionary represents a task.
```yaml
@@ -118,9 +126,11 @@ If you want to create a dynamic number of tasks, you can use the `mapped_tasks`
request.output
dependencies: [request]
```
+
![mapped_tasks_example.png](img/mapped_tasks_example.png)
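+
+For readers more familiar with Airflow's native API: the `partial` and `expand` keys mirror Airflow's dynamic task
+mapping calls. A hand-written sketch of roughly equivalent DAG code (illustrative only; the callables and values are
+placeholders, not dag-factory internals):
+
+```python
+# Rough Airflow-native equivalent of a mapped_tasks config (illustrative sketch)
+import datetime
+
+from airflow import DAG
+from airflow.operators.python import PythonOperator
+
+def make_args():
+    return [[1], [2], [3]]          # one argument list per mapped task instance
+
+def process_one(x):
+    return 2 * x
+
+with DAG("mapped_sketch", start_date=datetime.datetime(2024, 1, 1), schedule=None):
+    request = PythonOperator(task_id="request", python_callable=make_args)
+    process = PythonOperator.partial(      # 'partial' block: parameters shared by all instances
+        task_id="process", python_callable=process_one
+    ).expand(op_args=request.output)       # 'expand' block: fan out over the upstream result
+```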
### Datasets
+
**dag-factory** supports scheduling DAGs via [Apache Airflow Datasets](https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html).
To leverage, you need to specify the `Dataset` in the `outlets` key in the configuration file. The `outlets` key is a list of strings that represent the dataset locations.
@@ -156,9 +166,11 @@ consumer_dag:
operator: airflow.operators.bash_operator.BashOperator
bash_command: "echo 'consumer datasets'"
```
+
![datasets_example.png](img/datasets_example.png)
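+
+Under the hood these keys map onto Airflow's `Dataset` objects: each `outlets` entry becomes a `Dataset` outlet on the
+producing task, and the consumer DAG's `schedule` becomes a list of `Dataset`s. A hand-written sketch of the same
+wiring (the URI is a placeholder):
+
+```python
+# Airflow-native equivalent of the outlets/schedule YAML (illustrative sketch)
+import datetime
+
+from airflow import DAG
+from airflow.datasets import Dataset
+from airflow.operators.bash import BashOperator
+
+dataset = Dataset("s3://example-bucket/raw/dataset_1.json")  # placeholder URI
+
+with DAG("producer_dag", start_date=datetime.datetime(2024, 1, 1), schedule=None):
+    BashOperator(task_id="task_1", bash_command="echo 'produce'", outlets=[dataset])
+
+with DAG("consumer_dag", start_date=datetime.datetime(2024, 1, 1), schedule=[dataset]):
+    BashOperator(task_id="task_1", bash_command="echo 'consume'")
+```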
### Custom Operators
+
**dag-factory** supports using custom operators. To leverage, set the path to the custom operator within the `operator` key in the configuration file. You can add any additional parameters that the custom operator requires.
```yaml
@@ -170,9 +182,11 @@ consumer_dag:
operator: customized.operators.breakfast_operators.MakeBreadOperator
bread_type: 'Sourdough'
```
+
![custom_operators.png](img/custom_operators.png)
### Callbacks
+
**dag-factory** also supports using "callbacks" at the DAG, Task, and TaskGroup level. These callbacks can be defined in
a few different ways. The first points directly to a Python function that has been defined in the `include/callbacks.py`
file.
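+
+A minimal sketch of such a module (the `notify_failure` name and its body are placeholders, not part of dag-factory):
+
+```python
+# include/callbacks.py (hypothetical module referenced from the YAML config)
+def notify_failure(context):
+    """Airflow passes the task context dict to callbacks; report which task failed."""
+    print(f"Task failed: {context['task_instance_key_str']}")
+```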
@@ -241,16 +255,16 @@ task_2:
operator: airflow.providers.http.sensors.http.HttpSensor
http_conn_id: 'test-http'
method: 'GET'
- response_check_lambda: 'lambda response: "ok" in reponse.text'
+ response_check_lambda: 'lambda response: "ok" in response.text'
dependencies: [task_1]
```
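+
+Note that `response_check_lambda` is a plain string in the YAML; dag-factory turns it into the callable that
+`HttpSensor` expects as `response_check`. A rough sketch of the idea (using `eval` here illustrates the mechanism and
+is not a statement about dag-factory's internals):
+
+```python
+# Illustrative only: turning the YAML string into a response-check callable
+response_check = eval('lambda response: "ok" in response.text')
+
+class FakeResponse:
+    """Stand-in for requests.Response in this sketch."""
+    text = "status: ok"
+
+assert response_check(FakeResponse()) is True
+```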
## Benefits
-* Construct DAGs without knowing Python
-* Construct DAGs without learning Airflow primitives
-* Avoid duplicative code
-* Everyone loves YAML! ;)
+- Construct DAGs without knowing Python
+- Construct DAGs without learning Airflow primitives
+- Avoid duplicative code
+- Everyone loves YAML! ;)
## Contributing
diff --git a/dev/README.md b/dev/README.md
index 599d7034..f5b503ef 100644
--- a/dev/README.md
+++ b/dev/README.md
@@ -1,15 +1,15 @@
-Overview
-========
+# Sample Airflow setup and DAG Factory examples
+
+## Overview
Welcome to Astronomer! This project was generated after you ran 'astro dev init' using the Astronomer CLI. This readme describes the contents of the project, as well as how to run Apache Airflow on your local machine.
-Project Contents
-================
+## Project Contents
Your Astro project contains the following files and folders:
- dags: This folder contains the Python files for your Airflow DAGs. By default, this directory includes one example DAG:
- - `example_astronauts`: This DAG shows a simple ETL pipeline example that queries the list of astronauts currently in space from the Open Notify API and prints a statement for each astronaut. The DAG uses the TaskFlow API to define tasks in Python, and dynamic task mapping to dynamically print a statement for each astronaut. For more on how this DAG works, see our [Getting started tutorial](https://www.astronomer.io/docs/learn/get-started-with-airflow).
+ - `example_astronauts`: This DAG shows a simple ETL pipeline example that queries the list of astronauts currently in space from the Open Notify API and prints a statement for each astronaut. The DAG uses the TaskFlow API to define tasks in Python, and dynamic task mapping to dynamically print a statement for each astronaut. For more on how this DAG works, see our [Getting started tutorial](https://www.astronomer.io/docs/learn/get-started-with-airflow).
- Dockerfile: This file contains a versioned Astro Runtime Docker image that provides a differentiated Airflow experience. If you want to execute other commands or overrides at runtime, specify them here.
- include: This folder contains any additional files that you want to include as part of your project. It is empty by default.
- packages.txt: Install OS-level packages needed for your project by adding them to this file. It is empty by default.
@@ -17,10 +17,9 @@ Your Astro project contains the following files and folders:
- plugins: Add custom or community plugins for your project to this file. It is empty by default.
- airflow_settings.yaml: Use this local-only file to specify Airflow Connections, Variables, and Pools instead of entering them in the Airflow UI as you develop DAGs in this project.
-Deploy Your Project Locally
-===========================
+## Deploy Your Project Locally
-1. Start Airflow on your local machine by running 'astro dev start'.
+(1) Start Airflow on your local machine by running 'astro dev start'.
This command will spin up 4 Docker containers on your machine, each for a different Airflow component:
@@ -29,20 +28,18 @@ This command will spin up 4 Docker containers on your machine, each for a differ
- Scheduler: The Airflow component responsible for monitoring and triggering tasks
- Triggerer: The Airflow component responsible for triggering deferred tasks
-2. Verify that all 4 Docker containers were created by running 'docker ps'.
+(2) Verify that all 4 Docker containers were created by running 'docker ps'.
Note: Running 'astro dev start' will start your project with the Airflow Webserver exposed at port 8080 and Postgres exposed at port 5432. If you already have either of those ports allocated, you can either [stop your existing Docker containers or change the port](https://www.astronomer.io/docs/astro/cli/troubleshoot-locally#ports-are-not-available-for-my-local-airflow-webserver).
-3. Access the Airflow UI for your local Airflow project. To do so, go to http://localhost:8080/ and log in with 'admin' for both your Username and Password.
+(3) Access the Airflow UI for your local Airflow project. To do so, go to <http://localhost:8080/> and log in with 'admin' for both your Username and Password.
You should also be able to access your Postgres Database at 'localhost:5432/postgres'.
-Deploy Your Project to Astronomer
-=================================
+## Deploy Your Project to Astronomer
-If you have an Astronomer account, pushing code to a Deployment on Astronomer is simple. For deploying instructions, refer to Astronomer documentation: https://www.astronomer.io/docs/astro/deploy-code/
+If you have an Astronomer account, pushing code to a Deployment on Astronomer is simple. For deployment instructions, refer to the Astronomer documentation: <https://www.astronomer.io/docs/astro/deploy-code/>
-Contact
-=======
+## Contact
The Astronomer CLI is maintained with love by the Astronomer team. To report a bug or suggest a change, reach out to our support.
diff --git a/docs/index.md b/docs/index.md
index 36656ff4..6905616e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -6,23 +6,21 @@ Everything you need to know about how to build Apache Airflow® workflows using
Are you new to DAG Factory? This is the place to start!
-* [DAG Factory at a glance]()
-* [Install guide]()
-* [Using YAML instead of Python]()
- * [Traditional Airflow Operators]()
- * [TaskFlow API]()
-
+* DAG Factory at a glance
+* Install guide
+* Using YAML instead of Python
+ * Traditional Airflow Operators
+ * TaskFlow API
## Features
-* [Configuring your workflows]()
- * Environment variables
- * Defaults
-* [Defining actions upon task states]()
- * Callbacks
-* [Dynamically creating tasks during runtime]()
- * Dynamic task mapping
-
+* Configuring your workflows
+ * Environment variables
+ * Defaults
+* Defining actions upon task states
+ * Callbacks
+* Dynamically creating tasks during runtime
+ * Dynamic task mapping
## Getting help
@@ -30,19 +28,17 @@ Having trouble? We'd like to help!
* Report bugs, questions and feature requests in our [ticket tracker](https://github.com/astronomer/dag-factory/issues).
-
## Contributing
DAG Factory is an Open-Source project. Learn about its development process and about how you can contribute:
-* [Contributing to DAG Factory]()
+* Contributing to DAG Factory
* [Github repository](https://github.com/astronomer/dag-factory/)
## License
To learn more about the terms and conditions for use, reproduction and distribution, read the [Apache License 2.0](https://github.com/astronomer/dag-factory/blob/main/LICENSE).
-
## Privacy Notice
This project follows [Astronomer's Privacy Policy](https://www.astronomer.io/privacy/).