
Add v2.9.2 support (#381)
Co-authored-by: Rafid Al-Humaimidi <[email protected]>
Authored by rafidka and Rafid Al-Humaimidi on Jul 8, 2024
1 parent 197b7f4 · commit ea75575
Showing 13 changed files with 575 additions and 548 deletions.
README.md: 16 changes (9 additions, 7 deletions)
@@ -2,8 +2,10 @@

This repository provides a command line interface (CLI) utility that replicates an Amazon Managed Workflows for Apache Airflow (MWAA) environment locally.

-*Please note: MWAA/AWS/DAG/Plugin issues should be raised through AWS Support or the Airflow Slack #airflow-aws channel. Issues here should be focused on this local-runner repository.*
+_Please note: MWAA/AWS/DAG/Plugin issues should be raised through AWS Support or the Airflow Slack #airflow-aws channel. Issues here should be focused on this local-runner repository._

+_Please note: The dynamic configurations which are dependent on the class of an environment are
+aligned with the Large environment class in this repository._

## About the CLI

@@ -14,7 +16,7 @@ The CLI builds a Docker container image locally that’s similar to a MWAA produ
```text
dags/
  example_lambda.py
  example_dag_with_taskflow_api.py
  example_redshift_data_execute_sql.py
docker/
  config/
@@ -34,7 +36,7 @@
  Dockerfile
plugins/
README.md
requirements/
  requirements.txt
.gitignore
CODE_OF_CONDUCT.md
@@ -102,7 +104,7 @@ The following section describes where to add your DAG code and supporting files.

#### Requirements.txt

1. Add Python dependencies to `requirements/requirements.txt`.
2. To test a requirements.txt without running Apache Airflow, use the following script:

```bash
@@ -117,7 +119,7 @@ Collecting aws-batch (from -r /usr/local/airflow/dags/requirements.txt (line 1))
Downloading https://files.pythonhosted.org/packages/5d/11/3aedc6e150d2df6f3d422d7107ac9eba5b50261cf57ab813bb00d8299a34/aws_batch-0.6.tar.gz
Collecting awscli (from aws-batch->-r /usr/local/airflow/dags/requirements.txt (line 1))
Downloading https://files.pythonhosted.org/packages/07/4a/d054884c2ef4eb3c237e1f4007d3ece5c46e286e4258288f0116724af009/awscli-1.19.21-py2.py3-none-any.whl (3.6MB)
100% |████████████████████████████████| 3.6MB 365kB/s
...
...
...
@@ -136,7 +138,7 @@ For example usage see [Installing Python dependencies using PyPi.org Requirement

#### Custom plugins

- There is a directory at the root of this repository called `plugins`.
- In this directory, create a file for your new custom plugin.
- Add any Python dependencies to `requirements/requirements.txt` (a minimal sketch follows below).
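
As a hedged illustration (the file and class names below are hypothetical, not part of this commit), a minimal plugin skeleton might look like:

```python
# plugins/my_custom_plugin.py (hypothetical file name, for illustration only)
from airflow.plugins_manager import AirflowPlugin


class MyCustomPlugin(AirflowPlugin):
    # The identifier Airflow registers this plugin under.
    name = "my_custom_plugin"
    # Attach hooks, macros, listeners, etc. here as your plugin requires,
    # for example: macros = [my_macro_function]
```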

@@ -165,7 +167,7 @@ The following section contains common questions and answers you may encounter wh
### Can I test execution role permissions using this repository?

- You can set up the local Airflow's boto3 session with the intended execution role to test your DAGs with AWS operators before uploading to your Amazon S3 bucket. To set up an AWS connection for Airflow locally, see [Airflow | AWS Connection](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html). To learn more, see [Amazon MWAA Execution Role](https://docs.aws.amazon.com/mwaa/latest/userguide/mwaa-create-role.html).
- You can set AWS credentials via environment variables in the `docker/config/.env.localrunner` env file. To learn more about AWS environment variables, see [Environment variables to configure the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html) and [Using temporary security credentials with the AWS CLI](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html#using-temp-creds-sdk-cli). Simply set the relevant environment variables in `.env.localrunner` (sketched below) and run `./mwaa-local-env start`.
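
A hedged sketch of what those lines in `docker/config/.env.localrunner` might look like (placeholder values, not real credentials):

```text
# Placeholder values for illustration; substitute your own (temporary) credentials.
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_SESSION_TOKEN=...
AWS_DEFAULT_REGION=us-east-1
```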

### How do I add libraries to requirements.txt and test install?
VERSION: 2 changes (1 addition, 1 deletion)
@@ -1 +1 @@
-2.8.1
+2.9.2
docker/Dockerfile: 6 changes (3 additions, 3 deletions)
@@ -8,9 +8,9 @@ LABEL maintainer="amazon"

# Airflow
## Version specific ARGs
-ARG AIRFLOW_VERSION=2.8.1
-ARG WATCHTOWER_VERSION=3.0.1
-ARG PROVIDER_AMAZON_VERSION=8.16.0
+ARG AIRFLOW_VERSION=2.9.2
+ARG WATCHTOWER_VERSION=3.2.0
+ARG PROVIDER_AMAZON_VERSION=8.24.0

## General ARGs
ARG AIRFLOW_USER_HOME=/usr/local/airflow
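
As a hedged aside (not part of this diff), one way to sanity-check that these bumped pins actually land in the built image is to compare installed distributions against the expected versions from inside a container:

```python
# check_versions.py: hypothetical helper, not part of this repository.
# Run inside the built image to confirm the pinned versions were installed.
from importlib.metadata import PackageNotFoundError, version

EXPECTED = {
    "apache-airflow": "2.9.2",
    "watchtower": "3.2.0",
    "apache-airflow-providers-amazon": "8.24.0",
}

for package, expected in EXPECTED.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}: NOT INSTALLED")
        continue
    status = "OK" if installed == expected else f"MISMATCH (expected {expected})"
    print(f"{package}: {installed} {status}")
```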
docker/config/airflow.cfg: 16 changes (8 additions, 8 deletions)
@@ -26,7 +26,7 @@ executor = SequentialExecutor
# This defines the maximum number of task instances that can run concurrently in Airflow
# regardless of scheduler count and worker count. Generally, this value is reflective of
# the number of task instances with the running state in the metadata database.
-parallelism = 32
+parallelism = 150

# The maximum number of task instances allowed to run concurrently in each DAG. To calculate
# the number of tasks that is running concurrently for a DAG, add up the number of running
@@ -35,7 +35,7 @@ parallelism = 32
#
# An example scenario when this would be useful is when you want to stop a new dag with an early
# start date from stealing all the executor slots in a cluster.
-max_active_tasks_per_dag = 16
+max_active_tasks_per_dag = 150

# Are DAGs paused by default at creation
dags_are_paused_at_creation = True
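
For context, a hedged sketch (assuming a standard Airflow installation) of how these `[core]` values can be inspected at runtime through Airflow's configuration API:

```python
# Hedged sketch: read the effective [core] settings inside the local runner.
from airflow.configuration import conf

parallelism = conf.getint("core", "parallelism")           # 150 with this config
per_dag = conf.getint("core", "max_active_tasks_per_dag")  # 150 with this config

print(f"parallelism={parallelism}, max_active_tasks_per_dag={per_dag}")
```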
@@ -157,7 +157,7 @@ sensitive_var_conn_names =
# Task Slot counts for ``default_pool``. This setting would not have any effect in an existing
# deployment where the ``default_pool`` is already created. For existing deployments, users can
# change the number of slots using Webserver, API or the CLI
-default_pool_task_slot_count = 10000
+default_pool_task_slot_count = 200

[database]
# Collation for ``dag_id``, ``task_id``, ``key`` columns in case they have different encoding.
@@ -342,7 +342,7 @@ backend =
# See documentation for the secrets backend you are using. JSON is expected.
# Example for AWS Systems Manager ParameterStore:
# ``{{"connections_prefix": "/airflow/connections", "profile_name": "default"}}``
-backend_kwargs =
+backend_kwargs = '{"connections_lookup_pattern":"^(?!aws_default$).*$"}'
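
The new `connections_lookup_pattern` value uses a negative lookahead so the secrets backend is consulted for every connection id except the exact id `aws_default`, which falls through to Airflow's other connection sources. A hedged sketch of the matching behaviour:

```python
import re

# The pattern set above: match every connection id except exactly "aws_default".
pattern = re.compile(r"^(?!aws_default$).*$")

print(bool(pattern.match("aws_default")))    # False: backend lookup is skipped
print(bool(pattern.match("my_redshift")))    # True: looked up in the backend
print(bool(pattern.match("aws_default_2")))  # True: only the exact id is excluded
```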

[cli]
# In what way should the cli access the API. The LocalClient will use the
@@ -457,7 +457,7 @@ reload_on_plugin_change = False
secret_key = $SECRET_KEY

# Number of workers to run the Gunicorn web server
-workers = 4
+workers = 9

# The worker class gunicorn should use. Choices include
# sync (default), eventlet, gevent
@@ -815,7 +815,7 @@ catchup_by_default = True
# complexity of query predicate, and/or excessive locking.
# Additionally, you may hit the maximum allowable query length for your db.
# Set this to 0 for no limit (not advised)
-max_tis_per_query = 512
+max_tis_per_query = 16

# Should the scheduler issue ``SELECT ... FOR UPDATE`` in relevant queries.
# If this is set to False then you should not run more than a single
@@ -832,11 +832,11 @@ max_dagruns_per_loop_to_schedule = 20
# Should the Task supervisor process perform a "mini scheduler" to attempt to schedule more tasks of the
# same DAG. Leaving this on will mean tasks in the same DAG execute quicker, but might starve out other
# dags in some circumstances
-schedule_after_task_execution = True
+schedule_after_task_execution = False

# The scheduler can run multiple processes in parallel to parse dags.
# This defines how many processes will run.
-parsing_processes = 2
+parsing_processes = 7

# One of ``modified_time``, ``random_seeded_by_host`` and ``alphabetical``.
# The scheduler will list and sort the dag files to decide the parsing order.
(The remaining 9 changed files are not shown here.)
