Run some example in Kubernetes execution mode in CI (#1127)
## Description

### Migrate example from [cosmos-example](https://github.com/astronomer/cosmos-example/)

The [cosmos-example](https://github.com/astronomer/cosmos-example/) repository currently contains several examples, including those that run in Kubernetes execution mode. This setup has made it hard to test local changes in Kubernetes execution mode and to keep the documentation up to date. Therefore, it makes sense to migrate the Kubernetes examples from [cosmos-example](https://github.com/astronomer/cosmos-example/) to this repository.

This PR addresses the issue as follows:

- Migrates the [jaffle_shop_kubernetes](https://github.com/astronomer/cosmos-example/blob/main/dags/jaffle_shop_kubernetes.py) example DAG to this repository.
- Moves the Dockerfile from [cosmos-example](https://github.com/astronomer/cosmos-example/blob/main/Dockerfile.postgres_profile_docker_k8s) to this repository to build the image with the necessary DAGs and dbt projects.

I also adjusted both the example DAG and the Dockerfile to work within this repository.

### Automate running locally

I introduced some scripts to make running the Kubernetes DAG easy.

**postgres-deployment.yaml:** Kubernetes resource file for spinning up PostgreSQL and creating Kubernetes secrets.

**integration-kubernetes.sh:** Runs the Kubernetes DAG using pytest.

**kubernetes-setup.sh:**

- Builds the Docker image with the Jaffle Shop dbt project and DAG, and loads the Docker image into the local registry.
- Creates Kubernetes resources such as the PostgreSQL deployment, service, and secret.

**Run DAG locally**

Prerequisites:

- Docker Desktop
- KinD (Kubernetes in Docker)
- kubectl

Steps:

1. Create cluster: `kind create cluster`
2. Create resources: `scripts/test/kubernetes-setup.sh` (this sets up PostgreSQL and loads the dbt project into the local registry)
3. Run DAG: `cd dev && sh ../scripts/test/integration-kubernetes.sh` (this executes the DAG with pytest)

You can also run the DAG directly with the `airflow` command, provided the project is installed in your virtual env:

```
time AIRFLOW__COSMOS__PROPAGATE_LOGS=0 \
  AIRFLOW__COSMOS__ENABLE_CACHE=1 \
  AIRFLOW__COSMOS__CACHE_DIR=/tmp/ \
  AIRFLOW_CONN_EXAMPLE_CONN="postgres://postgres:[email protected]:5432/postgres" \
  PYTHONPATH=`pwd` \
  AIRFLOW_HOME=`pwd` \
  AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=20000 \
  AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT=20000 \
  airflow dags test jaffle_shop_kubernetes `date -Iseconds`
```

### Run jaffle_shop_kubernetes in CI

To avoid regressions, we have automated running jaffle_shop_kubernetes in CI:

- Set up the GitHub Actions infrastructure to run DAGs using Kubernetes execution mode.
- Used [container-tools/kind-action@v1](https://github.com/container-tools/kind-action) to create a KinD cluster.
- Used the bash scripts to streamline creating the Kubernetes resources, building and loading the image into a local registry, and executing the tests.
- At the moment, pytest runs from a virtual env.

### Documentation changes

Given that the DAG [jaffle_shop_kubernetes](https://github.com/astronomer/cosmos-example/blob/main/dags/jaffle_shop_kubernetes.py) is now part of this repository, I have automated the example rendering for Kubernetes execution mode. This ensures that we avoid displaying outdated example code.

https://astronomer.github.io/astronomer-cosmos/getting_started/execution-modes.html#kubernetes

<img width="822" alt="Screenshot 2024-08-15 at 8 03 59 PM" src="https://github.com/user-attachments/assets/1eadad09-9b7c-43e1-bcd8-b08dd21e3878">

https://astronomer.github.io/astronomer-cosmos/getting_started/kubernetes.html#kubernetes

<img width="812" alt="Screenshot 2024-08-15 at 8 04 22 PM" src="https://github.com/user-attachments/assets/7161fa9b-e5c1-44d8-8702-b2c583dee236">

### Future work

- Use the hatch target to run the test. I have introduced a hatch target to run the Kubernetes example, but it currently does not work due to a mismatch between the local and container dbt project paths. This requires a bit more work.
- Remove the virtual environment step (Install packages and dependencies) from the CI configuration for Run-Kubernetes-Tests and use hatch instead.
- Update the profile YAML to use an environment variable for the port, as it is currently hardcoded.
- Remove the host from the Kubernetes secret, replace it with the username, and make the corresponding change in the DAG.
- Currently, we need to export both POSTGRES_DATABASE and POSTGRES_DB in the Dockerfile because both are used in the project. Avoid exporting both and instead make the environment variables consistent across the repository.
- Not a big deal in this context, but we have some hardcoded values for secrets. It would be better to parameterize them.

GH issue for future improvement: #1160

### Example CI Run

- https://github.com/astronomer/astronomer-cosmos/actions/runs/10405590862

## Related Issue(s)

closes: #535

## Breaking Change?

No

<!-- If this introduces a breaking change, specify that here. -->

## Checklist

- [x] I have made corresponding changes to the documentation (if required)
- [x] I have added tests that prove my fix is effective or that my feature works
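The prerequisites listed under "Run DAG locally" could be checked before running the setup script. The following is a minimal sketch, not part of this commit: the helper name `missing_tools` is hypothetical, and it assumes the three prerequisites are reachable on `PATH` as `docker`, `kind`, and `kubectl`.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names in `tools` that cannot be resolved on PATH.

    `which` is injectable so the lookup can be replaced in tests;
    by default it is the standard library's shutil.which.
    """
    return [tool for tool in tools if which(tool) is None]


# Hypothetical usage before running scripts/test/kubernetes-setup.sh:
#   missing = missing_tools(["docker", "kind", "kubectl"])
#   if missing:
#       raise SystemExit("Missing prerequisites: " + ", ".join(missing))
```

Finding the binaries only shows they are installed. It does not confirm that the Docker daemon is running or that a KinD cluster already exists.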
1 parent 89f5999, commit e1ff924
Showing 21 changed files with 379 additions and 55 deletions.
@@ -163,7 +163,6 @@ jobs:
          POSTGRES_DB: postgres
          POSTGRES_SCHEMA: public
          POSTGRES_PORT: 5432
          SOURCE_RENDERING_BEHAVIOR: all

      - name: Upload coverage to Github
        uses: actions/upload-artifact@v2

@@ -235,7 +234,6 @@ jobs:
          POSTGRES_DB: postgres
          POSTGRES_SCHEMA: public
          POSTGRES_PORT: 5432
          SOURCE_RENDERING_BEHAVIOR: all

      - name: Upload coverage to Github
        uses: actions/upload-artifact@v2

@@ -379,7 +377,6 @@ jobs:
          POSTGRES_DB: postgres
          POSTGRES_SCHEMA: public
          POSTGRES_PORT: 5432
          SOURCE_RENDERING_BEHAVIOR: all

      - name: Upload coverage to Github
        uses: actions/upload-artifact@v2

@@ -461,12 +458,82 @@ jobs:
          AIRFLOW_CONN_EXAMPLE_CONN: postgres://postgres:[email protected]:5432/postgres
          PYTHONPATH: /home/runner/work/astronomer-cosmos/astronomer-cosmos/:$PYTHONPATH

  Run-Kubernetes-Tests:
    needs: Authorize
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [ "3.11" ]
        airflow-version: [ "2.9" ]
    steps:
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.sha || github.ref }}
      - uses: actions/cache@v3
        with:
          path: |
            ~/.cache/pip
            .local/share/hatch/
          key: coverage-integration-kubernetes-test-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.airflow-version }}-${{ hashFiles('pyproject.toml') }}-${{ hashFiles('cosmos/__init__.py') }}

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Create KinD cluster
        uses: container-tools/kind-action@v1

      - name: Install packages and dependencies
        run: |
          python -m venv venv
          source venv/bin/activate
          pip install --upgrade pip
          pip install -e ".[tests]"
          pip install apache-airflow-providers-cncf-kubernetes
          pip install dbt-postgres==1.8.2 psycopg2==2.9.3 pytz
          pip install apache-airflow==${{ matrix.airflow-version }}
          # hatch -e tests.py${{ matrix.python-version }}-${{ matrix.airflow-version }} run pip freeze
      - name: Run kubernetes tests
        run: |
          source venv/bin/activate
          sh ./scripts/test/kubernetes-setup.sh
          cd dev && sh ../scripts/test/integration-kubernetes.sh
          # hatch run tests.py${{ matrix.python-version }}-${{ matrix.airflow-version }}:test-kubernetes
        env:
          AIRFLOW_HOME: /home/runner/work/astronomer-cosmos/astronomer-cosmos/
          AIRFLOW_CONN_EXAMPLE_CONN: postgres://postgres:[email protected]:5432/postgres
          AIRFLOW_CONN_AWS_S3_CONN: ${{ secrets.AIRFLOW_CONN_AWS_S3_CONN }}
          AIRFLOW_CONN_GCP_GS_CONN: ${{ secrets.AIRFLOW_CONN_GCP_GS_CONN }}
          AIRFLOW_CONN_AZURE_ABFS_CONN: ${{ secrets.AIRFLOW_CONN_AZURE_ABFS_CONN }}
          AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT: 90.0
          PYTHONPATH: /home/runner/work/astronomer-cosmos/astronomer-cosmos/:$PYTHONPATH
          COSMOS_CONN_POSTGRES_PASSWORD: ${{ secrets.COSMOS_CONN_POSTGRES_PASSWORD }}
          DATABRICKS_CLUSTER_ID: mock
          DATABRICKS_HOST: mock
          DATABRICKS_WAREHOUSE_ID: mock
          DATABRICKS_TOKEN: mock
          POSTGRES_HOST: localhost
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: postgres
          POSTGRES_SCHEMA: public
          POSTGRES_PORT: 5432

      - name: Upload coverage to Github
        uses: actions/upload-artifact@v2
        with:
          name: coverage-integration-kubernetes-test-${{ matrix.python-version }}-${{ matrix.airflow-version }}
          path: .coverage

  Code-Coverage:
    if: github.event.action != 'labeled'
    needs:
      - Run-Unit-Tests
      - Run-Integration-Tests
      - Run-Integration-Tests-Expensive
      - Run-Kubernetes-Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

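The `env` block of the Run-Kubernetes-Tests job states the PostgreSQL settings twice: once as individual `POSTGRES_*` variables and once inside the `AIRFLOW_CONN_EXAMPLE_CONN` URI. As a hedged illustration of how the two relate, the sketch below derives an Airflow-style connection URI from the individual variables. The function name `postgres_conn_uri` is hypothetical and the workflow does not do this; the host in the workflow's own URI is obscured on this page, so the sketch makes no claim to reproduce that exact value.

```python
from typing import Mapping
from urllib.parse import quote


def postgres_conn_uri(env: Mapping[str, str]) -> str:
    """Build a postgres:// connection URI from POSTGRES_* style settings.

    User, password, and database are percent-encoded so that characters
    such as "@" or "/" do not break the URI structure.
    """
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env["POSTGRES_HOST"]
    port = env.get("POSTGRES_PORT", "5432")  # assumed default when unset
    database = quote(env["POSTGRES_DB"], safe="")
    return f"postgres://{user}:{password}@{host}:{port}/{database}"


# Hypothetical usage with the values from the job's env block:
#   postgres_conn_uri(os.environ)
```

Deriving the URI from one set of variables would remove one place where the two copies of the settings could drift apart.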
@@ -0,0 +1,18 @@
FROM python:3.11

RUN pip install dbt-postgres==1.8.2 psycopg2==2.9.3 pytz

ENV POSTGRES_DATABASE=postgres
ENV POSTGRES_DB=postgres
ENV POSTGRES_HOST=postgres.default.svc.cluster.local
ENV POSTGRES_PASSWORD=postgres
ENV POSTGRES_PORT=5432
ENV POSTGRES_SCHEMA=public
ENV POSTGRES_USER=postgres

RUN mkdir /root/.dbt
COPY dags/dbt/jaffle_shop/profiles.yml /root/.dbt/profiles.yml

RUN mkdir dags
COPY dags dags
RUN rm dags/dbt/jaffle_shop/packages.yml
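The Dockerfile exports both `POSTGRES_DATABASE` and `POSTGRES_DB`, which the commit message flags as an inconsistency to clean up later. Until that happens, code that reads the database name could guard against the two variables disagreeing. The sketch below is one possible approach under that assumption; `resolve_database_name` is a hypothetical helper, not something this commit adds.

```python
from typing import Mapping


def resolve_database_name(env: Mapping[str, str], default: str = "postgres") -> str:
    """Return the database name from POSTGRES_DATABASE or POSTGRES_DB.

    Raises ValueError when both variables are set to different values,
    and falls back to `default` when neither is set.
    """
    candidates = {
        name: env[name]
        for name in ("POSTGRES_DATABASE", "POSTGRES_DB")
        if env.get(name)  # ignore unset and empty values
    }
    distinct = set(candidates.values())
    if len(distinct) > 1:
        raise ValueError(f"Conflicting database names: {candidates}")
    return distinct.pop() if distinct else default
```

With the values from the Dockerfile above, both variables are `postgres`, so the helper would return `postgres` without complaint.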
@@ -0,0 +1,98 @@
"""
## Jaffle Shop DAG
[Jaffle Shop](https://github.com/dbt-labs/jaffle_shop) is a fictional eCommerce store. This dbt project originates from
dbt labs as an example project with dummy data to demonstrate a working dbt core project. This DAG uses the cosmos dbt
parser to generate an Airflow TaskGroup from the dbt project folder.
The step-by-step to run this DAG are described in:
https://astronomer.github.io/astronomer-cosmos/getting_started/kubernetes.html#kubernetes
"""

from airflow import DAG
from airflow.providers.cncf.kubernetes.secret import Secret
from pendulum import datetime

from cosmos import (
    DbtSeedKubernetesOperator,
    DbtTaskGroup,
    ExecutionConfig,
    ExecutionMode,
    ProfileConfig,
    ProjectConfig,
)
from cosmos.profiles import PostgresUserPasswordProfileMapping

DBT_IMAGE = "dbt-jaffle-shop:1.0.0"

project_seeds = [{"project": "jaffle_shop", "seeds": ["raw_customers", "raw_payments", "raw_orders"]}]

postgres_password_secret = Secret(
    deploy_type="env",
    deploy_target="POSTGRES_PASSWORD",
    secret="postgres-secrets",
    key="password",
)

postgres_host_secret = Secret(
    deploy_type="env",
    deploy_target="POSTGRES_HOST",
    secret="postgres-secrets",
    key="host",
)

with DAG(
    dag_id="jaffle_shop_kubernetes",
    start_date=datetime(2022, 11, 27),
    doc_md=__doc__,
    catchup=False,
) as dag:
    # [START kubernetes_seed_example]
    load_seeds = DbtSeedKubernetesOperator(
        task_id="load_seeds",
        project_dir="dags/dbt/jaffle_shop",
        get_logs=True,
        schema="public",
        image=DBT_IMAGE,
        is_delete_operator_pod=False,
        secrets=[postgres_password_secret, postgres_host_secret],
        profile_config=ProfileConfig(
            profile_name="postgres_profile",
            target_name="dev",
            profile_mapping=PostgresUserPasswordProfileMapping(
                conn_id="postgres_default",
                profile_args={
                    "schema": "public",
                },
            ),
        ),
    )
    # [END kubernetes_seed_example]

    # [START kubernetes_tg_example]
    run_models = DbtTaskGroup(
        profile_config=ProfileConfig(
            profile_name="postgres_profile",
            target_name="dev",
            profile_mapping=PostgresUserPasswordProfileMapping(
                conn_id="postgres_default",
                profile_args={
                    "schema": "public",
                },
            ),
        ),
        project_config=ProjectConfig(dbt_project_path="dags/dbt/jaffle_shop"),
        execution_config=ExecutionConfig(
            execution_mode=ExecutionMode.KUBERNETES,
        ),
        operator_args={
            "image": DBT_IMAGE,
            "get_logs": True,
            "is_delete_operator_pod": False,
            "secrets": [postgres_password_secret, postgres_host_secret],
        },
    )
    # [END kubernetes_tg_example]

    load_seeds >> run_models
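In the DAG above, the seed operator and the task group receive the same four Kubernetes settings (`image`, `get_logs`, `is_delete_operator_pod`, `secrets`), once as keyword arguments and once as `operator_args`. One way to keep the two in sync is to build them from a single helper. This is only a sketch: `kubernetes_operator_args` is a hypothetical function that the commit does not contain, and the example DAG may repeat the values on purpose so each documentation snippet stays self-contained.

```python
from typing import Any, Dict, List


def kubernetes_operator_args(
    image: str,
    secrets: List[Any],
    keep_pods: bool = True,
) -> Dict[str, Any]:
    """Return the Kubernetes settings shared by every dbt task in the DAG.

    `keep_pods=True` maps to is_delete_operator_pod=False, matching the
    example DAG, which leaves finished pods in place for inspection.
    """
    return {
        "image": image,
        "get_logs": True,
        "is_delete_operator_pod": not keep_pods,
        "secrets": list(secrets),  # copy so callers cannot mutate a shared list
    }


# Hypothetical usage inside the DAG definition:
#   shared = kubernetes_operator_args(DBT_IMAGE, [postgres_password_secret, postgres_host_secret])
#   load_seeds = DbtSeedKubernetesOperator(task_id="load_seeds", ..., **shared)
#   run_models = DbtTaskGroup(..., operator_args=shared)
```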
@@ -0,0 +1,16 @@
#!/bin/bash

set -x
set -e

# Reset the Airflow database to its initial state
airflow db reset -y

# Run tests using pytest
pytest -vv \
    --cov=cosmos \
    --cov-report=term-missing \
    --cov-report=xml \
    --durations=0 \
    -m integration \
    ../tests/test_example_k8s_dags.py
@@ -0,0 +1,34 @@
#!/bin/bash

# Print each command before executing it
# Exit the script immediately if any command exits with a non-zero status (for debugging purposes)
set -x
set -e

# Create a Kubernetes secret named 'postgres-secrets' with the specified literals for host and password
kubectl create secret generic postgres-secrets \
    --from-literal=host=postgres-postgresql.default.svc.cluster.local \
    --from-literal=password=postgres

# Apply the PostgreSQL deployment configuration from the specified YAML file
kubectl apply -f scripts/test/postgres-deployment.yaml

# Build the Docker image with tag 'dbt-jaffle-shop:1.0.0' using the specified Dockerfile
cd dev && docker build --progress=plain --no-cache -t dbt-jaffle-shop:1.0.0 -f Dockerfile.postgres_profile_docker_k8s .

# Load the Docker image into the local KIND cluster
kind load docker-image dbt-jaffle-shop:1.0.0

# Retrieve the name of the PostgreSQL pod using the label selector 'app=postgres'
# The output is filtered to get the first pod's name
POD_NAME=$(kubectl get pods -n default -l app=postgres -o jsonpath='{.items[0].metadata.name}')

# Print the name of the PostgreSQL pod
echo "$POD_NAME"

# Forward port 5432 from the PostgreSQL pod to the local machine's port 5432
# This allows local access to the PostgreSQL instance running in the pod
kubectl port-forward --namespace default "$POD_NAME" 5432:5432 &

# List all pods in the default namespace to verify the status of pods
kubectl get pod
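The setup script starts `kubectl port-forward` in the background and returns immediately, so a test that connects to `localhost:5432` right afterwards could in principle run before the forward is listening. A caller could poll the port first. The sketch below shows one way to do that with the standard library only; `wait_for_port` is a hypothetical helper that is not part of this commit, and the timeout and interval defaults are arbitrary assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` elapses.

    Returns True as soon as a connection is accepted, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # The connection is closed again right away; we only probe reachability.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Hypothetical usage after scripts/test/kubernetes-setup.sh has run:
#   if not wait_for_port("localhost", 5432):
#       raise SystemExit("PostgreSQL port-forward did not become reachable")
```

An accepted TCP connection shows only that the local end of the forward is listening. It does not prove that PostgreSQL inside the pod is ready to accept queries, so a database-level check may still be needed.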