Release 1.2.5 #709

Merged
merged 11 commits
Nov 23, 2023
23 changes: 20 additions & 3 deletions CHANGELOG.rst
@@ -1,7 +1,24 @@
Changelog
=========

1.2.5 (2023-11-23)
------------------

Bug fixes

* Fix running models that use alias while supporting dbt versions by @binhnq94 in #662
* Make ``profiles_yml_path`` optional for ``ExecutionMode.DOCKER`` and ``KUBERNETES`` by @MrBones757 in #681
* Prevent overriding dbt profile fields with profile args of "type" or "method" by @jbandoro in #702
* Fix ``LoadMode.DBT_LS`` fail when dbt outputs ``WarnErrorOptions`` by @adammarples in #692
* Add support for env vars in ``RenderConfig`` for dbt ls parsing by @jbandoro in #690
* Add support for Kubernetes ``on_warning_callback`` by @david-mag in #673
* Fix ExecutionConfig.dbt_executable_path to use ``default_factory`` by @jbandoro in #678

Others

* Docs fix: example DAG in the README and docs/index by @tatiana in #705
* Docs improvement: highlight DAG examples in README by @iancmoritz and @jlaneve in #695

1.2.4 (2023-11-14)
------------------

@@ -14,8 +31,8 @@ Bug fixes

Others

* Docs fix: add execution config to MWAA code example by @ugmuka in #674

* Docs: add execution config to MWAA code example by @ugmuka in #674
* Docs: highlight DAG examples in docs by @iancmoritz and @jlaneve in #695

1.2.3 (2023-11-09)
------------------
45 changes: 9 additions & 36 deletions README.rst
@@ -31,64 +31,37 @@ Run your dbt Core projects as `Apache Airflow <https://airflow.apache.org/>`_ DA
Quickstart
__________

Check out the Quickstart guide on our `docs <https://astronomer.github.io/astronomer-cosmos/#quickstart>`_.
Check out the Quickstart guide on our `docs <https://astronomer.github.io/astronomer-cosmos/#quickstart>`_. See more examples at `/dev/dags <https://github.com/astronomer/astronomer-cosmos/tree/main/dev/dags>`_ and at the `cosmos-demo repo <https://github.com/astronomer/cosmos-demo>`_.


Example Usage
___________________

You can render an Airflow Task Group using the ``DbtTaskGroup`` class. Here's an example with the `jaffle_shop project <https://github.com/dbt-labs/jaffle_shop>`_:
You can render a Cosmos Airflow DAG using the ``DbtDag`` class. Here's an example with the `jaffle_shop project <https://github.com/dbt-labs/jaffle_shop>`_:

..
This renders on Github but not Sphinx:

.. code-block:: python
https://github.com/astronomer/astronomer-cosmos/blob/24aa38e528e299ef51ca6baf32f5a6185887d432/dev/dags/basic_cosmos_dag.py#L1-L42

from pendulum import datetime
This will generate an Airflow DAG that looks like this:

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from cosmos import DbtTaskGroup, ProfileConfig, ProjectConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping
.. figure:: /docs/_static/jaffle_shop_dag.png

profile_config = ProfileConfig(
profile_name="default",
target_name="dev",
profile_mapping=PostgresUserPasswordProfileMapping(
conn_id="airflow_db",
profile_args={"schema": "public"},
),
)

with DAG(
dag_id="extract_dag",
start_date=datetime(2022, 11, 27),
schedule_interval="@daily",
):
e1 = EmptyOperator(task_id="pre_dbt")

dbt_tg = DbtTaskGroup(
project_config=ProjectConfig("jaffle_shop"),
profile_config=profile_config,
)

e2 = EmptyOperator(task_id="post_dbt")

e1 >> dbt_tg >> e2

This will generate an Airflow Task Group that looks like this:

.. figure:: /docs/_static/jaffle_shop_task_group.png
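The permalink above only renders on GitHub. As a rough inline sketch — adapted from the removed ``DbtTaskGroup`` example above, not copied from the canonical ``basic_cosmos_dag.py`` — a ``DbtDag`` declaration looks like this:

```python
# Hedged sketch of a DbtDag declaration; the dag_id and schedule are
# illustrative, and profile_config is reused from the removed example.
from pendulum import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="airflow_db",
        profile_args={"schema": "public"},
    ),
)

# The whole dbt project becomes a single Airflow DAG, one task per dbt node.
jaffle_shop_dag = DbtDag(
    dag_id="jaffle_shop_dag",
    start_date=datetime(2022, 11, 27),
    schedule_interval="@daily",
    project_config=ProjectConfig("jaffle_shop"),
    profile_config=profile_config,
)
```

Unlike ``DbtTaskGroup``, which nests the dbt nodes inside an existing DAG, ``DbtDag`` owns the whole DAG, so no surrounding ``with DAG(...)`` block is needed.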

Community
_________
- Join us on the Airflow `Slack <https://join.slack.com/t/apache-airflow/shared_invite/zt-1zy8e8h85-es~fn19iMzUmkhPwnyRT6Q>`_ at #airflow-dbt


Changelog
_________

We follow `Semantic Versioning <https://semver.org/>`_ for releases.
Check `CHANGELOG.rst <https://github.com/astronomer/astronomer-cosmos/blob/main/CHANGELOG.rst>`_
for the latest changes.


Contributing Guide
__________________

2 changes: 1 addition & 1 deletion cosmos/__init__.py
@@ -5,7 +5,7 @@
Contains dags, task groups, and operators.
"""
__version__ = "1.2.4"
__version__ = "1.2.5"

from cosmos.airflow.dag import DbtDag
from cosmos.airflow.task_group import DbtTaskGroup
12 changes: 6 additions & 6 deletions cosmos/airflow/graph.py
@@ -83,11 +83,11 @@ def create_test_task_metadata(
task_args["indirect_selection"] = test_indirect_selection.value
if node is not None:
if node.resource_type == DbtResourceType.MODEL:
task_args["models"] = node.name
task_args["models"] = node.resource_name
elif node.resource_type == DbtResourceType.SOURCE:
task_args["select"] = f"source:{node.unique_id[len('source.'):]}"
task_args["select"] = f"source:{node.resource_name}"
else: # tested with node.resource_type == DbtResourceType.SEED or DbtResourceType.SNAPSHOT
task_args["select"] = node.name
task_args["select"] = node.resource_name
return TaskMetadata(
id=test_task_name,
operator_class=calculate_operator_class(
@@ -108,8 +108,8 @@ def create_task_metadata(
:param execution_mode: Where Cosmos should run each dbt task (e.g. ExecutionMode.LOCAL, ExecutionMode.KUBERNETES).
Default is ExecutionMode.LOCAL.
:param args: Arguments to be used to instantiate an Airflow Task
:param use_name_as_task_id_prefix: If resource_type is DbtResourceType.MODEL, it determines whether
using name as task id prefix or not. If it is True task_id = <node.name>_run, else task_id=run.
:param use_task_group: Determines whether the node name is used as a prefix for the task id.
If it is False, task_id = <node.name>_run; otherwise task_id = run, nested inside a task group named after the node.
:returns: The metadata necessary to instantiate the source dbt node as an Airflow task.
"""
dbt_resource_to_class = {
@@ -118,7 +118,7 @@
DbtResourceType.SEED: "DbtSeed",
DbtResourceType.TEST: "DbtTest",
}
args = {**args, **{"models": node.name}}
args = {**args, **{"models": node.resource_name}}

if DbtResourceType(node.resource_type) in DEFAULT_DBT_RESOURCES and node.resource_type in dbt_resource_to_class:
if node.resource_type == DbtResourceType.MODEL:
20 changes: 13 additions & 7 deletions cosmos/config.py
@@ -41,8 +41,8 @@ class RenderConfig:
:param exclude: A list of dbt exclude arguments (e.g. 'tag:nightly')
:param dbt_deps: Configure to run dbt deps when using dbt ls for dag parsing
:param node_converters: a dictionary mapping a ``DbtResourceType`` into a callable. Users can control how to render dbt nodes in Airflow. Only supported when using ``load_method=LoadMode.DBT_MANIFEST`` or ``LoadMode.DBT_LS``.
:param dbt_executable_path: The path to the dbt executable for dag generation. Defaults to dbt if available on the path. Mutually Exclusive with ProjectConfig.dbt_project_path
:param dbt_project_path Configures the DBT project location accessible on the airflow controller for DAG rendering - Required when using ``load_method=LoadMode.DBT_LS`` or ``load_method=LoadMode.CUSTOM``
:param dbt_executable_path: The path to the dbt executable for dag generation. Defaults to dbt if available on the path.
:param dbt_project_path: Configures the dbt project location accessible on the Airflow controller for DAG rendering. Mutually exclusive with ProjectConfig.dbt_project_path. Required when using ``load_method=LoadMode.DBT_LS`` or ``load_method=LoadMode.CUSTOM``.
"""

emit_datasets: bool = True
@@ -195,15 +195,21 @@ class ProfileConfig:
profile_mapping: BaseProfileMapping | None = None

def __post_init__(self) -> None:
"Validates that we have enough information to render a profile."
# if using a user-supplied profiles.yml, validate that it exists
if self.profiles_yml_filepath and not Path(self.profiles_yml_filepath).exists():
raise CosmosValueError(f"The file {self.profiles_yml_filepath} does not exist.")
self.validate_profile()

def validate_profile(self) -> None:
"Validates that we have enough information to render a profile."
if not self.profiles_yml_filepath and not self.profile_mapping:
raise CosmosValueError("Either profiles_yml_filepath or profile_mapping must be set to render a profile")
if self.profiles_yml_filepath and self.profile_mapping:
raise CosmosValueError(
"Both profiles_yml_filepath and profile_mapping are defined and are mutually exclusive. Ensure only one of these is defined."
)

def validate_profiles_yml(self) -> None:
"Validates a user-supplied profiles.yml is present"
if self.profiles_yml_filepath and not Path(self.profiles_yml_filepath).exists():
raise CosmosValueError(f"The file {self.profiles_yml_filepath} does not exist.")

@contextlib.contextmanager
def ensure_profile(
@@ -258,7 +264,7 @@ class ExecutionConfig:

execution_mode: ExecutionMode = ExecutionMode.LOCAL
test_indirect_selection: TestIndirectSelection = TestIndirectSelection.EAGER
dbt_executable_path: str | Path = get_system_dbt()
dbt_executable_path: str | Path = field(default_factory=get_system_dbt)

dbt_project_path: InitVar[str | Path | None] = None
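The ``default_factory`` change above (#678) matters because a plain dataclass default is evaluated once, when the class is defined at import time, freezing the dbt path before any later environment changes; a factory runs at each instantiation. A sketch, with ``find_dbt`` standing in for cosmos's ``get_system_dbt()``:

```python
# Why ExecutionConfig.dbt_executable_path moved to default_factory (#678).
from dataclasses import dataclass, field

calls = []


def find_dbt() -> str:
    # Stand-in for get_system_dbt(); records each lookup.
    calls.append(1)
    return "/usr/local/bin/dbt"


@dataclass
class ExecutionConfigSketch:
    dbt_executable_path: str = field(default_factory=find_dbt)


assert calls == []         # nothing looked up at class-definition time
a = ExecutionConfigSketch()
b = ExecutionConfigSketch()
assert len(calls) == 2     # looked up once per instance, at instantiation
```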

28 changes: 20 additions & 8 deletions cosmos/converter.py
@@ -11,6 +11,7 @@
from airflow.utils.task_group import TaskGroup

from cosmos.airflow.graph import build_airflow_graph
from cosmos.constants import ExecutionMode
from cosmos.dbt.graph import DbtGraph
from cosmos.dbt.selector import retrieve_by_label
from cosmos.config import ProjectConfig, ExecutionConfig, RenderConfig, ProfileConfig
@@ -49,16 +50,21 @@ def airflow_kwargs(**kwargs: dict[str, Any]) -> dict[str, Any]:


def validate_arguments(
select: list[str], exclude: list[str], profile_args: dict[str, Any], task_args: dict[str, Any]
select: list[str],
exclude: list[str],
profile_config: ProfileConfig,
task_args: dict[str, Any],
execution_mode: ExecutionMode,
) -> None:
"""
Validate that mutually exclusive selectors filters have not been given.
Validate deprecated arguments.

:param select: A list of dbt select arguments (e.g. 'config.materialized:incremental')
:param exclude: A list of dbt exclude arguments (e.g. 'tag:nightly')
:param profile_args: Arguments to pass to the dbt profile
:param profile_config: ProfileConfig Object
:param task_args: Arguments to be used to instantiate an Airflow Task
:param execution_mode: the current execution mode
"""
for field in ("tags", "paths"):
select_items = retrieve_by_label(select, field)
@@ -69,8 +75,12 @@

# if task_args has a schema, add it to the profile args and add a deprecated warning
if "schema" in task_args:
profile_args["schema"] = task_args["schema"]
logger.warning("Specifying a schema in the `task_args` is deprecated. Please use the `profile_args` instead.")
if profile_config.profile_mapping:
profile_config.profile_mapping.profile_args["schema"] = task_args["schema"]

if execution_mode in [ExecutionMode.LOCAL, ExecutionMode.VIRTUALENV]:
profile_config.validate_profiles_yml()


class DbtToAirflowConverter:
@@ -139,10 +149,6 @@ def __init__(
"RenderConfig.dbt_project_path is required for rendering an airflow DAG from a DBT Graph if no manifest is provided."
)

profile_args = {}
if profile_config.profile_mapping:
profile_args = profile_config.profile_mapping.profile_args

if not operator_args:
operator_args = {}

@@ -174,7 +180,13 @@
if execution_config.dbt_executable_path:
task_args["dbt_executable_path"] = execution_config.dbt_executable_path

validate_arguments(render_config.select, render_config.exclude, profile_args, task_args)
validate_arguments(
render_config.select,
render_config.exclude,
profile_config,
task_args,
execution_mode=execution_config.execution_mode,
)

build_airflow_graph(
nodes=dbt_graph.filtered_nodes,
31 changes: 19 additions & 12 deletions cosmos/dbt/graph.py
@@ -42,7 +42,6 @@
Metadata related to a dbt node (e.g. model, seed, snapshot).
"""

name: str
unique_id: str
resource_type: DbtResourceType
depends_on: list[str]
@@ -51,6 +50,23 @@
config: dict[str, Any] = field(default_factory=lambda: {})
has_test: bool = False

@property
def resource_name(self) -> str:
"""
Use this property to retrieve the resource name for command generation, for instance: ["dbt", "run", "--models", f"{resource_name}"].
The unique_id format is defined as [<resource_type>.<package>.<resource_name>](https://docs.getdbt.com/reference/artifacts/manifest-json#resource-details).
For a special case like a versioned model, the unique_id follows this pattern: [model.<package>.<resource_name>.<version>](https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/contracts/graph/node_args.py#L26C3-L31)
"""
return self.unique_id.split(".", 2)[2]

@property
def name(self) -> str:
"""
Use this property as the task name or task group name.
Replace period (.) with underscore (_) due to versioned models.
"""
return self.resource_name.replace(".", "_")
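The two properties above can be exercised standalone; this sketch re-implements their logic as plain functions (mirroring, not importing, the ``DbtNode`` properties), including the versioned-model case where ``unique_id`` carries a fourth, version component:

```python
# Standalone mirror of DbtNode.resource_name / DbtNode.name (#662).

def resource_name(unique_id: str) -> str:
    # unique_id format: <resource_type>.<package>.<resource_name>[.<version>]
    # Splitting at most twice keeps any version suffix attached to the name.
    return unique_id.split(".", 2)[2]


def task_name(unique_id: str) -> str:
    # Periods from versioned models are replaced, since Airflow task ids
    # treat "." as a task-group separator.
    return resource_name(unique_id).replace(".", "_")


print(resource_name("model.jaffle_shop.stg_customers"))     # stg_customers
print(resource_name("model.jaffle_shop.stg_customers.v1"))  # stg_customers.v1
print(task_name("model.jaffle_shop.stg_customers.v1"))      # stg_customers_v1
```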


def run_command(command: list[str], tmp_dir: Path, env_vars: dict[str, str]) -> str:
"""Run a command in a subprocess, returning the stdout."""
@@ -72,7 +88,7 @@
"Unable to run dbt ls command due to missing dbt_packages. Set RenderConfig.dbt_deps=True."
)

if returncode or "Error" in stdout:
if returncode or "Error" in stdout.replace("WarnErrorOptions", ""):
details = stderr or stdout
raise CosmosLoadDbtException(f"Unable to run {command} due to the error:\n{details}")
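The one-line change above is the #692 fix: dbt ls can echo its ``WarnErrorOptions`` flag, and since that token contains the substring ``"Error"``, the old check treated healthy output as a failure. A sketch of the check in isolation (the string samples are illustrative, not real dbt output):

```python
# Isolated version of the stdout error check fixed in #692.

def looks_like_error(stdout: str) -> bool:
    # Strip the benign "WarnErrorOptions" token before scanning for "Error".
    return "Error" in stdout.replace("WarnErrorOptions", "")


healthy = "Flags: WarnErrorOptions(include=[], exclude=[])"
broken = "Compilation Error in model customers"

print(looks_like_error(healthy))  # False (the old check returned True here)
print(looks_like_error(broken))   # True
```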

@@ -89,7 +105,6 @@
logger.debug("Skipped dbt ls line: %s", line)
else:
node = DbtNode(
name=node_dict.get("alias", node_dict["name"]),
unique_id=node_dict["unique_id"],
resource_type=DbtResourceType(node_dict["resource_type"]),
depends_on=node_dict.get("depends_on", {}).get("nodes", []),
@@ -195,9 +210,6 @@
This is the most accurate way of loading `dbt` projects and filtering them out, since it uses the `dbt` command
line for both parsing and filtering the nodes.

Note that if the dbt project contains versioned models, dbt >= 1.6.0 is required, because the dbt CLI in
earlier versions cannot select a specific versioned model such as stg_customers_v1, customers_v1, ...

Updates in-place:
* self.nodes
* self.filtered_nodes
@@ -208,7 +220,7 @@

logger.info(f"Trying to parse the dbt project in `{self.render_config.project_path}` using dbt ls...")
if not self.render_config.project_path or not self.execution_config.project_path:
raise CosmosLoadDbtException(

"Unable to load project via dbt ls without RenderConfig.dbt_project_path and ExecutionConfig.dbt_project_path"
)

@@ -291,8 +303,7 @@
for model_name, model in models:
config = {item.split(":")[0]: item.split(":")[-1] for item in model.config.config_selectors}
node = DbtNode(
name=model_name,
unique_id=model_name,
unique_id=f"{model.type.value}.{self.project.project_name}.{model_name}",
resource_type=DbtResourceType(model.type.value),
depends_on=list(model.config.upstream_models),
file_path=Path(
@@ -325,9 +336,6 @@
However, since the Manifest does not represent filters, it relies on the Custom Cosmos implementation
to filter out the nodes relevant to the user (based on self.exclude and self.select).

Note that if the dbt project contains versioned models, dbt >= 1.6.0 is required, because the dbt CLI in
earlier versions cannot select a specific versioned model such as stg_customers_v1, customers_v1, ...

Updates in-place:
* self.nodes
* self.filtered_nodes
@@ -338,7 +346,7 @@
raise CosmosLoadDbtException(f"Unable to load manifest using {self.project.manifest_path}")

if not self.execution_config.project_path:
raise CosmosLoadDbtException("Unable to load manifest without ExecutionConfig.dbt_project_path")


nodes = {}
with open(self.project.manifest_path) as fp: # type: ignore[arg-type]
@@ -347,7 +355,6 @@
resources = {**manifest.get("nodes", {}), **manifest.get("sources", {}), **manifest.get("exposures", {})}
for unique_id, node_dict in resources.items():
node = DbtNode(
name=node_dict.get("alias", node_dict["name"]),
unique_id=unique_id,
resource_type=DbtResourceType(node_dict["resource_type"]),
depends_on=node_dict.get("depends_on", {}).get("nodes", []),