
Change metadata_vars if not to if ... is None #37

Merged
merged 1 commit into dbt-labs:main on Jan 24, 2024

Conversation

truls-p (Contributor) commented Jan 23, 2024

(Opened this PR instead of dbt-labs/dbt-core#9376)

Both None and {} evaluate to True under if not metadata_vars, so we were looping over the environment variables unnecessarily on every call. Changing the check provides a significant performance increase for large projects.
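For illustration (a quick REPL check, not part of the PR): not cannot distinguish an uninitialised cache from a computed-but-empty one, whereas an is None test can:

>>> not None
True
>>> not {}
True
>>> {} is None
False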

resolves #

Relates to dbt-labs/dbt-core#6073

Problem

Poor performance with large models.

I was trying to profile a large model and noticed that get_metadata_vars is called many times, with quite a bit of time spent in a dict comprehension. I then realised that whenever there are no environment variables with the "DBT_ENV_CUSTOM_ENV_" prefix, the metadata_vars dictionary is set to {} the first time get_metadata_vars is called. On subsequent calls we continue to loop over the environment variables, because the condition, if not metadata_vars, evaluates to True for both None and {}.
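To make the mechanism concrete, here is a standalone sketch (hypothetical names, not the dbt code; the scans counter exists purely for demonstration) mirroring the current guard:

import os

_PREFIX = "DBT_ENV_CUSTOM_ENV_"
metadata_vars = None
scans = 0

def get_metadata_vars():
    global metadata_vars, scans
    if not metadata_vars:  # True for None *and* for {}
        scans += 1  # one full pass over os.environ
        metadata_vars = {
            k[len(_PREFIX):]: v
            for k, v in os.environ.items()
            if k.startswith(_PREFIX)
        }
    return metadata_vars

# With no DBT_ENV_CUSTOM_ENV_* variables set, the cache never takes effect:
for _ in range(5):
    get_metadata_vars()
print(scans)  # 5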

Solution

The solution is to change if not metadata_vars to if metadata_vars is None. This ensures we loop through the environment variables looking for the prefix only once.
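The same sketch with the fixed guard (again hypothetical names, demonstration only) scans the environment exactly once and keeps the empty result as the cache:

import os

_PREFIX = "DBT_ENV_CUSTOM_ENV_"
metadata_vars = None
scans = 0

def get_metadata_vars():
    global metadata_vars, scans
    if metadata_vars is None:  # True only before the first lookup
        scans += 1
        metadata_vars = {
            k[len(_PREFIX):]: v
            for k, v in os.environ.items()
            if k.startswith(_PREFIX)
        }
    return metadata_vars

# With no DBT_ENV_CUSTOM_ENV_* variables set, later calls hit the cached {}:
for _ in range(5):
    get_metadata_vars()
print(scans)  # 1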

Here is a test that could be run before dbt-common was split off into its own repo, along with the pytest-benchmark results on my machine. Note that this test may take up to 30 minutes to complete. Please let me know if you think a shorter test (perhaps one that just ensures we do not unnecessarily loop through os.environ multiple times; a sketch of such a test appears after the benchmark results below) should be included in this PR.

import pytest
import os

from dbt.tests.util import run_dbt
from dbt.tests.fixtures.project import write_project_files

import dbt.common.events.functions

from tests.fixtures.dbt_integration_project import dbt_integration_project  # noqa: F401
from tests.functional.graph_selection.fixtures import SelectionFixtures

NUM_TABLES = 500
metadata_vars = None
_METADATA_ENV_PREFIX = "DBT_ENV_CUSTOM_ENV_"


def run_schema_and_assert(project, include, exclude, expected_tests):
    # deps must run before seed
    run_dbt(["deps"])
    run_dbt(["seed"])
    results = run_dbt(["build", "--exclude", "never_selected"])
    assert len(results) == 18 + NUM_TABLES

    test_args = ["test"]
    if include:
        test_args += ["--select", include]
    if exclude:
        test_args += ["--exclude", exclude]
    test_results = run_dbt(test_args)

    ran_tests = sorted([test.node.name for test in test_results])
    expected_sorted = sorted(expected_tests)

    assert ran_tests == expected_sorted

def get_metadata_vars_is():
    # Proposed version: rescan os.environ only if the cache has never been built
    global metadata_vars
    if metadata_vars is None:
        metadata_vars = {
            k[len(_METADATA_ENV_PREFIX) :]: v
            for k, v in os.environ.items()
            if k.startswith(_METADATA_ENV_PREFIX)
        }
    return metadata_vars

def get_metadata_vars_not():
    # Current version: rescans os.environ on every call while no prefixed vars exist
    global metadata_vars
    if not metadata_vars:
        metadata_vars = {
            k[len(_METADATA_ENV_PREFIX) :]: v
            for k, v in os.environ.items()
            if k.startswith(_METADATA_ENV_PREFIX)
        }
    return metadata_vars

class TestPerformanceManyTables(SelectionFixtures):
    @pytest.fixture(scope="class", autouse=True)
    def setUp(self, project_root, dbt_integration_project):  # noqa: F811

        sql_string = "select * from {{ this.schema }}.seed"
        for i in range(NUM_TABLES):
            dbt_integration_project["models"][f"{i}.sql"] = sql_string
        write_project_files(project_root, "dbt_integration_project", dbt_integration_project)

    @pytest.fixture(scope="class")
    def packages(self):
        return {"packages": [{"local": "dbt_integration_project"}]}


    @pytest.mark.skipif(os.environ.get('DBT_PERF_TEST') is None, reason="DBT_PERF_TEST not set")
    def test_performance_many_tables_specify_tag_not(self, project, benchmark, monkeypatch):
        monkeypatch.setattr(dbt.common.events.functions, 'get_metadata_vars', get_metadata_vars_not)
        benchmark.pedantic(run_schema_and_assert, args=(project, "tag:bi", None, ["unique_users_id", "unique_users_rollup_gender"]), iterations=5)


    @pytest.mark.skipif(os.environ.get('DBT_PERF_TEST') is None, reason="DBT_PERF_TEST not set")
    def test_performance_many_tables_specify_tag_is(self, project, benchmark, monkeypatch):
        monkeypatch.setattr(dbt.common.events.functions, 'get_metadata_vars', get_metadata_vars_is)
        benchmark.pedantic(run_schema_and_assert, args=(project, "tag:bi", None, ["unique_users_id", "unique_users_rollup_gender"]), iterations=5)

(Attachment: pytest-benchmark results, dbt_is_not_perf)
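For reference, a sketch of the shorter test mentioned above (hypothetical and not part of this PR: CountingEnviron and test_env_scanned_only_once are invented names, and it assumes it sits in the same module as get_metadata_vars_is):

import os

class CountingEnviron(dict):
    # dict wrapper that counts how many times items() is called
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.items_calls = 0

    def items(self):
        self.items_calls += 1
        return super().items()

def test_env_scanned_only_once(monkeypatch):
    global metadata_vars
    metadata_vars = None  # reset the module-level cache
    fake_environ = CountingEnviron({"PATH": "/usr/bin"})  # no prefixed vars
    monkeypatch.setattr(os, "environ", fake_environ)
    for _ in range(5):
        assert get_metadata_vars_is() == {}
    # the `is None` guard scans once; get_metadata_vars_not would scan 5 times
    assert fake_environ.items_calls == 1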

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

truls-p marked this pull request as ready for review January 23, 2024 19:52
peterallenwebb merged commit 5ce19dd into dbt-labs:main Jan 24, 2024
6 checks passed
peterallenwebb (Contributor) commented:

@truls-p Thank you for your contribution to dbt-core!
