Defining vars, folder-level configs outside dbt_project.yml #2955

jtcohen6 · 2020-12-15T14:45:19Z

Describe the feature

I’d like to use project-level variables more, but I’m concerned about bloat to my already large dbt_project.yml file. I think it would be helpful if I could create a variables.yml file that could be imported in dbt_project.yml. And for that matter, the same could be done for other configurations in the dbt_project.yml file. I think having the ability to separate configurations into different files might make for improved modularity / separation of concerns (particularly for large projects), not to mention fewer merge conflicts. CC @jrandrews

Describe alternatives you've considered

We're already thinking of enabling some configs in resource-YAML files (Set configs in schema.yml files #2401), but these would be at the level of the individual resource (model/seed/snapshot/etc) only
The dbt_project.yml gets really really big??

Additional context

I don't think this has any correspondence to v1.0. It's a nice thing to have, and we could do it before, after, any time without it being a breaking change in any way.

Who will this benefit?

Developers and maintainers of increasingly big dbt projects

codigo-ergo-sum · 2020-12-16T04:02:05Z

Some additional thoughts/failure modes/concerns to think about with this:

1. In large dbt projects, one problem with project-wide variables is the potential for developers to "step" on each other by editing, overwriting, or conflicting with each other's var declarations. If devs were meticulous in looking at the project-wide relevance of a given var, then this might happen less or not at all, but that is unfortunately often not the case. If we allow variable declaration (and other things) outside of just one file (say dbt_project.yml), and it is an arbitrary number of files, then I can see Dev1 defining my_var_p in variable_file_1.yml and then Dev2 defining my_var_p in variable_file_2.yml. I suppose/hope that dbt would detect that and throw an error, but there are still some clunky workflow issues in allowing variable declarations in multiple different .yml files.
2. Vars need to be parsed before other .yml files for declarations around models, tests, etc. are parsed. Right now this problem is handled by having one hard-coded .yml file (that is, dbt_project.yml) be parsed before the other .yml files, but if we loosen this then dbt still needs a way to determine how/what to parse in "pass 1" of parsing for vars (and I am sure a lot of other things that other, smarter people than I know already happen first :) ) versus "pass 2" of parsing for other things like tests, models, etc. And not just dbt -- this understanding of which .yml file gets parsed when needs to be not-too-hard for average devs to grasp quickly. Otherwise people will just be littering random var declarations mixed in with tests and model config and then getting confused about why things don't work.

One thought that comes to me - what if we had another separate set of subdirectories that were specifically dedicated to .yml files for config, like we already have a subdirectory declaration/space for snapshots, models, etc. Only things related to/extracted from dbt_project.yml could be put in there, and any other things related to models, snapshots, etc., would trigger a parsing error. This doesn't fix problem 1 above but it at least helps with problem 2. What say ye?

P.S. Also, I know var namespacing was removed for dbt_project.yml v2 config in .17 for some good reasons but it's also pretty hard not to have any way to do variable scoping in larger projects.

jtcohen6 · 2020-12-16T04:55:26Z

@codigo-ergo-sum I don't think I've ever seen you post from this alt account before. It goes without saying that I like the handle.

what if we had another separate set of subdirectories that were specifically dedicated to .yml files for config, like we already have a subdirectory declaration/space for snapshots, models, etc.

This is along the lines of what I was thinking: either an explicit set of subdirectories, or an explicit set of named files. I've been around just long enough to remember when packages was a special dict in dbt_project.yml rather than its own file; we split it out because we expected it to grow in size, and because it served a distinct purpose. We made the same choice for selectors.yml.

I'd be especially keen on a vars.yml: variables have a slightly different parsing context, we can be strict about accepting only literal values, and we could even do a better job of parsing vars.yml before parsing dbt_project.yml. That would make default values of vars called in dbt_project.yml work the way folks expect, rather than how it is today. I like that correspondence between vars.yml and CLI --vars, similar to how env vars can be sourced from an *.env file or prepended to a CLI command.

Configurations feel a bit trickier, because these can be especially verbose. How to coordinate hierarchies across multiple files without someone tripping over someone else? I'm honestly not sure. The cleanest separation I can envision would be allowing a project to have one each of models.yml, seeds.yml, etc.

P.S. Also, I know var namespacing was removed for dbt_project.yml v2 config in .17 for some good reasons but it's also pretty hard not to have any way to do variable scoping in larger projects.

This is fair. I wonder if the ability to scope vars differently for different model subsets may ultimately serve as a valid reason to split very big projects up into multiple sub-projects, installed as packages. That's regardless of whether they live in the same or separate repositories.

codigo-ergo-sum · 2021-01-05T01:35:11Z

Thanks for the compliment on the username @jtcohen6 :).

I think a vars.yml file would be a definite improvement over the current situation.

Would it be required or could vars also still be defined in dbt_project.yml? If required then that probably requires a new version 3 of the schema version for dbt_project.yml which is a significant change for existing users, right?

If not required, then what would the behavior look like if vars are defined in both places, and if they conflict? And are you suggesting that vars.yml would be parsed before dbt_project.yml is parsed? Allowing full, "no-gotcha" usage of vars in dbt_project.yml?

danielefrigo · 2021-06-15T15:58:47Z

I absolutely support the idea of parsing the vars before the dbt_project.yml.
This would enable leveraging vars in many additional ways, e.g. to enable or disable subfolders or to define schemas from vars, without losing the ability to simply run a model using the vars' default values.

moltar · 2021-10-25T05:39:05Z

Having outside vars available in dbt_project.yml would be a huge improvement.

We have a complex dbt_project.yml, with lots of repetition and using Jinja a lot.

For example:

source-paths:
  - modules/shared/models
  - modules/module1/models
  - modules/module2/models
  - stages/{{ env_var('DBT_STAGE', '@fake@') }}/models

Then enabling a particular stage via env var during deployment.

Would be great to set the stage once in the vars file, and then just use the var itself in the config. Also be able to define module names / prefixes, or even an entire array of modules to loop over.
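As a rough illustration of the "array of modules to loop over" idea, here is a minimal Python sketch of how such a path list could be derived from one module list and one stage value. The helper name `build_source_paths` and its parameters are hypothetical and not part of dbt; only the `DBT_STAGE` variable, its `@fake@` fallback, and the path layout come from the config above.

```python
import os


def build_source_paths(modules, stage=None):
    """Return model paths for each module plus the active stage (hypothetical helper)."""
    if stage is None:
        # Mirrors env_var('DBT_STAGE', '@fake@') from the config above.
        stage = os.environ.get("DBT_STAGE", "@fake@")
    paths = [f"modules/{name}/models" for name in modules]
    paths.append(f"stages/{stage}/models")
    return paths


print(build_source_paths(["shared", "module1", "module2"], stage="prod"))
```

With the stage and the module list each defined once, adding a module means touching a single list rather than repeating a path pattern.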

krazavet-tinyclues · 2022-02-11T16:32:18Z

Hello 👋

In our company, we use DBT a lot in a multi-tenant context. For that purpose, we rely heavily on DBT variables, through which we propagate the client configuration. Those configurations can be really different from one client to another.
Sometimes we hit the following error: argument list too long: dbt. It is caused by the large config payload (e.g. some payloads exceed 600Kb).

We have not found a proper workaround so far. Passing a file path instead of a payload for our variables would probably solve our issue. This is why we are keen to know whether there is any chance you would consider such a feature for DBT? (cc. @jtcohen6)

Thank you in advance 🙏

ybressler · 2022-04-21T16:47:52Z

++ this feature. The solution implemented by Jekyll (with _data directory) comes to mind as suitable.

ciprian-mandras · 2022-07-06T07:50:10Z

Hi all,

I agree that a vars.yml would be a good boost, but that only gives you global variables. Based on my experience, I think you should also consider a solution for local variables: some way for a model configuration to point to a model_vars.yml and use the variables defined there for that specific model.

Thank you.

itechprasanth · 2022-08-29T21:16:29Z

I agree. Hope this will get implemented soon, as it is always good practice to modularize the configurations rather than having everything in the same single file.

vitorefazevedo · 2023-03-29T17:19:37Z

Does dbt still only allow global variables defined in dbt_project.yml?

codigo-ergo-sum · 2023-05-18T15:39:54Z

Just looking through issue backlogs and wanted to bump this... Would be great as we are working with projects that have tens or even hundreds of variables now. Also the lack of ability to namespace them is still challenging.

apolorei · 2023-06-02T14:27:28Z

I'm also facing this problem and would very much love some ideas of how to tackle it!

timvw · 2023-06-05T08:22:16Z

Currently we're trying to work around this issue by using environment variables (and tooling via direnv and a .envrc file)

jeremyyeo · 2023-06-11T23:38:29Z

Sneaky workaround while we wait for this to be built into core. Basically, move var declarations into macro files:

https://gist.github.com/jeremyyeo/06d552ee8facc8100416655ebc25d9b9

krazavet-tinyclues · 2023-06-12T15:11:37Z

Sneaky workaround while we wait for this to be built into core. Basically, move var declarations into macro files:

https://gist.github.com/jeremyyeo/06d552ee8facc8100416655ebc25d9b9

This is exactly what we started to POC in our DBT stack: using a dedicated macro file to load a bigger JSON payload.
The idea is to generate a macro file containing all DBT variables. In the end it should look something like this 👇

{% macro get_config() %}
  {{ return(fromjson("<JSON_CONTENT_HERE>")) }} 
{% endmacro %}

And then you can use it in your model:

{% set some_var = get_config().get(...) %}

That's a workaround that should do the job.

markproctor1 · 2023-08-07T14:13:31Z

Folder-level configs would make a huge difference on my project. With 50+ developers and growing we don't want anyone to modify project-level files day-to-day, but we do want them to manage many files and folders in their subject area.

Folder-level configs would do this. Clearly the need is there, which is why they are featured in dbt_project.yml, but this causes governance and git conflict problems, with many teams trying to make changes to project-level files at the same time.

Basically, I need to treat our subject areas as mini-projects, each mini-project having its own configuration.

dbeatty10 · 2023-10-18T04:21:04Z

Within #8869, @slotrans described var() not being able to see vars defined in dbt_project.yml for the purposes of configuring query-comment.

If this feature request were added, then it would solve that use-case.

github-actions · 2024-04-16T01:46:01Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

rlh1994 · 2024-04-16T07:22:15Z

epapineau · 2024-09-17T19:58:27Z

@jtcohen6 & @dbeatty10 - are y'all open to contributions on this one?

mroy-seedbox · 2024-09-25T05:57:13Z

@ciprian-mandras: Regarding this:

I agree that a vars.yml would be a good boost, but that only gives you global variables. Based on my experience, I think you should also consider a solution for local variables: some way for a model configuration to point to a model_vars.yml and use the variables defined there for that specific model.

Couldn't you just use model configs for that? In our project, we put all sorts of stuff in the meta key all the time.

Like this:
some_model.yml

models:
- name: some_model
  description: Something something
  config:
    tags: [tag1, tag2]
  meta:
    key1:
      key_x: value
      key_y: value
      key_z: value
    key2:
      key_a: value
      key_b:
        key: value
    key3: value
    key4: value
    key5: value

mroy-seedbox · 2024-09-25T06:18:45Z

For everyone talking about namespacing variables, couldn't you just do it within a dict variable?

Like this:

vars:
  defaults:
    key1: value_a
    key2: value_b
    key3: value_c
  namespace1:
    key1: value1
    key2: value2
  namespace2:
    key1: value_x
    key2: value_y

And then you could retrieve those with an alternative macro (instead of using var('name')). Something like this:

{%- macro ns_var(namespace, key, default = None) -%}
  {%- if default == None and key not in var(namespace) and key not in var("defaults") -%}
    {{ exceptions.raise_compiler_error("Missing variable '" ~ key ~ "' in namespace " ~ namespace) }}
  {%- endif -%}
  {{- return(var(namespace).get(key, var("defaults").get(key, default))) -}}
{%- endmacro -%}

And call it like:

{{ ns_var('namespace1', 'key1') }} -- "value1"
{{ ns_var('namespace1', 'key3') }} -- "value_c"
{{ ns_var('namespace1', 'key4', 'default') }} -- "default"
{{ ns_var('namespace1', 'key4') }} -- Error

Or you could be fancy and do stuff like:

{{ ns_var('namespace1.key1') }} -- Although `default` remains a separate param, which is weird.
{{ ns1_var('key1') }} -- Hardcoded namespace inside of this macro.
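For readers who want to check the lookup order outside of Jinja, the same fallback logic can be sketched in plain Python. This is only a sketch mirroring the `ns_var` macro above, assuming the vars have already been loaded into a dict; `PROJECT_VARS` stands in for the `vars:` block and is not a dbt API.

```python
# Stand-in for the `vars:` block shown above (assumed already parsed).
PROJECT_VARS = {
    "defaults": {"key1": "value_a", "key2": "value_b", "key3": "value_c"},
    "namespace1": {"key1": "value1", "key2": "value2"},
    "namespace2": {"key1": "value_x", "key2": "value_y"},
}


def ns_var(project_vars, namespace, key, default=None):
    """Look up key in the namespace, then in "defaults", then use default."""
    namespace_vars = project_vars[namespace]
    defaults = project_vars["defaults"]
    if default is None and key not in namespace_vars and key not in defaults:
        raise KeyError(f"Missing variable '{key}' in namespace {namespace}")
    return namespace_vars.get(key, defaults.get(key, default))


print(ns_var(PROJECT_VARS, "namespace1", "key1"))             # value1
print(ns_var(PROJECT_VARS, "namespace1", "key3"))             # value_c
print(ns_var(PROJECT_VARS, "namespace1", "key4", "default"))  # default
try:
    ns_var(PROJECT_VARS, "namespace1", "key4")
except KeyError as err:
    print("error:", err)
```

As in the macro, a namespace value wins over the shared defaults, and the error is raised only when no default is supplied and neither level defines the key.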

mroy-seedbox · 2024-09-25T07:05:34Z

@codigo-ergo-sum: Regarding this:

I think a vars.yml file would be a definite improvement over the current situation.

Would it be required or could vars also still be defined in dbt_project.yml? If required then that probably requires a new version 3 of the schema version for dbt_project.yml which is a significant change for existing users, right?

If not required, then what would the behavior look like if vars are defined in both places, and if they conflict? And are you suggesting that vars.yml would be parsed before dbt_project.yml is parsed? Allowing full, "no-gotcha" usage of vars in dbt_project.yml?

I think multiple var files could even easily be supported. The only validation dbt would have to do is to make sure the same variable (i.e. top-level key/namespace) does not exist in more than one file (including dbt_project.yml, if any variables are still defined there). Otherwise, dbt should produce an error. That's it. The end.

Currently that is supported (although it's probably just how PyYAML loads the file), but it probably shouldn't be (and it's a reasonable breaking change as it's a very easy fix: just remove the duplicate):

vars:
  key: 123 # Just delete this one, it does nothing.
  key: 456 # Right now, this one "wins".

So we could end up with something like vars_abc.yml:

vars:
  abc:
    key: value
    ...

And vars_xyz.yml:

vars:
  xyz:
    key: value
    ...

But not vars_qrs.yml:

vars:
  abc: # Can't use `abc` again!
    key: value
    ...

And since those are specifically var files, we wouldn't even need that top-level vars: key.
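The cross-file check described in this comment could look roughly like the following Python sketch. It assumes each var file has already been parsed into a dict of top-level keys; the function name `merge_var_files` is hypothetical and does not exist in dbt.

```python
def merge_var_files(files):
    """Merge parsed var files, rejecting any top-level key defined twice.

    `files` maps a file name to its already-parsed dict of top-level vars.
    """
    merged = {}
    owner = {}  # top-level key -> file that first defined it
    for file_name, file_vars in files.items():
        for key, value in file_vars.items():
            if key in owner:
                raise ValueError(
                    f"Variable '{key}' is defined in both "
                    f"{owner[key]} and {file_name}"
                )
            owner[key] = file_name
            merged[key] = value
    return merged


ok = merge_var_files({
    "vars_abc.yml": {"abc": {"key": "value"}},
    "vars_xyz.yml": {"xyz": {"key": "value"}},
})
print(sorted(ok))

try:
    merge_var_files({
        "vars_abc.yml": {"abc": {"key": "value"}},
        "vars_qrs.yml": {"abc": {"key": "value"}},  # reuses `abc`
    })
except ValueError as err:
    print("error:", err)
```

Tracking which file first defined each key lets the error message name both conflicting files, which matters once many teams own separate var files.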

mroy-seedbox · 2024-09-25T07:14:00Z

Folder-level configs would make a huge difference on my project. With 50+ developers and growing we don't want anyone to modify project-level files day-to-day, but we do want them to manage many files and folders in their subject area.

Folder-level configs would do this. Clearly the need is there, which is why they are featured in dbt_project.yml, but this causes governance and git conflict problems, with many teams trying to make changes to project-level files at the same time.

Basically, I need to treat our subject areas as mini-projects, each mini-project having its own configuration.

@markproctor1: I think scattering variables/files everywhere is a terrible idea/bad practice. What do you think of the namespace idea suggested above instead?

And if dbt added support for multiple var files (all in one specific place), each of your subject areas/mini-projects could have its own file & namespace. No more merge conflicts! 🙌

EDIT: Although, thinking about it again now... scattering var files all over the place could be acceptable if dbt_project.yml had a config like this:

var-paths:
  - variables.yml
  - team1/variables.yml
  - team2/some_folder/variables.yml
  - ...

But then the variable files couldn't be loaded before dbt_project.yml, which sounded like a nice advantage to have. So probably still not a good idea to scatter var files around. 😅

Instead, we could just have a simple vars folder, which should be enough for 99.9% of dbt users:

vars/globals.yml
vars/team1.yml
vars/team2.yml

EDIT 2: UNLESS... dbt could also add a new paths.yml file, which would be loaded first. Then the var-paths would be loaded. And then dbt_project.yml and the rest of the stuff would be loaded.

But all that would be completely optional (like a power/advanced feature), and only enabled if paths.yml exists.

And for those who want to use variables inside of their paths for certain things (models paths, etc.), then just don't put those paths inside of paths.yml! 🙈 Keep them inside of dbt_project.yml, which should now have access to all the project variables.

mroy-seedbox · 2024-10-05T01:35:03Z

This feature is also highly related to #4873.

graciegoheen · 2024-12-12T18:38:08Z

Problems

It sounds like there are three problems being discussed here:

"I want to define project-wide variables outside of my dbt_project.yml."
"I want to be able to reference variables in my dbt_project.yml at parse-time."
"I want to have name-spaced variables."

Potential Acceptance Criteria for 1 & 2

I can define project-wide variables ("global variables") in a vars.yml file, separate from my dbt_project.yml

# vars.yml

vars:
  start_date: '2016-06-01'

dbt parses vars.yml before parsing dbt_project.yml, meaning default values of vars called in dbt_project.yml would work the way folks expect, rather than how it is today (equivalent to how this works when defining vars through the CLI a la --vars)

Option 1 - behavior change

You cannot define variables in both places; you must choose the "new" or "legacy" way
As such, this would need to be behind a behavior change flag
Open question: Given that behavior change flags are set in dbt_project.yml, would this even be possible?

Option 2 - backwards compatible

You can define variables in either place
Open question: Would we still get the parsing benefit if we went this route?

Current workaround: Have a really massive dbt_project.yml file with no ability to reference variables in your project configurations

Potential Acceptance Criteria for 3

You can define name-spaced variables ("local variables", spec tbd) that would compile differently based on where they are being used
Open question: Is this folder-based? Model-based?

Current workaround: Split up your project into multiple sub-projects with their own variables - we already have project-based name-spacing

Next Steps

I think there's more clarity on what needs to be done for problems 1 & 2 - so we should go ahead and create an implementation issue for that set of acceptance criteria.

For problem 3, I think this needs to be baked more - but am open to someone starting up a GitHub discussion if so inspired!

codigo-ergo-sum · 2024-12-12T19:22:09Z

@graciegoheen great to see you post on a 3-year running discussion now :) (and just 3 years in this particular ticket, I think it's been ongoing in a notional sense since the beginning of dbt.)

Regarding #3, what's interesting is that, years ago, the ability to namespace variables in dbt_project.yml did exist and it was removed. I don't remember the reason why it was removed, however. @jtcohen6 was involved in it though. Jeremy, any recollection of why it was removed? I'm just wondering if the reasoning behind that may still be germane to the discussion.

mroy-seedbox · 2024-12-14T01:26:24Z

@graciegoheen: Option 2 makes sense to me! 🙌

Treat variables in vars.yml the same as vars defined on the CLI.
Keep treating variables in dbt_project.yml the same.
~~Whatever is in vars.yml overrides what is in dbt_project.yml (just like CLI vars).~~
- Although people should really avoid defining the same variable in two places, so this should probably produce an error instead. Otherwise it could definitely cause confusion (whereas CLI vars are more easily understood as an override).

Actually, this is even better: we can actually combine both options!

If vars.yml exists, produce an error if there are any variables still defined in dbt_project.yml.
This would remain backward-compatible without the need for a change flag (i.e. the "flag" would be the presence of vars.yml or not).

Would we still get the parsing benefit if we went this route?

The parsing benefits could be enabled only if the vars are defined in vars.yml instead of dbt_project.yml (i.e. if the "flag" is on).
This would also encourage people to migrate their vars from dbt_project.yml over to vars.yml.
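The combined option above can be sketched in Python as follows. This is a sketch under the assumptions stated in this comment, not dbt's implementation: the function `resolve_vars` is hypothetical, the presence of vars.yml acts as the "flag", and CLI vars are treated as an override on top of whichever file supplies the vars.

```python
def resolve_vars(vars_yml, project_yml_vars, cli_vars=None):
    """Pick the source of project vars, then apply CLI overrides.

    vars_yml: parsed contents of vars.yml, or None if the file is absent.
    project_yml_vars: the `vars:` block of dbt_project.yml (may be empty).
    cli_vars: values passed via --vars (may be None).
    """
    if vars_yml is not None and project_yml_vars:
        # vars.yml exists, so vars left in dbt_project.yml are an error.
        raise ValueError(
            "vars.yml exists; move these vars out of dbt_project.yml: "
            + ", ".join(sorted(project_yml_vars))
        )
    base = vars_yml if vars_yml is not None else project_yml_vars
    resolved = dict(base)
    resolved.update(cli_vars or {})  # CLI vars override file-defined vars
    return resolved


# New style: vars.yml present, nothing left in dbt_project.yml.
print(resolve_vars({"start_date": "2016-06-01"}, {}))
# Legacy style: no vars.yml, behavior unchanged.
print(resolve_vars(None, {"start_date": "2016-06-01"}, {"start_date": "2020-01-01"}))
```

Existing projects without a vars.yml take the legacy branch untouched, which is what keeps this backward-compatible without a separate behavior change flag.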

Side note: We shouldn't need the top-level vars: key in vars.yml, since it should be understood that the entire file is for variables only.

# vars.yml
start_date: '2016-06-01'

And for Problem #3 (namespaced variables), I would personally keep that out of scope. There are many existing workarounds already, as enumerated in this thread. But supporting multiple var files (i.e. a vars folder?) could definitely be useful in order to help keep things clean (and those who want to namespace them could use one file per namespace).

jtcohen6 added the enhancement (New feature or request) label Dec 15, 2020
jtcohen6 mentioned this issue Feb 16, 2021: target-based Jinja rendering of vars in dbt_project.yml #3105 (Closed, 5 tasks)
jtcohen6 added the vars label Feb 22, 2021
jtcohen6 mentioned this issue May 29, 2021: Improve Application of DRY Principle to Source Freshness Definitions in .yml files in large dbt projects #3397 (Closed)
jtcohen6 mentioned this issue Aug 3, 2021: Variables in project.yml that use jinja templating are not properly rendering when referenced in packages #3658 (Closed, 5 tasks)
jtcohen6 mentioned this issue Dec 22, 2021: [Bug] Source Freshness not running on renamed source tables #4525 (Closed, 1 task)
jtcohen6 mentioned this issue Jan 26, 2022: [CT-76] [Feature] Allow vars to be declared in selectors #4599 (Closed, 1 task)
jtcohen6 mentioned this issue Mar 23, 2022: The future of vars #4938 (Closed)
jtcohen6 added the paper_cut (A small change that impacts lots of users in their day-to-day) and Refinement (Maintainer input needed) labels Nov 28, 2022
github-actions bot added the triage label May 18, 2023
dbeatty10 removed the triage label May 21, 2023
dbeatty10 mentioned this issue Jun 1, 2023: [CT-2603] [Feature] CLI Should Support a Reading vars From YAML File #7709 (Closed, 3 tasks)
dbeatty10 mentioned this issue Oct 18, 2023: [CT-3233] [Bug] var() cannot see vars defined in dbt_project.yml in query-comment context #8869 (Closed, 2 tasks)
b-per mentioned this issue Feb 28, 2024: [Feature] Add ability to import/include YAML from other files #9695 (Open, 3 tasks)
github-actions bot added the stale (Issues that have gone stale) label Apr 16, 2024
dbeatty10 removed the stale (Issues that have gone stale) label Apr 16, 2024
dbeatty10 mentioned this issue Aug 22, 2024: dbt_profile vars not parsing derived variable correctly when used in a yml file #10589 (Closed, 2 tasks)
mroy-seedbox mentioned this issue Oct 5, 2024: [CT-369] [Feature] Use variables declared in dbt_project.yml file in the file #4873 (Open, 1 task)
graciegoheen added the yaml label Dec 12, 2024
graciegoheen mentioned this issue Dec 12, 2024: [implementation] define project-wide vars in vars.yml, outside of dbt_project.yml #11144 (Open)
