Defining vars, folder-level configs outside dbt_project.yml #2955
Comments
Some additional thoughts/failure modes/concerns to think about with this:
One thought that comes to me - what if we had another separate set of subdirectories that were specifically dedicated to .yml files for config, like we already have a subdirectory declaration/space for snapshots, models, etc. Only things related to/extracted from dbt_project.yml could be put in there, and any other things related to models, snapshots, etc., would trigger a parsing error. This doesn't fix problem 1 above but it at least helps with problem 2. What say ye? P.S. Also, I know var namespacing was removed for dbt_project.yml v2 config in .17 for some good reasons but it's also pretty hard not to have any way to do variable scoping in larger projects. |
@codigo-ergo-sum I don't think I've ever seen you post from this alt account before. It goes without saying that I like the handle.
This is along the lines of what I was thinking: either an explicit set of subdirectories, or an explicit set of named files. I've been around just long enough to remember when I'd be especially keen on a Configurations feel a bit trickier, because these can be especially verbose. How to coordinate hierarchies across multiple files without someone tripping over someone else? I'm honestly not sure. The cleanest separation I can envision would be allowing a project to have one each of
This is fair. I wonder if the ability to scope |
Thanks for the compliment on the username @jtcohen6 :). I think a Would it be required or could vars also still be defined in If not required, then what would the behavior look like if vars are defined in both places, and if they conflict? And are you suggesting that |
I absolutely support the idea of parsing the vars before the dbt_project.yml. |
Having outside vars available in We have a complex For example:
source-paths:
  - modules/shared/models
  - modules/module1/models
  - modules/module2/models
  - stages/{{ env_var('DBT_STAGE', '@fake@') }}/models
Then enabling a particular stage via env var during deployment. Would be great to set the stage once in the vars file, and then just use the var itself in the config. Also be able to define module names / prefixes, or even an entire array of modules to loop over. |
Hello 👋 In our company, we use DBT heavily in a multi-tenant context. For that purpose, we rely a lot on DBT variables, which we use to propagate the client configuration. Those configurations can differ widely from one client to another. We have not found a proper workaround so far. Passing a file path instead of a payload for our variables would probably solve our issue. This is why we are keen to know whether there is any chance you would consider such a feature for DBT? (cc. @jtcohen6) Thank you in advance 🙏 |
++ this feature. The solution implemented by Jekyll (with _data directory) comes to mind as suitable. |
Hi all, I agree that a vars.yml would be a good boost, but it would only give you global variables. Based on my experience, I think you should also consider a solution for local variables: some way for a model configuration to point to a model_vars.yml, so that the model uses the variables defined there. Thank you. |
I agree. Hope this will get implemented soon as it is always a good practice to modularize the configurations, rather than having everything in the same single file. |
Does dbt still only allow global variables defined in dbt_project.yml? |
Just looking through issue backlogs and wanted to bump this... Would be great as we are working with projects that have tens or even hundreds of variables now. Also the lack of ability to namespace them is still challenging. |
I'm also facing this problem and would very much love some ideas of how to tackle it! |
Currently we're trying to work around this issue by using environment variables (and tooling via direnv and a .envrc file) |
Sneaky workaround while waiting for this to be built into core. Basically move var declarations into macro files: https://gist.github.com/jeremyyeo/06d552ee8facc8100416655ebc25d9b9 |
This is exactly what we started to POC in our DBT stack: using a dedicated macro file to load a bigger JSON payload.
And then you can use it in your model:
That's a workaround that should do the job. |
Folder-level configs would make a huge difference on my project. With 50+ developers and growing we don't want anyone to modify project-level files day-to-day, but we do want them to manage many files and folders in their subject area. Folder-level configs would do this. Clearly the need is there, which is why they are featured in dbt_project.yml, but this causes governance and git conflict problems when many teams try to change project-level files at the same time. Basically, I need to treat our subject areas as mini-projects, each mini-project having its own configuration. |
Within #8869, @slotrans described If this feature request were added, then it would solve that use-case. |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
@jtcohen6 & @dbeatty10 - are y'all open to contributions on this one? |
@ciprian-mandras: Regarding this:
Couldn't you just use model configs for that? In our project, we put all sorts of stuff in the Like this:
models:
  - name: some_model
    description: Something something
    config:
      tags: [tag1, tag2]
      meta:
        key1:
          key_x: value
          key_y: value
          key_z: value
        key2:
          key_a: value
          key_b:
            key: value
        key3: value
        key4: value
        key5: value
|
For everyone talking about namespacing variables, couldn't you just do it within a dict variable? Like this:
vars:
  defaults:
    key1: value_a
    key2: value_b
    key3: value_c
  namespace1:
    key1: value1
    key2: value2
  namespace2:
    key1: value_x
    key2: value_y
And then you could retrieve those with an alternative macro (instead of using
And call it like:
Or you could be fancy and do stuff like:
|
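The macro from this comment did not survive in this copy of the thread, so here is a rough sketch of the lookup logic it describes, written in Python rather than Jinja. The function name `ns_var` is hypothetical, and falling back to the `defaults` namespace when a key is missing is an assumption suggested by the shape of the `vars` dict above, not something the comment confirms.

```python
# Mirrors the nested `vars` dict from the comment above.
PROJECT_VARS = {
    "defaults": {"key1": "value_a", "key2": "value_b", "key3": "value_c"},
    "namespace1": {"key1": "value1", "key2": "value2"},
    "namespace2": {"key1": "value_x", "key2": "value_y"},
}


def ns_var(variables, namespace, key, fallback="defaults"):
    """Look up `key` in `namespace`, falling back to the `fallback` namespace.

    Hypothetical helper: the fallback behaviour is an assumption.
    Raises KeyError if the key is in neither namespace.
    """
    scoped = variables.get(namespace, {})
    if key in scoped:
        return scoped[key]
    return variables[fallback][key]


scoped_value = ns_var(PROJECT_VARS, "namespace1", "key1")    # defined in namespace1
fallback_value = ns_var(PROJECT_VARS, "namespace1", "key3")  # only in defaults
```

In a real project the same logic would live in a Jinja macro that reads the dict through `var()`; the Python form is only meant to make the scoping rule explicit.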
@codigo-ergo-sum: Regarding this:
I think multiple var files could even easily be supported. The only validation dbt would have to do is to make sure the same variable (i.e. top-level key/namespace) does not exist in more than one file (including Currently that is supported (although it's probably just how PyYAML loads the file), but it probably shouldn't be (and it's a reasonable breaking change as it's a very easy fix: just remove the duplicate):
vars:
  key: 123  # Just delete this one, it does nothing.
  key: 456  # Right now, this one "wins".
So we could end up with something like
And
But not
And since those are specifically var files, we wouldn't even need that top-level |
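The validation described in this comment can be sketched roughly as follows. This is not dbt's implementation: `merge_var_files` is a hypothetical helper, and the file contents are stand-ins for whatever a YAML loader would return for each var file. The file names reuse the `vars/globals.yml`, `vars/team1.yml`, and `vars/team2.yml` layout floated elsewhere in the thread.

```python
def merge_var_files(files):
    """Merge per-file variable mappings, rejecting duplicate top-level keys.

    `files` maps a file name to the dict parsed from that file
    (hypothetical input shape; dbt does not expose such an API).
    """
    merged = {}
    owner = {}  # top-level key -> name of the file that first defined it
    for name, variables in files.items():
        for key, value in variables.items():
            if key in owner:
                raise ValueError(
                    f"variable {key!r} is defined in both {owner[key]} and {name}"
                )
            owner[key] = name
            merged[key] = value
    return merged


files = {
    "vars/globals.yml": {"start_date": "2016-06-01"},
    "vars/team1.yml": {"team1": {"key1": "value1"}},
    "vars/team2.yml": {"team2": {"key1": "value_x"}},
}
merged = merge_var_files(files)
```

Raising on a duplicate, instead of silently letting the last definition win, is the behaviour change the comment argues for: each team owns its own top-level key, and a collision surfaces as an error at parse time.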
@markproctor1: I think scattering variables/files everywhere is a terrible idea/bad practice. What do you think of the namespace idea suggested above instead? And if dbt added support for multiple var files (all in one specific place), each of your subject areas/mini-projects could have its own file & namespace. No more merge conflicts! 🙌
EDIT: Although, thinking about it again now... scattering var files all over the place could be acceptable if
var-paths:
  - variables.yml
  - team1/variables.yml
  - team2/some_folder/variables.yml
  - ...
But then the variable files couldn't be loaded before Instead, we could just have a simple
vars/globals.yml
vars/team1.yml
vars/team2.yml
EDIT 2: UNLESS... dbt could also add a new But all that would be completely optional (like a power/advanced feature), and only enabled if And for those who want to use variables inside of their paths for certain things (models paths, etc.), then just don't put those paths inside of |
This feature is also highly related to #4873. |
Problems
It sounds like there are three problems being discussed here:
Potential Acceptance Criteria for 1 & 2
# vars.yml
vars:
start_date: '2016-06-01'
Option 1 - behavior change
Option 2 - backwards compatible
Current workaround: Have a really massive
Potential Acceptance Criteria for 3
Current workaround: Split up your project into multiple sub-projects with their own variables - we already have project-based name-spacing
Next Steps
I think there's more clarity on what needs to be done for problems 1 & 2 - so we should go ahead and create an implementation issue for that set of acceptance criteria. For problem 3, I think this needs to be baked more - but am open to someone starting up a GitHub discussion if so inspired! |
@graciegoheen great to see you post on a 3-year running discussion now :) (and just 3 years in this particular ticket, I think it's been ongoing in a notional sense since the beginning of dbt.) Regarding #3, what's interesting is that, years ago, the ability to namespace variables in dbt_project.yml did exist and it was removed. I don't remember the reason why it was removed, however. @jtcohen6 was involved in it, though. Jeremy, any recollection on why it was removed? I'm just wondering if the reasoning behind that may still be germane to the discussion. |
@graciegoheen: Option 2 makes sense to me! 🙌
Actually, this is even better: we can actually combine both options!
Side note: We shouldn't need the top-level
# vars.yml
start_date: '2016-06-01'
And for Problem #3 (namespaced variables), I would personally keep that out of scope. There are many existing workarounds already, as enumerated in this thread. But supporting multiple var files (i.e. a |
Describe the feature
From @benjaminsingleton:
Describe alternatives you've considered
dbt_project.yml gets really really big??
Additional context
Who will this benefit?