Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-339] Persist dbt tags to Snowflake using object tagging feature #104

Closed
dweaver33 opened this issue Mar 9, 2022 · 18 comments
Closed
Labels
enhancement New feature or request

Comments

@dweaver33
Copy link

Describe the feature

When tags are defined for various objects (tables, columns, etc), allow the option to persist these tags down to Snowflake using Snowflake's object tagging feature.

Describe alternatives you've considered

I've considered writing a macro to do this, but development time would prefer something more built-in.

Additional context

Tags are used in Snowflake to apply security and masking. Having to maintain tags in two different areas is cumbersome and prone to error.

Who will this benefit?

Tags are useful in dbt for a number of reasons already known. Persisting them into Snowflake would allow users to utilize Snowflake's security methods and prevent having to define tags in two different areas.

Are you interested in contributing this feature?

I can certainly contribute expertise and technical assistance.

@dweaver33 dweaver33 added enhancement New feature or request triage labels Mar 9, 2022
@github-actions github-actions bot changed the title Persist dbt tags to Snowflake using object tagging feature [CT-339] Persist dbt tags to Snowflake using object tagging feature Mar 9, 2022
@VersusFacit
Copy link
Contributor

VersusFacit commented Apr 2, 2022

Hiya 👋 @dweaver33

Thanks for submitting an issue, especially with a link to the docs -- helps by making sure we're both talking about the same thing 👍

I totally appreciate the want to minimize the surface area where errors may be introduced. DRY code principles are espoused so often for a reason--a darn good reason.

However, my major concern with this feature isn't a demonstrated use case or lack of principle but rather that tags in dbt seem to be semantically different from tags in Snowflake.

As you can see in our docs, dbt tags are string values that can be assigned to objects in lists.

But, as seen in those docs you linked, Snowflake tags

can be assigned an arbitrary string value upon assigning the tag to a Snowflake object. Snowflake stores the tag and its string value as a key-value pair.

dbt tags do not support that key-value pair functionality. They are a different kind of metadata mechanism (one can argue whether snowflake's or dbt's are truly more or less useful but that's another design discussion...). Therefore, to tie your dbt tags to your Snowflake tags is to make a deliberate design choice, namely to restrict your own ability to take advantage of Snowflake's featureset. That's a perfectly reasonable thing to do if your team decides that's in its best interest. That said, having that built-in to dbt-snowflake would be to force/encourage other users into this paradigm (i.e. where they may use key-null tags in their Snowflake database only).

Seeing as we don't have evidence is a widespread requested feature, and it doesn't seem to support different database management approaches other enterprise users may have, I think the best solution would be a macro as opposed to a built-in feature, as you said!

@stkbailey
Copy link

@dweaver33 did you end up building a macro for this? Combined with tag-based masking policies, this would be a really great workflow

@dweaver33
Copy link
Author

@stkbailey , unfortunately I did not.

@jamesweakley
Copy link

jamesweakley commented Aug 2, 2022

I realise this is a closed issue, but I'm looking at this same thing and wanted to propose an alternative.

Firstly to broaden this a little, BigQuery has an equivalent key-value pair concept called "labels". I also wouldn't be surprised to see something like this appear in Databrick's upcoming Unity Catalog.

Redshift and Azure both have resource tagging, but they appear to relate to surrounding infrastructure rather than database objects so we can leave them out for now.

The subtle differences between BigQuery and Snowflake are:

  • BigQuery labels apply only at the table/view level only, not column
  • BigQuery labels are applied in isolation to each table, but Snowflake's tags are first-class, reusable objects that may reside in some other schema (example)
  • To add to the confusion, an empty-string label in BigQuery is called a tag

I want to propose that instead of leveraging dbt tags, we use a convention within the "meta" config element named "database_tags". Within that, you can define name-value tags at the table/view level and at the individual column level (Snowflake only).


{{ 
    config(
        materialized='view',
        meta={
            'database_tags':{
                'table': {
                    'accounting_row_string': 'visible'
                },
                'column':{
                    'ACCOUNT_NAME':{
                        'accounting_col_string': 'visible'
                    },
                    'ACCOUNT_NUMBER':{
                        'accounting_col_string': 'visible'
                    }
                }
            }
        }
    ) 
}}
select ACCOUNT_NAME, ACCOUNT_NUMBER from {{ref('accounts_raw')}}

I have a prototype of a macro which applies these tags during on-run-end, currently it's Snowflake-only but could be extended to BigQuery.

Planning to publish it within a new package, unless anyone can think of a better place (dbt-utils?). I don't think it belongs in core, given it's a convention that not everyone will agree with.

Happy to open a new issue with the different scope, just wanted to gauge people's reaction first.

@jamesweakley
Copy link

The other option is to use delimiters to shoehorn the key-value pairs into the tags, plus some indicator that it's a tag intended for the database.
e.g. "database:my_tag:my_value"

@dweaver33
Copy link
Author

@jamesweakley, I would be very interested in seeing this macro. We are a snowflake shop.

@jamesweakley
Copy link

jamesweakley commented Aug 2, 2022

This is an incredibly rough draft, still some bugs but it should work as a POC.
https://gist.github.com/jamesweakley/dff5df8fb6565e07d707d34eaaf63da7
save into the macros folder, then add this to dbt_project.yml:

on-run-end: "{{ apply_meta_as_tags(results) }}"

@Andylaugen3
Copy link

Another snowflake shop here, and was looking for a way to make tag-based masking policies applied at dbt-runtime.
Thank you for sharing @jamesweakley!

@mahsa-elemy
Copy link

@jamesweakley Thank you for sharing! I was wondering if this macro has published as a Package? The macro that you shared is it the final version? or since August there have been modifications to this?
We also use Snowflake and dbt in our company and would want to highly utilize what dbt offers to create tags for tables/columns in Snowflake for masking purposes.

@jamesweakley
Copy link

@mahsa-elemy what I linked is the extent of it at the moment, and I didn't end up putting it in a package yet.

@MartinGuindon, would you be happy to have this macro in dbt-snowflake-utils? I'm happy to open a PR if you agree

@gmatsonANSYS
Copy link

New to the chat here, was directed to this thread from our snowflake rep. @jamesweakley Appreciate the PoC and would love to see this as a macro in dbt-snowflake-utils. I did have to tweak the model_alias variable to include the schema.

From a use case - I have tested the tagging through model configs, but would love to see the column based meta tags happen on .yml files next to the documentation and testing lines. If that can already be done any help on the syntax would be great! Still learning advanced usage.

@MartinGuindon
Copy link

@MartinGuindon, would you be happy to have this macro in dbt-snowflake-utils? I'm happy to open a PR if you agree

Sure. That package is in dire need of love, this would be the opportunity for us to spend some time on it! :)

@jamesweakley
Copy link

@gmatsonANSYS

From a use case - I have tested the tagging through model configs, but would love to see the column based meta tags happen on .yml files next to the documentation and testing lines.

In the PR above, I switched it to use yml, check out the readme. I agree it looks nicer and better to manage, and allowed me to get rid of a level of nesting.

@oynek
Copy link

oynek commented Oct 27, 2022

I also think this is very interesting. Since, according to my information, tag-based masking has also been possible in Snowflake since yesterday (GA), I would find this extremely helpful for a holistic development workflow including the incorporation of data governance.

@DavidfNZ
Copy link

We are working on getting tag-based masking in Snowflake using dbt, so also very keen on hearing from anyone trying this out.

@jamesweakley
Copy link

For those looking for a quick summary:

@kanomaxb
Copy link

I don't think that this macro is a sufficient solution. It applies all tags in the on-run-end hook, which might cause 2-10 mins gaps (in some of our cases) between creating a populated table, and creating a tag. This is true for non-incremental only, but still is a deal-breaker for using the tag-based masking policies. We cannot have our data masked "some time" after inserting, it should be transactional.

@thu-IL
Copy link

thu-IL commented Apr 16, 2024

Hi folks. We've developed a dbt package for tag-based masking policy in Snowflake. Check it out here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests