Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Create new configuration abstraction for a Project #61

Open
nicholasyager opened this issue May 13, 2024 · 5 comments
Open

[Feature] Create new configuration abstraction for a Project #61

nicholasyager opened this issue May 13, 2024 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@nicholasyager
Copy link
Owner

nicholasyager commented May 13, 2024

Is your feature request related to a problem? Please describe.
Currently, dbt-loom does not have a good way to leverage a dbt project's restrict-access configuration. This config is important for limiting cross-project access to protected and private nodes. Since this is an entirely new file to load, we do not currently have a defined path for loading this information.

flowchart LR
  
  subgraph project_a
    dbt_project.yml
    manifest.json
  end

  subgraph project_b
    project_b_project[dbt_project.yml]
    project_b_manifest[manifest.json]
    dbt_loom.config.yml

    subgraph dbt-loom

      ManifestLoader
    
      Plugin 

    end
  end

  
  manifest.json --> ManifestLoader

  
  ManifestLoader --> Plugin

  dbt_loom.config.yml --> Plugin


  dbt-core

  Plugin --> dbt-core
  project_b_project --> dbt-core
  project_b_manifest --> dbt-core

Loading

Describe the solution you'd like
I'd like to be able to define a Project, and with this project a location for its dbt_project.yml file and an associated manifest.json file. This should support all of our existing artifact sources where possible.

flowchart LR
  
  subgraph project_a
    dbt_project.yml
    manifest.json
  end

  subgraph project_b
    project_b_project[dbt_project.yml]
    project_b_manifest[manifest.json]
    dbt_loom.config.yml

    subgraph dbt-loom

      ManifestLoader
      ProjectLoader
      Plugin 

    end
  end

  dbt_project.yml --> ProjectLoader
  manifest.json --> ManifestLoader

  ProjectLoader --> Plugin
  ManifestLoader --> Plugin

  dbt_loom.config.yml --> Plugin


  dbt-core

  Plugin --> dbt-core
  project_b_project --> dbt-core
  project_b_manifest --> dbt-core

Loading

To configure this, we can introduce a new optional top-level concept of a Project.

dependencies:

  - name: core
    description: All common core dependencies across our `n` base projects.
    artifacts:
      - type: s3
        config:
          bucket_name: com.example.dbt_artifacts 
          object_prefix: latest/
        
   - name: revenue
     description: A proof-of-concept local-only revenue reporting project.
     artifacts:
        - type: file
          config:
            path: path/to/manifest.json
        - type: file
          config:
            path: path/to/dbt_project.yml       
   

Describe alternatives you've considered

  • Have restrict-access configured in the dbt_loom.config.yml file instead.
  • Don't support restruct-access at all.

Additional context

@nicholasyager nicholasyager added the enhancement New feature or request label May 13, 2024
@nicholasyager nicholasyager self-assigned this May 13, 2024
@z3z1ma
Copy link

z3z1ma commented May 13, 2024

Isn't it dbt's job to enforce this behavior? Why should a plugin author need to do this? Provided you set this in your dbt_project yml, my expectation would be that the manifest json stores this information and it is respected in dbt core while managing refs. Otherwise it's N times the repeated work for every plugin author to respect the restrict-access setting -- since the only surface area of plugins is either adding nodes or dumping artifacts AFAIK, it should be baked in. Which BTW [the extremely shallow plugin API implementation] sort of points to dbt-core teams narrow focus on enabling cloud/enterprise features and not innovating on their OSS software from a resourcing/engineering standpoint.

@nicholasyager
Copy link
Owner Author

Some thoughts here, @z3z1ma.

The mechanism for respecting/enforcing the restrict-access parameter of a project is baked into dbt-core. Rather, the data flow for this information in injected in at invocation. Based on what I can tell from the dbt-core source (eh 🤷🏻), it looks like dbt Cloud may be injecting the dbt_project.yml from upstream projects into downstream projects during invocation, thereby allowing dbt-core to run the check. In our case, we are coming into the party late and need to artificially inject this dbt_project.yml via patched wrappers.

So, we have a common need for downstream projects to KNOW this configuration value for upstream projects. This means that we either load the upstream project's dbt_project.yml file (which is what I suspect dbt Cloud is doing, since it already has all the code), or we have devs manually configure this value, which isn't ideal as we've both pointed out above.

I recommend re-reading the proposal again and dig deep into how this solution might work!

@z3z1ma
Copy link

z3z1ma commented May 13, 2024

Thanks @nicholasyager!

I totally get how what your saying will work. I'm saying that PluginNodes in dbt-core is missing critical info (such as an optional dict of parent project info). And no matter how I dice it up, having to traverse an MRO across like 10 different files in dbt-core starting from the is_invalid_*_ref functions just to figure out where runtime_config is originally generated (which is where the dependencies mapping lives that is currently mutated via monkey patch) is not great. So the people forced to tackle the complexity are plugin authors, including dbt's own private plugin which is services the capabilities in cloud (I would imagine?)... So when I say it should be in dbt-core. I mean, more broadly, it shouldn't require monkey patching to get injected nodes to respect their parent/source project's configuration. Or to respect private/protected as a concept from 'plugin nodes' which is literally the primary concept introduced by dbt core coupled to the existence/creation of this plugin interface. So my issue is exactly that PluginNodes and project dependencies are separate, cuz well... nodes either come from projects or dont. And they only handled 1 of what are only 2 cases in this high level state machine.

So TLDR

Speculating the solution is simple enough
LoomRunnableConfig needs to be replaced with a native (or duck typed) dbt Project object. We can use a global/nonlocal dict of project_name -> Project

The implications however are that we are exposing a (in my opinion a hard to miss) gap in the upstream codebase that surprises me was not thought through in this way.

@nicholasyager
Copy link
Owner Author

nicholasyager commented May 13, 2024

@z3z1ma

Speculating the solution is simple enough
LoomRunnableConfig needs to be replaced with a native (or duck typed) dbt Project object. We can use a global/nonlocal dict of project_name -> Project

Cool. This is aligned with my thinking. I'll ponder some reasonable stepping stones that gets loom's configuration working in this direction in a backwards compatible manner.

Regarding the complexities of managing plugins using the interfaces provided, I totally understand -- you're commenting on an issue thread for a dbt-core plugin 😉🤣 Let's be charitable to the devs working on dbt-core, and operate with an understanding that they are working under similar resource, timing, and commercial pressures as the people who work for dbt Labs' competitors who happen to leverage dbt-loom. We're very much operating on the fringes of what dbt Labs intended folks to do, and that is what makes this all the more interesting an exercise.

@z3z1ma
Copy link

z3z1ma commented May 13, 2024

Agreed. Don't mean to come off the wrong way. I am as much a contributor to OSS as I am a critic to software. They go hand in hand sometimes and have nothing to do with any company as much as it does a collection of python files which comprise the software. I would definitely PR these opinions upstream if I was still using dbt at work lol
Appreciate you @nicholasyager and all you've done here. Dbt folks too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants