diff --git a/.github/workflows/publish-docs.yml b/.github/workflows/publish-docs.yml new file mode 100644 index 0000000..bd37c70 --- /dev/null +++ b/.github/workflows/publish-docs.yml @@ -0,0 +1,34 @@ +name: Publish MkDocs on Main Branch + +on: + push: + branches: + - main + +jobs: + deploy: + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v3 + with: + fetch-depth: 0 + + - name: Set up Python + uses: actions/setup-python@v4 + with: + python-version: 3.8 + + - name: Install dependencies + run: | + python3 -m pip install --upgrade pip + pip install poetry + poetry install --with=dev + + - name: Deploy to GitHub Pages + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + run: | + git config --global user.name "GitHub Actions Bot" + git config --global user.email "github-actions[bot]@users.noreply.github.com" + poetry run mike deploy --push --force --message "Deployed by GitHub Actions" main \ No newline at end of file diff --git a/README.md b/README.md index 14bebea..b050a32 100644 --- a/README.md +++ b/README.md @@ -44,7 +44,7 @@ dbt-loom currently supports obtaining model definitions from: ## Getting Started -To being, install the `dbt-loom` python package. +To begin, install the `dbt-loom` python package. ```console pip install dbt-loom diff --git a/docs/assets/dbt-logo.svg b/docs/assets/dbt-logo.svg new file mode 100644 index 0000000..4f540b8 --- /dev/null +++ b/docs/assets/dbt-logo.svg @@ -0,0 +1,7 @@ + + + + + + + diff --git a/docs/getting-started.md b/docs/getting-started.md new file mode 100644 index 0000000..749975c --- /dev/null +++ b/docs/getting-started.md @@ -0,0 +1,137 @@ +# Getting Started + +To begin, install the `dbt-loom` python package. + +```console +pip install dbt-loom +``` + +Next, create a `dbt-loom` configuration file. This configuration file provides the paths for your +upstream project's manifest files. 
+ +```yaml +manifests: + - name: project_name # This should match the project's real name + type: file + config: + # A path to your manifest. This can be either a local path, or a remote + # path accessible via http(s). + path: path/to/manifest.json +``` + +By default, `dbt-loom` will look for `dbt_loom.config.yml` in your working directory. You can also set the +`DBT_LOOM_CONFIG` environment variable to point to a different config file. + +### Using dbt Cloud as an artifact source + +You can use dbt-loom to fetch model definitions from dbt Cloud by setting up a `dbt_cloud` manifest in your `dbt-loom` config, and setting the `DBT_CLOUD_API_TOKEN` environment variable in your execution environment. + +```yaml +manifests: + - name: project_name + type: dbt_cloud + config: + account_id: + + # Job ID pertains to the job that you'd like to fetch artifacts from. + job_id: + + api_endpoint: + # dbt Cloud has multiple regions with different URLs. Update this to + # your appropriate dbt Cloud endpoint. + + step_id: + # If your job generates multiple artifacts, you can set the step from + # which to fetch artifacts. Defaults to the last step. +``` + +### Using an S3-compatible object store as an artifact source + +You can use dbt-loom to fetch manifest files from S3-compatible object stores +by setting up an `s3` manifest in your `dbt-loom` config. Please note that this +approach supports all standard boto3-compatible environment variables and authentication mechanisms. Please see the [boto3 documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#environment-variables) for more details. + +```yaml +manifests: + - name: project_name + type: s3 + config: + bucket_name: + # The name of the bucket where your manifest is stored. + + object_name: + # The object name of your manifest file. +``` + +### Using GCS as an artifact source + +You can use dbt-loom to fetch manifest files from Google Cloud Storage by setting up a `gcs` manifest in your `dbt-loom` config. 
+ +```yaml +manifests: + - name: project_name + type: gcs + config: + project_id: + # The alphanumeric ID of the GCP project that contains your target bucket. + + bucket_name: + # The name of the bucket where your manifest is stored. + + object_name: + # The object name of your manifest file. + + credentials: + # The OAuth2 Credentials to use. If not passed, falls back to the default inferred from the environment. +``` + +### Using Azure Storage as an artifact source + +You can use dbt-loom to fetch manifest files from Azure Storage +by setting up an `azure` manifest in your `dbt-loom` config. The `azure` type implements +the [DefaultAzureCredential](https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) +class, supporting all environment variables and authentication mechanisms. +Alternatively, set the `AZURE_STORAGE_CONNECTION_STRING` environment variable to +authenticate via a connection string. + +```yaml +manifests: + - name: project_name + type: azure + config: + account_name: # The name of your Azure Storage account + container_name: # The name of your Azure Storage container + object_name: # The object name of your manifest file. +``` + +### Using environment variables + +You can easily incorporate your own environment variables into the config file. This allows for dynamic configuration values that can change based on the environment. To specify an environment variable in the `dbt-loom` config file, use one of the following formats: + +`${ENV_VAR}` or `$ENV_VAR` + +#### Example: + +```yaml +manifests: + - name: revenue + type: gcs + config: + project_id: ${GCP_PROJECT} + bucket_name: ${GCP_BUCKET} + object_name: ${MANIFEST_PATH} +``` + +### Gzipped files + +`dbt-loom` natively supports decompressing gzipped manifest files. This is useful to reduce object storage size and to minimize loading times when reading manifests from object storage. 
Compressed file detection is triggered when the file path for the manifest is suffixed +with `.gz`. + +```yaml +manifests: + - name: revenue + type: s3 + config: + bucket_name: example_bucket_name + object_name: manifest.json.gz +``` diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..2d4047c --- /dev/null +++ b/docs/index.md @@ -0,0 +1,57 @@ +# dbt-loom + +`dbt-loom` is a dbt Core plugin that weaves together multi-project deployments. It works by fetching public model definitions from your dbt artifacts, and injecting those models into your dbt project. + +```mermaid +flowchart LR + + subgraph TOP[Your Infrastructure] + direction TB + dbt_runtime[dbt Core] + proprietary_plugin[Open Source Metadata Plugin] + + files[Local and Remote Files] + object_storage[Object Storage] + discovery_api[dbt Cloud APIs] + + discovery_api --> proprietary_plugin + files --> proprietary_plugin + object_storage --> proprietary_plugin + proprietary_plugin --> dbt_runtime + end + + Project --> TOP --> Warehouse +``` + + +dbt-loom currently supports obtaining model definitions from: + +- Local manifest files +- Remote manifest files via http(s) +- dbt Cloud +- GCS +- S3-compatible object storage services +- Azure Storage + +!!! warning + dbt-core's plugin functionality is still in beta. Please note that dbt-loom may break as dbt Labs solidifies the dbt plugin API in future versions. + +## How does it work? + +As of dbt-core 1.6.0-b8, there is a `dbtPlugin` class that defines functions that can +be called by dbt-core's `PluginManager`. During different parts of the dbt-core lifecycle (such as graph linking and +manifest writing), the `PluginManager` will be called and all plugins registered with the appropriate hook will be executed. + +dbt-loom implements a `get_nodes` hook, and uses a configuration file to parse manifests, identify public models, and +inject those public models when called by `dbt-core`. 
+ +## Known Caveats + +Cross-project dependencies are a relatively new development, and dbt-core plugins +are still in beta. As such there are a number of caveats to be aware of when using +this tool. + +1. dbt plugins are only supported in dbt-core version 1.6.0-b8 and newer. This means you must be using a dbt adapter + compatible with this version. +2. `PluginNodeArgs` are not fully-realized dbt `ManifestNode`s, so documentation generated by `dbt docs generate` may + be sparse when viewing injected models. diff --git a/mkdocs.yml b/mkdocs.yml new file mode 100644 index 0000000..dbf6ba4 --- /dev/null +++ b/mkdocs.yml @@ -0,0 +1,65 @@ +site_name: dbt_loom + +theme: + + palette: + + # Palette toggle for light mode + - media: "(prefers-color-scheme: light)" + scheme: default + primary: custom + accent: custom + toggle: + icon: material/brightness-7 + name: Switch to dark mode + + # Palette toggle for dark mode + - media: "(prefers-color-scheme: dark)" + scheme: slate + primary: custom + accent: custom + toggle: + icon: material/brightness-4 + name: Switch to light mode + + # primary: black + logo: assets/dbt-logo.svg + name: material + features: + - navigation.footer + - navigation.instant + - navigation.tracking + - content.action.edit + - toc.integrate # check feedback + +extra: + version: + provider: mike + +markdown_extensions: + - attr_list # needed to allow providing width + - md_in_html # to allow Markdown in details + - toc: + toc_depth: 3 + permalink: "#" + - pymdownx.highlight: + anchor_linenums: true + line_spans: __span + pygments_lang_class: true + - pymdownx.inlinehilite + - pymdownx.snippets + - pymdownx.superfences: + custom_fences: + - name: mermaid + class: mermaid + format: !!python/name:pymdownx.superfences.fence_code_format + - pymdownx.details # allow collapsible blocks + - admonition + +repo_url: https://github.com/nicholasyager/dbt-loom +repo_name: nicholasyager/dbt-loom +edit_uri: edit/main/docs/ + +nav: + - Home: index.md + - Getting 
started: getting-started.md \ No newline at end of file
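
Reviewer note: the new `docs/getting-started.md` describes two behaviours without showing how they might work, namely `${ENV_VAR}` / `$ENV_VAR` substitution in the config and gzip detection by the `.gz` suffix. The sketch below is one possible reading of those two paragraphs, not dbt-loom's actual implementation, which this diff does not show. The function names `expand_env` and `decode_manifest` are hypothetical, and the sketch assumes the config has already been parsed into plain dicts, lists, and strings.

```python
import gzip
import json
import os
from typing import Any


def expand_env(value: Any) -> Any:
    """Recursively expand $VAR and ${VAR} references in a parsed config.

    Hypothetical helper: an illustration of the documented syntax only.
    """
    if isinstance(value, str):
        # os.path.expandvars handles both forms; unset variables are left as-is.
        return os.path.expandvars(value)
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    return value


def decode_manifest(object_name: str, raw: bytes) -> dict:
    """Decode manifest bytes, decompressing first when the name ends in .gz.

    Hypothetical helper: mirrors the suffix rule described in the docs.
    """
    if object_name.endswith(".gz"):
        raw = gzip.decompress(raw)
    return json.loads(raw)


if __name__ == "__main__":
    os.environ["GCP_BUCKET"] = "example_bucket_name"  # illustrative value
    config = {
        "manifests": [
            {
                "name": "revenue",
                "type": "gcs",
                "config": {
                    "bucket_name": "${GCP_BUCKET}",
                    "object_name": "manifest.json.gz",
                },
            }
        ]
    }
    expanded = expand_env(config)
    print(expanded["manifests"][0]["config"]["bucket_name"])

    # Stand-in for bytes fetched from object storage.
    payload = gzip.compress(json.dumps({"nodes": {}}).encode("utf-8"))
    print(decode_manifest("manifest.json.gz", payload))
```

One design point worth confirming in the docs: with this approach an unset variable stays in the config as the literal text `${NAME}` instead of raising an error, so a page that documents the syntax may also want to state what dbt-loom does in that case.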