[Feature] Add Databricks support (#14)
Adds initial support for Databricks. Baked in an expectation that the
user references the token through an `env_var` call in their dbt
profile, as opposed to a hardcoded value. Will probably make this
smarter at some point, but for now, I feel like a forcing function that
pushes people to not store hardcoded credentials is a good thing!

The same expectation is echoed when writing a manually configured run to
a `profiles.yml` file: the form asks you for an environment variable
name, _not_ a value, and writes it out as an `env_var` call that
references the name you gave.
gwenwindflower authored Apr 23, 2024
2 parents b267c00 + de70f19 commit 9112c63
Showing 14 changed files with 336 additions and 22 deletions.
30 changes: 28 additions & 2 deletions README.md
@@ -30,7 +30,7 @@ If you're new to dbt, [check out the wiki](https://github.com/gwenwindflower/tbd
- [x] BigQuery
- [x] Snowflake
- [ ] Redshift
- [ ] Databricks
- [x] Databricks
- [x] Postgres
- [x] DuckDB

@@ -82,6 +82,32 @@ Using local DuckDB doesn't require authentication, just an existing DuckDB datab
> [!NOTE]
> I've built in support for [MotherDuck](https://motherduck.com/). You just need to set an env var called `MOTHERDUCK_TOKEN` with your service token, then pass the path as `md:`. **But** until MotherDuck upgrades to v10, this requires you to use DuckDB 0.9.2 locally for compatibility. MotherDuck says the upgrade will happen any day now, so hopefully this note will be removed soon!
### 🐘 Postgres

Postgres setup is fairly normal! It just needs a host, port, user, password, and of course a database and schema that you want to model. The default values point to a localhost instance on port 5432, but you can change that to whatever you need, and it should work with cloud-hosted Postgres instances as well (though I haven't yet tested this myself). Tests have been running on Postgres 16, but `tbd` makes very standard queries to `INFORMATION_SCHEMA` and doesn't do anything fancy with the connection, so it should probably work with any relatively modern Postgres version.

### 🧱 Databricks

For now, Databricks only works with Personal Access Token (PAT) authentication. You can generate a token in your Databricks account settings under the Developer tab. Put it in an environment variable called something like `DATABRICKS_TOKEN` (the default option). `tbd` will _not_ let you pass the value directly.

If you don't already have these in a dbt profile, you'll need the Hostname and HTTP Path from your SQL Warehouse's 'Connection details' tab. If you do have a profile, you can choose that profile, pick which catalog and schema you want to target, and you're good to go!
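
To show how the hostname, HTTP path, token, catalog, and schema fit together, here is a minimal sketch (not `tbd`'s actual code) that assembles them into a connection string. It assumes the `token:<PAT>@<host>:<port><http path>?catalog=...&schema=...` DSN shape that the `databricks-sql-go` driver documents, as best I understand it, and it assumes port 443. `databricksDSN` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// databricksDSN is a hypothetical helper that joins the connection details
// into a single DSN string. The DSN shape and port 443 are assumptions.
func databricksDSN(token, host, httpPath, catalog, schema string) (string, error) {
	if token == "" {
		return "", fmt.Errorf("personal access token is empty")
	}
	// The HTTP path from the 'Connection details' tab starts with a slash;
	// add one if the caller left it off.
	if !strings.HasPrefix(httpPath, "/") {
		httpPath = "/" + httpPath
	}
	// url.Values.Encode sorts keys, so catalog comes before schema.
	q := url.Values{}
	q.Set("catalog", catalog)
	q.Set("schema", schema)
	return fmt.Sprintf("token:%s@%s:443%s?%s", token, host, httpPath, q.Encode()), nil
}

func main() {
	// Read the PAT from the environment rather than hardcoding it.
	dsn, err := databricksDSN(
		os.Getenv("DATABRICKS_TOKEN"),
		"1234.cloud.databricks.com",
		"/sql/1.0/warehouses/56789",
		"jaffle_shop",
		"dbt_winnie",
	)
	if err != nil {
		fmt.Println("could not build DSN:", err)
		return
	}
	// Don't print the DSN itself, since it embeds the token.
	fmt.Println("DSN built,", len(dsn), "bytes")
}
```

A string like this would typically be handed to `sql.Open` with the Databricks driver registered, though the exact driver name and supported query parameters should be checked against the driver's own documentation.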

If you're using a dbt profile, it needs to be configured for PAT authentication and must reference the token via an environment variable. The profile should look something like this:

```yaml
jaffle_shop_databricks:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: jaffle_shop
      schema: dbt_winnie
      host: 1234.cloud.databricks.com
      http_path: /sql/1.0/warehouses/56789
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      threads: 16
```
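
The requirement that the `token` field be an `env_var` reference rather than a literal can be illustrated with a short sketch. This is not `tbd`'s actual implementation. `envVarName` and `resolveToken` are hypothetical helpers, and the pattern only handles the single-argument form of `env_var` (no default value).

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

// envVarCall matches a dbt-style {{ env_var('NAME') }} reference.
// It accepts either quote character and ignores surrounding whitespace.
var envVarCall = regexp.MustCompile(
	`^\{\{\s*env_var\(\s*['"]([A-Za-z_][A-Za-z0-9_]*)['"]\s*\)\s*\}\}$`,
)

// envVarName returns the variable name inside an env_var call, or an error
// if the field holds anything else (for example a hardcoded token).
func envVarName(field string) (string, error) {
	m := envVarCall.FindStringSubmatch(strings.TrimSpace(field))
	if m == nil {
		return "", fmt.Errorf("token must be an env_var reference, not a literal value")
	}
	return m[1], nil
}

// resolveToken looks up the referenced variable in the environment.
func resolveToken(field string) (string, error) {
	name, err := envVarName(field)
	if err != nil {
		return "", err
	}
	val, ok := os.LookupEnv(name)
	if !ok || val == "" {
		return "", fmt.Errorf("environment variable %s is not set", name)
	}
	return val, nil
}

func main() {
	if _, err := resolveToken("{{ env_var('DATABRICKS_TOKEN') }}"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("token resolved from the environment")
}
```

Rejecting anything that doesn't match the pattern is what makes this a forcing function: a profile with a pasted token fails early instead of quietly working.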
## Usage
The tool has a lovely TUI that will walk you through the necessary steps. You can run it with the following command:
@@ -125,7 +151,7 @@ I'm going to experiment very soon with using structured output conformed to dbt'
- [x] Add support for Snowflake
- [x] Add support for BigQuery
- [ ] Add support for Redshift
- [ ] Add support for Databricks
- [x] Add support for Databricks
- [x] Add support for Postgres
- [x] Add support for DuckDB
- [x] Add support for MotherDuck
1 change: 1 addition & 0 deletions fetch_dbt_profiles.go
@@ -61,6 +61,7 @@ type DbtProfile struct {
Priority string `yaml:"priority"`
Keyfile string `yaml:"keyfile"`
ImpersonateServiceAccount string `yaml:"impersonate_service_account"`
HttpPath string `yaml:"http_path"`
Extensions []string `yaml:"extensions"`
Scopes []string `yaml:"scopes"`
JobCreationTimeoutSeconds int `yaml:"job_creation_timeout_seconds"`
42 changes: 40 additions & 2 deletions forms.go
@@ -27,6 +27,8 @@ type FormResponse struct {
DbtProfileName string
Path string
Port string
TokenEnvVar string
HttpPath string
Username string
Prefix string
Llm string
@@ -61,8 +63,8 @@ func Forms(ps DbtProfiles) (FormResponse, error) {
BuildDir: "build",
LlmKeyEnvVar: "OPENAI_API_KEY",
Prefix: "stg",
Host: "localhost",
Port: "5432",
TokenEnvVar: "DATABRICKS_TOKEN",
}
pinkUnderline := color.New(color.FgMagenta).Add(color.Bold, color.Underline).SprintFunc()
greenBold := color.New(color.FgGreen).Add(color.Bold).SprintFunc()
@@ -132,7 +134,7 @@ https://github.com/gwenwindflower/tbd
Placeholder("raw").
Validate(notEmpty),
huh.NewInput().
Title("What *database* is that schema in?").
Title("What *database/project/catalog* is that schema in?").
Value(&dfr.Database).
Placeholder("jaffle_shop").
Validate(notEmpty),
@@ -148,6 +150,7 @@ https://github.com/gwenwindflower/tbd
huh.NewOption("BigQuery", "bigquery"),
huh.NewOption("DuckDB", "duckdb"),
huh.NewOption("Postgres", "postgres"),
huh.NewOption("Databricks", "databricks"),
).
Value(&dfr.Warehouse),
).WithHideFunc(func() bool {
@@ -265,6 +268,40 @@ Relative to pwd e.g. if db is in this dir -> cool_ducks.db`).
return dfr.Warehouse != "postgres"
}),

huh.NewGroup(
huh.NewInput().
Title("What is your Databricks *host*?").
Value(&dfr.Host).
Placeholder("dbc-12345.cloud.databricks.com").
Validate(notEmpty),
huh.NewInput().
Title("What is your warehouse's *HTTP path*?").
Value(&dfr.HttpPath).
Placeholder("/sql/1.0/warehouses/12345").
Validate(notEmpty),
huh.NewInput().
Title("What is your Databricks *username*?").
Value(&dfr.Username).
Placeholder("arwen").
Validate(notEmpty),
huh.NewInput().
Title("What env var holds your Databricks *Personal Access Token*?").
Value(&dfr.TokenEnvVar).
Validate(notEmpty),
huh.NewInput().
Title("What is the *catalog* you want to generate?").
Value(&dfr.Database).
Placeholder("rivendell").
Validate(notEmpty),
huh.NewInput().
Title("What is the *schema* you want to generate?").
Value(&dfr.Schema).
Placeholder("evenstar").
Validate(notEmpty),
).WithHideFunc(func() bool {
return dfr.Warehouse != "databricks"
}),

huh.NewGroup(
huh.NewNote().
Title(fmt.Sprintf("🤖 %s LLM generation 🦙✨", yellowItalic("Optional"))).
@@ -280,6 +317,7 @@ _Requires an_ %s _stored in an env var_.`, pinkUnderline("descriptions"), pinkUn

huh.NewGroup(
huh.NewSelect[string]().
Title("Choose your LLM provider:").
Options(
huh.NewOption("OpenAI", "openai"),
huh.NewOption("Groq", "groq"),
18 changes: 18 additions & 0 deletions go.mod
@@ -6,6 +6,7 @@ require (
cloud.google.com/go/bigquery v1.60.0
github.com/DATA-DOG/go-sqlmock v1.5.2
github.com/charmbracelet/huh v0.3.1-0.20240306161957-71f31c155b08
github.com/databricks/databricks-sql-go v1.5.4
github.com/fatih/color v1.16.0
github.com/jarcoal/httpmock v1.3.1
github.com/lib/pq v1.10.9
@@ -27,8 +28,11 @@ require (
github.com/Azure/azure-sdk-for-go/sdk/internal v1.1.2 // indirect
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.0.0 // indirect
github.com/JohnCGriffin/overflow v0.0.0-20211019200055-46fa312c352c // indirect
github.com/andybalholm/brotli v1.0.5 // indirect
github.com/apache/arrow/go/v12 v12.0.1 // indirect
github.com/apache/arrow/go/v14 v14.0.2 // indirect
github.com/apache/arrow/go/v15 v15.0.0 // indirect
github.com/apache/thrift v0.17.0 // indirect
github.com/atotto/clipboard v0.1.4 // indirect
github.com/aws/aws-sdk-go-v2 v1.17.7 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.4.10 // indirect
@@ -50,32 +54,43 @@ require (
github.com/charmbracelet/lipgloss v0.9.1 // indirect
github.com/charmbracelet/x/exp/strings v0.0.0-20240304160204-3835fda67169 // indirect
github.com/containerd/console v1.0.4-0.20230313162750-1ae8d489ac81 // indirect
github.com/coreos/go-oidc/v3 v3.5.0 // indirect
github.com/danieljoos/wincred v1.1.2 // indirect
github.com/dnephin/pflag v1.0.7 // indirect
github.com/dustin/go-humanize v1.0.1 // indirect
github.com/dvsekhvalnov/jose2go v1.6.0 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/form3tech-oss/jwt-go v3.2.5+incompatible // indirect
github.com/fsnotify/fsnotify v1.5.4 // indirect
github.com/gabriel-vasile/mimetype v1.4.2 // indirect
github.com/go-jose/go-jose/v3 v3.0.0 // indirect
github.com/go-logr/logr v1.4.1 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/goccy/go-json v0.10.2 // indirect
github.com/godbus/dbus v0.0.0-20190726142602-4481cbc300e2 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/golang/protobuf v1.5.4 // indirect
github.com/golang/snappy v0.0.4 // indirect
github.com/google/flatbuffers v23.5.26+incompatible // indirect
github.com/google/s2a-go v0.1.7 // indirect
github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.2 // indirect
github.com/googleapis/gax-go/v2 v2.12.3 // indirect
github.com/gsterjov/go-libsecret v0.0.0-20161001094733-a6f4afe4910c // indirect
github.com/hashicorp/go-cleanhttp v0.5.1 // indirect
github.com/hashicorp/go-retryablehttp v0.7.1 // indirect
github.com/jmespath/go-jmespath v0.4.0 // indirect
github.com/klauspost/asmfmt v1.3.2 // indirect
github.com/klauspost/compress v1.17.7 // indirect
github.com/klauspost/cpuid/v2 v2.2.7 // indirect
github.com/lucasb-eyer/go-colorful v1.2.0 // indirect
github.com/mattn/go-colorable v0.1.13 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/mattn/go-localereader v0.0.1 // indirect
github.com/mattn/go-runewidth v0.0.15 // indirect
github.com/minio/asm2plan9s v0.0.0-20200509001527-cdd76441f9d8 // indirect
github.com/minio/c2goasm v0.0.0-20190812172519-36a3d3bbc4f3 // indirect
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db // indirect
github.com/mitchellh/mapstructure v1.5.0 // indirect
github.com/mtibben/percent v0.2.1 // indirect
@@ -85,8 +100,10 @@ require (
github.com/muesli/termenv v0.15.2 // indirect
github.com/pierrec/lz4/v4 v4.1.21 // indirect
github.com/pkg/browser v0.0.0-20210911075715-681adbf594b8 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/rogpeppe/go-internal v1.11.0 // indirect
github.com/rs/zerolog v1.28.0 // indirect
github.com/sirupsen/logrus v1.9.0 // indirect
github.com/zeebo/xxh3 v1.0.2 // indirect
go.opencensus.io v0.24.0 // indirect
@@ -113,4 +130,5 @@ require (
google.golang.org/genproto/googleapis/rpc v0.0.0-20240314234333-6e1732d8331c // indirect
google.golang.org/grpc v1.62.1 // indirect
google.golang.org/protobuf v1.33.0 // indirect
gotest.tools/gotestsum v1.8.2 // indirect
)