[Feature] Python Model use Databricks Serverless Compute #11103

Open
3 tasks done
PhilippLange opened this issue Dec 6, 2024 · 0 comments
Labels
enhancement New feature or request triage

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

I couldn't find any documentation on using Databricks Serverless Compute when executing a dbt Python model.

As the Databricks Serverless Compute feature is pretty new, it isn't well documented in the Databricks docs. However, the Databricks Terraform Provider supports creating serverless job tasks. In the Terraform Provider docs I found the following note:
If no job_cluster_key, existing_cluster_id, or new_cluster were specified in task definition, then task will executed using serverless compute.

Using this as a starting point, I tried setting submission_method to all_purpose_cluster and to job_cluster while leaving cluster_id / job_cluster_config unset (or setting them to None / empty strings).
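For illustration, the kind of model configuration described above might look roughly like the following. This is only a sketch: the model body and the `upstream_model` name are placeholders that are not taken from a real project, and only the `job_cluster` variant is shown.

```python
# Hypothetical dbt Python model file, e.g. models/my_python_model.py
def model(dbt, session):
    # Pick a submission method but deliberately leave out the cluster details,
    # hoping the run falls back to serverless compute.
    dbt.config(
        materialized="table",
        submission_method="job_cluster",
        # job_cluster_config intentionally not set
    )

    # "upstream_model" is a placeholder name for some existing model.
    df = dbt.ref("upstream_model")
    return df
```

The `all_purpose_cluster` attempt is the same idea, with `submission_method="all_purpose_cluster"` and no `cluster_id`.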

However, dbt prevents me from submitting a job like that; I receive the following error messages:

Databricks `http_path` or `cluster_id` of an all-purpose cluster is required for the `all_purpose_cluster` submission method.
 `job_cluster_config` is required for the `job_cluster` submission method.

Therefore, I think a dbt enhancement is necessary to support Databricks Serverless Compute.
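To make the request more concrete, here is a rough sketch of what a serverless submission could look like if the Jobs API follows the same rule as the Terraform note quoted above: the task simply carries none of the three cluster keys. The helper names, the task key, and the notebook path are made up for illustration. The field names (`run_name`, `tasks`, `task_key`, `notebook_task`, `notebook_path`) reflect my understanding of the Databricks Jobs API and should be treated as assumptions, as should the idea that dbt would submit the model as a notebook task.

```python
import json

# Keys that, per the Terraform provider note, must all be absent
# for a task to run on serverless compute.
CLUSTER_KEYS = ("job_cluster_key", "existing_cluster_id", "new_cluster")


def build_serverless_task(task_key, notebook_path):
    """Build a task definition that names no cluster at all (hypothetical helper)."""
    task = {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path},
    }
    # Sanity check: nothing cluster-related slipped into the task.
    assert not any(key in task for key in CLUSTER_KEYS)
    return task


def build_run_submit_payload(run_name, notebook_path):
    """Build a one-time run payload containing a single serverless task."""
    return {
        "run_name": run_name,
        "tasks": [build_serverless_task("dbt_python_model", notebook_path)],
    }


# Placeholder values, not taken from a real workspace.
payload = build_run_submit_payload(
    "dbt-serverless-sketch",
    "/Shared/dbt_python_model/my_python_model",
)
print(json.dumps(payload, indent=2))
```

The point is only that supporting this would mean allowing a submission path in which neither a cluster ID nor a job cluster config is required.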

This feature would greatly benefit our job execution times and I'd appreciate someone looking into this.

Describe alternatives you've considered

I tried to fiddle with the existing configuration possibilities in order to get dbt to submit a serverless run, which did not work.

Who will this benefit?

Anyone using dbt Python models with a Databricks backend, thanks to reduced startup times.

Are you interested in contributing this feature?

I'm willing to contribute, if someone guides me in the right direction.

Anything else?

No response

@PhilippLange PhilippLange added enhancement New feature or request triage labels Dec 6, 2024