
[ADAP-492] [Feature] Support partition_by and cluster_by on python models when supplied in model configurations #680

Closed
3 tasks done
Tracked by #7561
kalanyuz opened this issue Apr 25, 2023 · 5 comments · Fixed by #681
Labels: enhancement (New feature or request), partitioning (Related to creating, replacing, or pruning partitions to avoid full table scans), python_models

Comments

@kalanyuz
Contributor

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

This issue was previously raised in #247, but that issue was marked as stale and closed.

Currently, Python models do not apply partition_by and cluster_by when creating tables, even when both are supplied in the model configuration, although the spark-bigquery-connector supports them through its indirect write method.
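For context, a Python model requesting these settings might look like the minimal sketch below. It assumes a BigQuery Python model running on Dataproc; the upstream model name `events` and the columns `event_date` and `customer_id` are hypothetical.

```python
# Hypothetical model file, e.g. models/events_partitioned.py.
# Model and column names are illustrative only.


def model(dbt, session):
    dbt.config(
        materialized="table",
        # Ask for a DATE-partitioned table with daily partitions.
        partition_by={
            "field": "event_date",
            "data_type": "date",
            "granularity": "day",
        },
        # Cluster within each partition; dbt also accepts a single string.
        cluster_by=["customer_id"],
    )
    # On Dataproc, dbt.ref returns a Spark DataFrame.
    return dbt.ref("events")
```

Without support in the materialization, the resulting table is written unpartitioned and unclustered despite this configuration.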

I believe we can support both configurations by modifying materializations/table.sql


to something like

```
{#- cluster_by may be a single column name; wrap it so join() does not split it into characters -#}
{%- if raw_cluster_by is string %}
  {%- set raw_cluster_by = [raw_cluster_by] %}
{%- endif %}
df.write \
  .mode("overwrite") \
  .format("bigquery") \
  .option("writeMethod", "indirect") \
  .option("writeDisposition", "WRITE_TRUNCATE") \
  {%- if partition_config is not none and partition_config.data_type | lower in ('date','timestamp','datetime') %}
  .option("partitionField", "{{- partition_config.field -}}") \
  .option("partitionType", "{{- partition_config.granularity | upper -}}") \
  {%- endif %}
  {%- if raw_cluster_by is not none %}
  .option("clusteredFields", "{{- raw_cluster_by | join(',') -}}") \
  {%- endif %}
  .save("{{target_relation}}")
{% endmacro %}
```
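The mapping that this template performs from dbt configuration to connector write options can be sketched in plain Python. `bigquery_write_options` is a hypothetical helper, not part of dbt-bigquery; it assumes dbt's `partition_by` dict shape (`field`, `data_type`, `granularity`, with dbt's documented defaults of `date` and `day`) and the connector option names used in the snippet above. Upper-casing the granularity is an assumption based on the connector documenting `HOUR`, `DAY`, `MONTH` and `YEAR` as partition types.

```python
from typing import Optional, Union

# Data types the proposed template treats as time-unit partitions.
TIME_PARTITION_TYPES = {"date", "timestamp", "datetime"}


def bigquery_write_options(
    partition_by: Optional[dict] = None,
    cluster_by: Optional[Union[str, list]] = None,
) -> dict:
    """Hypothetical helper: dbt model config -> spark-bigquery-connector options."""
    options = {
        "writeMethod": "indirect",
        "writeDisposition": "WRITE_TRUNCATE",
    }
    data_type = str((partition_by or {}).get("data_type", "date")).lower()
    if partition_by and data_type in TIME_PARTITION_TYPES:
        options["partitionField"] = partition_by["field"]
        # dbt granularities are lowercase ("day"); the connector lists
        # HOUR, DAY, MONTH and YEAR, so the value is upper-cased here.
        options["partitionType"] = partition_by.get("granularity", "day").upper()
    if cluster_by:
        # Accept a single column name as well as a list of column names.
        columns = [cluster_by] if isinstance(cluster_by, str) else list(cluster_by)
        options["clusteredFields"] = ",".join(columns)
    return options
```

Used from PySpark, the result could be passed as `df.write.format("bigquery").mode("overwrite").options(**opts).save(table)`. Integer-range (`int64`) partitions fall through unchanged, matching the template's `if` condition.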

Describe alternatives you've considered

No response

Who will this benefit?

All BigQuery users who are looking to create complex models using Dataproc

Are you interested in contributing this feature?

yes

Anything else?

No response

@kalanyuz kalanyuz added enhancement New feature or request triage labels Apr 25, 2023
@github-actions github-actions bot changed the title [Feature] Allow partition_by and cluster_by on python models when supplied in model configurations [ADAP-492] [Feature] Allow partition_by and cluster_by on python models when supplied in model configurations Apr 25, 2023
@kalanyuz kalanyuz changed the title [ADAP-492] [Feature] Allow partition_by and cluster_by on python models when supplied in model configurations [ADAP-492] [Feature] Support partition_by and cluster_by on python models when supplied in model configurations Apr 25, 2023
@dbeatty10
Contributor

This would be good to have @kalanyuz 🤩

Would you be interested in opening up a PR with your solution?

@kalanyuz
Contributor Author

kalanyuz commented Apr 25, 2023 via email

@dbeatty10
Contributor

Awesome @kalanyuz!

Here's an overview of the pull request process. After reading that, I'd suggest reading the entire contributing file top-to-bottom.

There's a lot of content in there, so please reach out if you get stuck and need any help!

@kalanyuz
Contributor Author

@dbeatty10 I've submitted a PR here

@jtcohen6
Contributor

(Much older issue for this: #247)

@dbeatty10 dbeatty10 added the partitioning Related to creating, replacing, or pruning partitions to avoid full table scans label May 16, 2023
mikealfare pushed a commit that referenced this issue Oct 11, 2023
#681)

* support partition_by and cluster_by on python models when supplied in model configurations
* add integration test for partitioned models
* fix typo on partitionType field
colin-rogers-dbt added a commit that referenced this issue Nov 8, 2023

3 participants