
[CT-821] Allow passing through Cluster Configuration parameters via profiles.yml when using the thrift connection method #387

Closed
VShkaberda opened this issue Jul 7, 2022 · 5 comments · Fixed by #577
Labels
enhancement New feature or request

Comments

VShkaberda (Contributor) commented Jul 7, 2022

Description

Allow passing cluster configuration parameters via profiles.yml when using the thrift connection method.
This enables setting values that would otherwise have to be applied with SET statements in the Spark session so that they take effect for dbt models. An analogous feature for the odbc method has already been implemented in this request.
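For illustration, a minimal sketch of what this could look like in profiles.yml, assuming a hypothetical server_side_parameters key under the thrift method (the key name is an open question, see the naming question at the end of this description); host, schema, and parameter values are placeholders:

my_project:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      host: spark-thrift.example.com   # placeholder
      port: 10000
      schema: analytics
      server_side_parameters:          # hypothetical key name, see naming question below
        "spark.sql.shuffle.partitions": "1200"
        "spark.executor.memory": "4g"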

Alternatives

An alternative is to use a pre_hook (example below), but this approach does not work well and has drawbacks.

{{ config(
    materialized='table',
    pre_hook=['SET spark.sql.shuffle.partitions=1200'],
    ...
)}}

This feature can be implemented by passing the cluster configuration parameters as the configuration argument of PyHive's hive.connect().
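As a rough sketch (not an actual implementation), assuming the adapter forwards a dict from profiles.yml to PyHive, whose hive.connect() accepts a configuration argument; host and username below are placeholders:

from pyhive import hive

# Hypothetical sketch: pass cluster configuration parameters through to the
# Thrift server when the session is opened.
server_side_parameters = {
    "spark.sql.shuffle.partitions": "1200",
    "spark.executor.memory": "4g",
}

conn = hive.connect(
    host="spark-thrift.example.com",   # placeholder
    port=10000,
    username="dbt",                    # placeholder
    configuration=server_side_parameters,
)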

Benefits

  1. Parameters are kept in one place.
  2. It allows setting parameters that cannot be set through a pre-hook.
  3. Sensitive parameters that should not appear in code can be supplied (such as spark.hadoop.fs.s3a.access.key and spark.hadoop.fs.s3a.secret.key).

Our team may contribute this feature. One open question is the naming in profiles.yml: should it be server_side_parameters to match the odbc method, or configuration to match hive.connect()? Or should both names be allowed?

VShkaberda added the enhancement and triage labels Jul 7, 2022
github-actions bot changed the title to "[CT-821] Allow passing through Cluster Configuration parameters via profiles.yml when using the thrift connection method" Jul 7, 2022
github-actions bot commented Jan 5, 2023

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions bot added the Stale label Jan 5, 2023
hanna-liashchuk (Contributor) commented

Hi,
I've prepared a PR for this #577

Fleid (Contributor) commented Mar 24, 2023

Hey @hanna-liashchuk, could you please check this discussion? I'm trying to regroup all the threads on this topic in one place.
Please let me know whether the plan works for you before we decide to move forward with this specific issue.

JCZuurmond (Collaborator) commented

Hi @VShkaberda, have you considered SQL hints? For the repartition example in the issue, I recommend hints over setting the configuration, since you might want to change this parameter for different models.
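For example, the shuffle-partitions case from the issue description could be expressed per model with a Spark SQL hint instead of a session-wide setting (a sketch; upstream_model is a placeholder ref):

SELECT /*+ REPARTITION(1200) */ *
FROM {{ ref('upstream_model') }}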

VShkaberda (Contributor, Author) commented

@JCZuurmond Thanks for the suggestion. My use case is more general, e.g. passing spark.hadoop.fs.s3a.access.key, spark.hadoop.fs.s3a.secret.key, spark.executor.memory, etc.
