Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove HiveConfig binding from Delta and Iceberg with Glue #19735

Merged
merged 4 commits into from
Nov 15, 2023

Conversation

findepi
Copy link
Member

@findepi findepi commented Nov 14, 2023

Delta Lake and Iceberg connectors do not use configuration from
HiveConfig and so the HiveConfig-provided configuration should not
be accepted when configuring the connector.

Before the change `MetastoreHiveStatisticsProvider` was both abstract
and concrete class in one: one constructor allowed to plug
implementation of the most important method, while the other provided
the implementation in a form of a lambda. This made harder to follow the
code flow (e.g. finding call sites). The commit splits the class into
abstract class (shared logic) and a concrete implementation class.
@cla-bot cla-bot bot added the cla-signed label Nov 14, 2023
@github-actions github-actions bot added tests:hive iceberg Iceberg connector delta-lake Delta Lake connector hive Hive connector labels Nov 14, 2023
@@ -135,6 +137,8 @@ public static Connector createConnector(
Set<ConnectorTableFunction> connectorTableFunctions = injector.getInstance(Key.get(new TypeLiteral<Set<ConnectorTableFunction>>() {}));
FunctionProvider functionProvider = injector.getInstance(FunctionProvider.class);

checkState(!injector.getBindings().containsKey(Key.get(HiveConfig.class)), "HiveConfig should not be bound");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this look over-specific - after all we are not checking other class which we do not expect to be bound if those are not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HiveConfig is just the most likely offender (based on history of delta connector), but i am open to suggestoins

@findinpath findinpath self-requested a review November 14, 2023 14:47
Before the change, the Hive Glue metastore wiring would respect the
`hive.table-statistics-enabled` configuration property, but not the
`statistics_enabled` session property.

After the change, handling of stats for Hive Glue metastore is same as
for other metastores. That means in particular that some of the
`DefaultGlueColumnStatisticsProvider` code will execute implicitly, as
not all stats-related operations are guarded by `statistics_enabled`
toggle today (see e.g. `SemiTransactionalHiveMetastore` and implicit
stats updates after table creation or alteration). This is considered
not a blocker.
Delta Lake and Iceberg connectors do not use configuration from
`HiveConfig` and so the `HiveConfig`-provided configuration should not
be accepted when configuring the connector.
@findepi findepi force-pushed the findepi/delta-hive-config branch from 89e21a8 to 85049bc Compare November 14, 2023 16:17
@findepi findepi merged commit 67f04aa into master Nov 15, 2023
66 checks passed
@findepi findepi deleted the findepi/delta-hive-config branch November 15, 2023 08:45
@github-actions github-actions bot added this to the 434 milestone Nov 15, 2023
@mosabua
Copy link
Member

mosabua commented Nov 17, 2023

No user facing impact - so no release notes entry @findepi @losipiuk @findinpath ?

@losipiuk
Copy link
Member

No user facing impact - so no release notes entry @findepi @losipiuk @findinpath ?

IIUC previously users could set config properties defined in HiveConfig in .properties file for iceberg/delta catalogs. No they cannot.
But those properties would have been ignored anyway so why would anyone do so.

I do not feel release notes are strictly required - for you to decide.

@mosabua
Copy link
Member

mosabua commented Nov 17, 2023

okay ... to be on the safe side we might have to flag it as breaking change and add a release notes entry.. wdyt @martint

@mosabua
Copy link
Member

mosabua commented Nov 29, 2023

Added RN entry.

@covalesj
Copy link

Doesn't this break the capability for iceberg connector to write to s3? IE: how do you configure writes / credentials etc for iceberg->s3 writes?

@findinpath
Copy link
Contributor

how do you configure writes / credentials etc for iceberg-> s3 writes?

https://github.com/trinodb/trino/blob/79bc8fa09b337ac163582a708b725eb34cf36b0c/lib/trino-hdfs/src/main/java/io/trino/hdfs/s3/HiveS3Config.java (part of trino-hdfs module now)

Note that this change is about removing binding of HiveConfig for Delta & Iceberg connectors.

@mosabua
Copy link
Member

mosabua commented Dec 13, 2023

@findinpath we maybe should make sure the Iceberg connector docs links to the Hive S3 docs .. after checking that all properties apply..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

5 participants