Use compatible column name to set Parquet bloom filter #11799
cc @szehon-ho
.columnBloomFilterEnabled()
.forEach(
(colPath, isEnabled) -> {
Types.NestedField fieldId = schema.caseInsensitiveFindField(colPath);
[doubt] Does case sensitivity matter? Can this:
write.parquet.bloom-filter-enabled.column.CoL1
be applied to Parquet files whose schema contains col1?
If not, should we explicitly lowercase the column names after deriving the configs?
Sensitivity matters. I changed to findField. Thanks!
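To make the distinction in this exchange concrete, here is a minimal sketch of the two lookup styles. `FieldLookupSketch` is a hypothetical stand-in for a schema, not Iceberg's `Schema` class; only the method names `findField` and `caseInsensitiveFindField` come from the diff, and the map-based implementation is an assumption for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a schema that maps column names to field IDs.
final class FieldLookupSketch {
  private final Map<String, Integer> idsByName = new HashMap<>();
  private final Map<String, Integer> idsByLowerCaseName = new HashMap<>();

  FieldLookupSketch add(String name, int fieldId) {
    idsByName.put(name, fieldId);
    idsByLowerCaseName.put(name.toLowerCase(Locale.ROOT), fieldId);
    return this;
  }

  // Exact match: "CoL1" does not resolve to a column declared as "col1".
  Optional<Integer> findField(String name) {
    return Optional.ofNullable(idsByName.get(name));
  }

  // Case-insensitive match: "CoL1" resolves to "col1".
  Optional<Integer> caseInsensitiveFindField(String name) {
    return Optional.ofNullable(idsByLowerCaseName.get(name.toLowerCase(Locale.ROOT)));
  }
}
```

With the exact-match lookup, a property written for `CoL1` simply finds no field when the schema declares `col1`, so in this sketch the config would be skipped rather than applied to the wrong casing.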
String parquetColumnPath = fieldIdToParquetPath.get(fieldId.fieldId());
if (parquetColumnPath == null) {
LOG.warn("Skipping bloom filter config for missing field: {}", fieldId);
Should we update this message to say something like the following (current line first, suggested replacement second)?
LOG.warn("Skipping bloom filter config for missing field: {}", fieldId);
LOG.warn("Skipping bloom filter config for field: {} due to missing parquetColumnPath for fieldId: {}", colPath, fieldId);
This mostly comes from the log lines above being nearly identical, although one logs the column path and the other the field ID.
Fixed. Thanks!
context
.columnBloomFilterEnabled()
.forEach(
(colPath, isEnabled) -> {
[question] Do we need to do anything when isEnabled is false? Or can Parquet proactively decide whether a column should have a bloom filter, so that isEnabled being false serves as an explicit deny?
If isEnabled is true, Iceberg calls withBloomFilterEnabled(String columnPath, boolean enabled). If isEnabled is false, we don't need to do anything.
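The behavior described in this reply can be sketched as follows. `RecordingBuilder` is a hypothetical stand-in that only records calls; it is not the Parquet writer builder, and treating the config values as the strings "true"/"false" is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class BloomFilterEnableSketch {

  // Hypothetical builder: records which column paths had a bloom filter enabled.
  static final class RecordingBuilder {
    final List<String> enabledColumns = new ArrayList<>();

    RecordingBuilder withBloomFilterEnabled(String columnPath, boolean enabled) {
      if (enabled) {
        enabledColumns.add(columnPath);
      }
      return this;
    }
  }

  // Calls the builder only for columns whose flag parses to true;
  // columns flagged false are left untouched, as the reply describes.
  static RecordingBuilder configure(Map<String, String> columnBloomFilterEnabled) {
    RecordingBuilder builder = new RecordingBuilder();
    columnBloomFilterEnabled.forEach(
        (colPath, isEnabled) -> {
          if (Boolean.parseBoolean(isEnabled)) {
            builder.withBloomFilterEnabled(colPath, true);
          }
        });
    return builder;
  }
}
```

Whether a false flag should act as an explicit deny on the Parquet side is the open part of the question above; this sketch only mirrors the "do nothing" behavior stated in the reply.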
.columnBloomFilterEnabled()
.forEach(
(colPath, isEnabled) -> {
Types.NestedField fieldId = schema.findField(colPath);
Can we call this field instead? (Current line first, suggested replacement second.)
Types.NestedField fieldId = schema.findField(colPath);
Types.NestedField field = schema.findField(colPath);
Updated. Thanks!
LGTM, thank you @huaxingao!
When writing a Parquet file, if a column name contains special characters, e.g. -, Iceberg converts it to a compatible format. However, the bloom filter is still set using the original column name, which results in an invalid bloom filter. This pull request resolves the issue by setting the bloom filter with the compatible column name instead of the original one.
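As a rough illustration of the fix described above, the sketch below resolves each configured column name to its field ID and then to the Parquet column path actually written to the file, and uses that path for the bloom filter. Everything here is a stand-in: `makeCompatibleName` is a hypothetical sanitizer (the exact encoding Iceberg uses is not shown in this thread), the plain maps replace the schema and the `fieldIdToParquetPath` lookup from the diff, and string-valued flags are an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class CompatibleBloomFilterPathSketch {

  // Hypothetical sanitizer: keeps letters, '_' and non-leading digits,
  // and hex-encodes every other character. The real encoding may differ.
  static String makeCompatibleName(String name) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      boolean allowed = Character.isLetter(c) || c == '_' || (i > 0 && Character.isDigit(c));
      if (allowed) {
        sb.append(c);
      } else {
        sb.append("_x").append(Integer.toHexString(c).toUpperCase(Locale.ROOT));
      }
    }
    return sb.toString();
  }

  // Returns the Parquet column paths that should get a bloom filter.
  static List<String> bloomFilterColumns(
      Map<String, Integer> fieldIdsByName,
      Map<Integer, String> fieldIdToParquetPath,
      Map<String, String> columnBloomFilterEnabled) {
    List<String> parquetPaths = new ArrayList<>();
    columnBloomFilterEnabled.forEach(
        (colPath, isEnabled) -> {
          Integer fieldId = fieldIdsByName.get(colPath);
          if (fieldId == null) {
            return; // column is not in the schema: skip this config entry
          }
          String parquetColumnPath = fieldIdToParquetPath.get(fieldId);
          if (parquetColumnPath == null) {
            return; // no Parquet path for this field ID: skip this config entry
          }
          if (Boolean.parseBoolean(isEnabled)) {
            // Use the compatible path, not the original column name.
            parquetPaths.add(parquetColumnPath);
          }
        });
    return parquetPaths;
  }
}
```

Going through the field ID rather than re-sanitizing the configured name keeps the bloom filter path tied to whatever path the writer actually produced, which is the approach the diff snippets in this thread suggest.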