DP-3392: Option to lowercase field names #64

Merged: 4 commits into main from story/dp-3392-lowercase-fields on May 22, 2024

@nicmart nicmart commented May 17, 2024

Add a new boolean option, connect.ems.convert.lowercase.fields, which defaults to false.

When set to true, all field names in the produced Parquet files will be lowercase. This is useful when the JSON input data is not clean and field names are not always given in the same case.

Details

The conversion has been implemented as a feature of our RecursiveConverter, which now accepts an additional functional parameter to transform field names.
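
To illustrate the idea of a recursive conversion that takes a field-name transformation as a parameter, here is a minimal sketch. It is not the connector's code: the Value, Rec, Arr, Str and Num types and the renameFields function are hypothetical stand-ins for Kafka Connect values, and using Locale.ROOT for lowercasing is an assumption.

```scala
import java.util.Locale

// Hypothetical, simplified stand-ins for Connect values (not the connector's types).
sealed trait Value
final case class Str(value: String) extends Value
final case class Num(value: Long) extends Value
final case class Arr(items: List[Value]) extends Value
final case class Rec(fields: List[(String, Value)]) extends Value

object FieldNameSketch {
  // Walks the value recursively and applies `rename` to every field name,
  // including names inside nested records and inside arrays of records.
  def renameFields(value: Value, rename: String => String): Value = value match {
    case Rec(fields) =>
      Rec(fields.map { case (name, v) => rename(name) -> renameFields(v, rename) })
    case Arr(items) =>
      Arr(items.map(renameFields(_, rename)))
    case leaf =>
      leaf // scalar values carry no field names
  }

  // The transformation the new option would plug in (Locale.ROOT is an assumption).
  val lowercase: String => String = _.toLowerCase(Locale.ROOT)
}
```

Passing identity as the transformation leaves the value untouched, which matches the option defaulting to false.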

RecursiveConverter, and with it the lowercase transformation when enabled, runs as a PreConversion: it is applied at the beginning of RecordTransformer, before flattening and before schema evolution.
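
A rough sketch of why the ordering matters, under stated assumptions: records are modelled as nested Map[String, Any] rather than Connect Structs, the "_" separator used when flattening is invented for the example, and lowercaseFields, flatten and transform are hypothetical names. Because lowercasing runs first, two inputs that differ only in field-name case end up with the same flattened field names, so later steps see a single set of fields.

```scala
import java.util.Locale

object PreConversionOrderSketch {
  // Hypothetical simplified record: field name -> value, possibly nested.
  type Record = Map[String, Any]

  // Step 1 (pre-conversion): lowercase every field name, recursing into nested records.
  def lowercaseFields(record: Record): Record =
    record.map { case (name, value) =>
      val newValue = value match {
        case nested: Map[_, _] => lowercaseFields(nested.asInstanceOf[Record])
        case other             => other
      }
      name.toLowerCase(Locale.ROOT) -> newValue
    }

  // Step 2: flatten nested records into one level.
  // The "_" separator is an assumption made for this example only.
  def flatten(record: Record, prefix: String = ""): Record =
    record.flatMap { case (name, value) =>
      val path = if (prefix.isEmpty) name else s"${prefix}_$name"
      value match {
        case nested: Map[_, _] => flatten(nested.asInstanceOf[Record], path)
        case other             => Map(path -> other)
      }
    }

  // Lowercasing is applied before flattening, mirroring the order described above.
  def transform(record: Record): Record = flatten(lowercaseFields(record))
}
```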

@@ -46,10 +47,12 @@ final class RecursiveConversion(innerConversion: ConnectConversion) extends Conn
 connectValue match {
   case connectValue: Struct =>
     val newStruct = new Struct(targetSchema)
-    targetSchema.fields().asScala.foreach { field =>
+    originalSchema.fields().asScala.foreach { field =>

@afiore afiore May 21, 2024

Noticing that we now iterate on the original rather than the target schema. I guess this is necessary as we need to supply both the original and the new field schema to convertValue...
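
One way to picture the point about iterating the original schema, as a simplified sketch rather than the PR's code: Field, convert and lookupByTargetName are hypothetical, and a Map stands in for the original Struct. Starting from each original field gives access to both the original name (to read the value) and the renamed field (to write it), whereas the renamed fields alone no longer match anything in the original value.

```scala
import java.util.Locale

object SchemaIterationSketch {
  // Hypothetical stand-in for a schema field (not Kafka Connect's Field).
  final case class Field(name: String, typeName: String)

  val rename: String => String = _.toLowerCase(Locale.ROOT)

  // Iterating the ORIGINAL fields: each one gives the name to read from
  // and, after renaming, the name to write to.
  def convert(originalFields: List[Field], original: Map[String, Any]): Map[String, Any] =
    originalFields.map { field =>
      val targetField = field.copy(name = rename(field.name))
      targetField.name -> original(field.name)
    }.toMap

  // Iterating the TARGET fields only exposes the new names,
  // which are not keys of the original value once they have been renamed.
  def lookupByTargetName(targetFields: List[Field], original: Map[String, Any]): List[Option[Any]] =
    targetFields.map(f => original.get(f.name))
}
```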

@nicmart nicmart merged commit 81156dc into main May 22, 2024
11 checks passed
@nicmart nicmart deleted the story/dp-3392-lowercase-fields branch May 22, 2024 10:57