-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec: Clarify missing fields when writing #8672
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Jan raised a point on slack of the symantic meaning of a field that can be written: https://apache-iceberg.slack.com/archives/C03LG1D563F/p1695834739711569 There are two options: - The field is not part of the schema, and omitted from the file - The field is part of the schema, but the value is not written (nullable) My personal take on this is that we should use static schema's when writing Avro files, so that all the fields that are either optional or required are in the schema. I'm well aware of that this doesn't impose any issues if you dogfood the Iceberg Avro reader, where you can add required fields, for example the `134: content` field in the manifest. However, I think we should try to stick to the concept of write strict, read permissive where we try to encourage people to write all the fields that are in the spec (even they if the value itself is all null).
nit: Typo in PR description |
JFinis
reviewed
Sep 28, 2023
Co-authored-by: JFinis <[email protected]>
Fokko
changed the title
Spec: Carify missing fields when writing
Spec: Clarify missing fields when writing
Sep 28, 2023
rdblue
reviewed
Oct 5, 2023
rdblue
reviewed
Oct 5, 2023
rdblue
reviewed
Oct 5, 2023
ajantha-bhat
reviewed
Oct 9, 2023
ajantha-bhat
approved these changes
Oct 9, 2023
+1 LGTM |
stevenzwu
reviewed
Apr 22, 2024
stevenzwu
approved these changes
Apr 23, 2024
I'll go ahead and merge this now. EVery now and then a question around this pops up on the devlist/Slack, so it is good to get this clarification out. Thanks everyone for chiming in for the review 👍 |
sasankpagolu
pushed a commit
to sasankpagolu/iceberg
that referenced
this pull request
Oct 27, 2024
* Spec: Carify missing fields when writing Jan raised a point on slack of the symantic meaning of a field that can be written: https://apache-iceberg.slack.com/archives/C03LG1D563F/p1695834739711569 There are two options: - The field is not part of the schema, and omitted from the file - The field is part of the schema, but the value is not written (nullable) My personal take on this is that we should use static schema's when writing Avro files, so that all the fields that are either optional or required are in the schema. I'm well aware of that this doesn't impose any issues if you dogfood the Iceberg Avro reader, where you can add required fields, for example the `134: content` field in the manifest. However, I think we should try to stick to the concept of write strict, read permissive where we try to encourage people to write all the fields that are in the spec (even they if the value itself is all null). * Add manifest-list explicitly Co-authored-by: JFinis <[email protected]> * Update wording * Comments * Retain formatting * Thanks Steven --------- Co-authored-by: JFinis <[email protected]>
zachdisc
pushed a commit
to zachdisc/iceberg
that referenced
this pull request
Dec 23, 2024
* Spec: Carify missing fields when writing Jan raised a point on slack of the symantic meaning of a field that can be written: https://apache-iceberg.slack.com/archives/C03LG1D563F/p1695834739711569 There are two options: - The field is not part of the schema, and omitted from the file - The field is part of the schema, but the value is not written (nullable) My personal take on this is that we should use static schema's when writing Avro files, so that all the fields that are either optional or required are in the schema. I'm well aware of that this doesn't impose any issues if you dogfood the Iceberg Avro reader, where you can add required fields, for example the `134: content` field in the manifest. However, I think we should try to stick to the concept of write strict, read permissive where we try to encourage people to write all the fields that are in the spec (even they if the value itself is all null). * Add manifest-list explicitly Co-authored-by: JFinis <[email protected]> * Update wording * Comments * Retain formatting * Thanks Steven --------- Co-authored-by: JFinis <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Jan raised a point on Slack of the semantic meaning of a field that can be written:
https://apache-iceberg.slack.com/archives/C03LG1D563F/p1695834739711569
There are two options:
My personal take on this is that we should use static schema when writing Avro files so that all the fields that are either optional or required are in the schema.
I'm well aware that this doesn't impose any issues if you dogfood the Iceberg Avro reader, where you can add required fields, for example, the
134: content
field in the manifest.However, I think we should try to stick to the concept of writing strictly, read permissive where we try to encourage people to write all the fields that are in the spec (even if the value itself is all null).