Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec: Clarify missing fields when writing #8672

Merged
merged 6 commits into from
Apr 26, 2024
Merged

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Sep 28, 2023

Jan raised a point on Slack of the semantic meaning of a field that can be written:

https://apache-iceberg.slack.com/archives/C03LG1D563F/p1695834739711569

There are two options:

  • The field is not part of the schema, and is omitted from the file
  • The field is part of the schema, but the value is not written (nullable)

My personal take on this is that we should use static schema when writing Avro files so that all the fields that are either optional or required are in the schema.

I'm well aware that this doesn't impose any issues if you dogfood the Iceberg Avro reader, where you can add required fields, for example, the 134: content field in the manifest.

However, I think we should try to stick to the concept of writing strictly, read permissive where we try to encourage people to write all the fields that are in the spec (even if the value itself is all null).

Jan raised a point on slack of the symantic meaning of a field
that can be written:

https://apache-iceberg.slack.com/archives/C03LG1D563F/p1695834739711569

There are two options:

- The field is not part of the schema, and omitted from the file
- The field is part of the schema, but the value is not written (nullable)

My personal take on this is that we should use static schema's when
writing Avro files, so that all the fields that are either optional or
required are in the schema.

I'm well aware of that this doesn't impose any issues if you dogfood
the Iceberg Avro reader, where you can add required fields, for example
the `134: content` field in the manifest.

However, I think we should try to stick to the concept of write strict,
read permissive where we try to encourage people to write all the fields
that are in the spec (even they if the value itself is all null).
format/spec.md Outdated Show resolved Hide resolved
format/spec.md Outdated Show resolved Hide resolved
format/spec.md Outdated Show resolved Hide resolved
@ajantha-bhat
Copy link
Member

nit: Typo in PR description Carify -> Clarify

format/spec.md Outdated Show resolved Hide resolved
format/spec.md Outdated Show resolved Hide resolved
format/spec.md Show resolved Hide resolved
@Fokko Fokko changed the title Spec: Carify missing fields when writing Spec: Clarify missing fields when writing Sep 28, 2023
@Fokko Fokko added the Specification Issues that may introduce spec changes. label Oct 2, 2023
format/spec.md Outdated Show resolved Hide resolved
format/spec.md Outdated Show resolved Hide resolved
format/spec.md Outdated Show resolved Hide resolved
format/spec.md Outdated Show resolved Hide resolved
@Fokko Fokko requested a review from rdblue April 22, 2024 19:22
@jzhuge
Copy link
Member

jzhuge commented Apr 22, 2024

+1 LGTM

format/spec.md Outdated Show resolved Hide resolved
@Fokko
Copy link
Contributor Author

Fokko commented Apr 26, 2024

I'll go ahead and merge this now. EVery now and then a question around this pops up on the devlist/Slack, so it is good to get this clarification out. Thanks everyone for chiming in for the review 👍

@Fokko Fokko merged commit 5821efc into apache:main Apr 26, 2024
2 checks passed
@Fokko Fokko deleted the fd-clarify-spec branch April 26, 2024 06:50
sasankpagolu pushed a commit to sasankpagolu/iceberg that referenced this pull request Oct 27, 2024
* Spec: Carify missing fields when writing

Jan raised a point on slack of the symantic meaning of a field
that can be written:

https://apache-iceberg.slack.com/archives/C03LG1D563F/p1695834739711569

There are two options:

- The field is not part of the schema, and omitted from the file
- The field is part of the schema, but the value is not written (nullable)

My personal take on this is that we should use static schema's when
writing Avro files, so that all the fields that are either optional or
required are in the schema.

I'm well aware of that this doesn't impose any issues if you dogfood
the Iceberg Avro reader, where you can add required fields, for example
the `134: content` field in the manifest.

However, I think we should try to stick to the concept of write strict,
read permissive where we try to encourage people to write all the fields
that are in the spec (even they if the value itself is all null).

* Add manifest-list explicitly

Co-authored-by: JFinis <[email protected]>

* Update wording

* Comments

* Retain formatting

* Thanks Steven

---------

Co-authored-by: JFinis <[email protected]>
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
* Spec: Carify missing fields when writing

Jan raised a point on slack of the symantic meaning of a field
that can be written:

https://apache-iceberg.slack.com/archives/C03LG1D563F/p1695834739711569

There are two options:

- The field is not part of the schema, and omitted from the file
- The field is part of the schema, but the value is not written (nullable)

My personal take on this is that we should use static schema's when
writing Avro files, so that all the fields that are either optional or
required are in the schema.

I'm well aware of that this doesn't impose any issues if you dogfood
the Iceberg Avro reader, where you can add required fields, for example
the `134: content` field in the manifest.

However, I think we should try to stick to the concept of write strict,
read permissive where we try to encourage people to write all the fields
that are in the spec (even they if the value itself is all null).

* Add manifest-list explicitly

Co-authored-by: JFinis <[email protected]>

* Update wording

* Comments

* Retain formatting

* Thanks Steven

---------

Co-authored-by: JFinis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Specification Issues that may introduce spec changes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants