Spec: Clarify missing fields when writing (apache#8672)
* Spec: Clarify missing fields when writing

Jan raised a point on Slack about the semantic meaning of a field
that can be written:

https://apache-iceberg.slack.com/archives/C03LG1D563F/p1695834739711569

There are two options:

- The field is not part of the schema, and omitted from the file
- The field is part of the schema, but the value is not written (nullable)

My personal take on this is that we should use static schemas when
writing Avro files, so that all the fields that are either optional or
required are in the schema.

I'm well aware that this doesn't pose any issues if you dogfood
the Iceberg Avro reader, where you can add required fields, for example
the `134: content` field in the manifest.

However, I think we should try to stick to the concept of write strict,
read permissive where we try to encourage people to write all the fields
that are in the spec (even if the value itself is all null).

* Add manifest-list explicitly

Co-authored-by: JFinis <[email protected]>

* Update wording

* Comments

* Retain formatting

* Thanks Steven

---------

Co-authored-by: JFinis <[email protected]>
2 people authored and Sasank Pagolu committed Oct 27, 2024
1 parent 59e63cd commit 24dcef4
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions format/spec.md
@@ -127,12 +127,12 @@ Tables do not require rename, except for tables that use atomic rename to implem

#### Writer requirements

-Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata files to a table with the given version.
+Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata files (including manifests files and manifest lists) to a table with the given version.

| Requirement | Write behavior |
|-------------|----------------|
| (blank) | The field should be omitted |
-| _optional_ | The field can be written |
+| _optional_ | The field can be written or omitted |
| _required_ | The field must be written |

Readers should be more permissive because v1 metadata files are allowed in v2 tables so that tables can be upgraded to v2 without rewriting the metadata tree. For manifest list and manifest files, this table shows the expected v2 read behavior:
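
The writer requirements table can be read as a simple check over the set of fields a writer puts into a file. The following is a minimal sketch in Python, not Iceberg's implementation: the `Requirement` enum, the `check_written_fields` helper, and the field names `field_a`, `field_b`, and `field_c` are all hypothetical and exist only to illustrate the three rows of the table.

```python
from enum import Enum
from typing import Dict, List, Set


class Requirement(Enum):
    """Hypothetical encoding of the three rows of the writer requirements table."""
    BLANK = "blank"        # the field should be omitted
    OPTIONAL = "optional"  # the field can be written or omitted
    REQUIRED = "required"  # the field must be written


def check_written_fields(
    requirements: Dict[str, Requirement], written: Set[str]
) -> List[str]:
    """Return a list of problems for the fields a writer chose to write.

    `requirements` maps a field name to its requirement for the table version;
    `written` is the set of field names present in the written file.
    """
    problems = []
    for name, requirement in requirements.items():
        present = name in written
        if requirement is Requirement.REQUIRED and not present:
            problems.append(f"{name}: required field is missing")
        elif requirement is Requirement.BLANK and present:
            problems.append(f"{name}: field should be omitted for this version")
        # OPTIONAL fields are acceptable whether they are written or omitted.
    return problems


# Hypothetical field names, used only for illustration.
example_requirements = {
    "field_a": Requirement.REQUIRED,
    "field_b": Requirement.OPTIONAL,
    "field_c": Requirement.BLANK,
}
```

Under these assumptions, writing only `field_a` passes the check, and so does writing both `field_a` and `field_b`, which matches the "write strict" approach the commit message encourages for optional fields.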