From e3bc87c2e6c6942beebee000fb69c54a7c1ee3b6 Mon Sep 17 00:00:00 2001
From: Fokko Driesprong
Date: Thu, 28 Sep 2023 10:11:30 +0200
Subject: [PATCH 1/6] Spec: Clarify missing fields when writing

Jan raised a point on Slack about the semantic meaning of a field that can be written:

https://apache-iceberg.slack.com/archives/C03LG1D563F/p1695834739711569

There are two options:

- The field is not part of the schema, and is omitted from the file
- The field is part of the schema, but the value is not written (nullable)

My personal take on this is that we should use static schemas when writing Avro files, so that all fields, whether optional or required, are in the schema. I'm well aware that this doesn't cause any issues if you dogfood the Iceberg Avro reader, where you can add required fields, for example the `134: content` field in the manifest. However, I think we should stick to the principle of "write strict, read permissive", where we encourage people to write all the fields that are in the spec (even if the values themselves are all null).
---
 format/spec.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/format/spec.md b/format/spec.md
index 01903393f88f..4d2d547efbd3 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -128,13 +128,13 @@ Tables do not require rename, except for tables that use atomic rename to implem
 #### Writer requirements
-Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata files to a table with the given version.
+Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata/manifest files to a table with the given version.
-| Requirement | Write behavior |
-|-------------|----------------|
-| (blank) | The field should be omitted |
-| _optional_ | The field can be written |
-| _required_ | The field must be written |
+| Requirement | Write behavior |
+|-------------|-------------------------------------------------------|
+| (blank) | The field should not be present in the schema |
+| _optional_ | The field should be in the schema, and can be written |
+| _required_ | The field should in the schema and must be written |
 Readers should be more permissive because v1 metadata files are allowed in v2 tables so that tables can be upgraded to v2 without rewriting the metadata tree. For manifest list and manifest files, this table shows the expected v2 read behavior:

From 33271e783072a47fb956ef682dabc2ff9ef757c3 Mon Sep 17 00:00:00 2001
From: Fokko Driesprong
Date: Thu, 28 Sep 2023 13:00:29 +0200
Subject: [PATCH 2/6] Add manifest-list explicitly

Co-authored-by: JFinis
---
 format/spec.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/format/spec.md b/format/spec.md
index 4d2d547efbd3..544255dadbac 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -128,7 +128,7 @@ Tables do not require rename, except for tables that use atomic rename to implem
 #### Writer requirements
-Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata/manifest files to a table with the given version.
+Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata, manifest files, or manifest lists to a table with the given version.
 | Requirement | Write behavior |
 |-------------|-------------------------------------------------------|

From d13491bc4c347f452dd61dcca16db0b8db72238e Mon Sep 17 00:00:00 2001
From: Fokko Driesprong
Date: Thu, 28 Sep 2023 13:28:45 +0200
Subject: [PATCH 3/6] Update wording

---
 format/spec.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/format/spec.md b/format/spec.md
index 544255dadbac..1da027e731da 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -130,11 +130,11 @@ Tables do not require rename, except for tables that use atomic rename to implem
 Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata, manifest files, or manifest lists to a table with the given version.
-| Requirement | Write behavior |
-|-------------|-------------------------------------------------------|
-| (blank) | The field should not be present in the schema |
-| _optional_ | The field should be in the schema, and can be written |
-| _required_ | The field should in the schema and must be written |
+| Requirement | Write behavior |
+|-------------|-----------------------------------------------------|
+| (blank) | The field is not present in the schema |
+| _optional_ | The field is part of the schema, and can be written |
+| _required_ | The field is part of schema and must be written |
 Readers should be more permissive because v1 metadata files are allowed in v2 tables so that tables can be upgraded to v2 without rewriting the metadata tree.
For manifest list and manifest files, this table shows the expected v2 read behavior:

From 00d1f73feec37baea1efc6d88d48f0afb465ea88 Mon Sep 17 00:00:00 2001
From: Fokko Driesprong
Date: Sun, 8 Oct 2023 23:28:18 +0200
Subject: [PATCH 4/6] Comments

---
 format/spec.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/format/spec.md b/format/spec.md
index 1da027e731da..90e5348ae659 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -128,13 +128,13 @@ Tables do not require rename, except for tables that use atomic rename to implem
 #### Writer requirements
-Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata, manifest files, or manifest lists to a table with the given version.
+Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata (including manifests files and manifest lists) files to a table with the given version.
-| Requirement | Write behavior |
-|-------------|-----------------------------------------------------|
-| (blank) | The field is not present in the schema |
-| _optional_ | The field is part of the schema, and can be written |
-| _required_ | The field is part of schema and must be written |
+| Requirement | Write behavior |
+|-------------|-------------------------------------|
+| (blank) | The field should be omitted |
+| _optional_ | The field can be written or omitted |
+| _required_ | The field must be written |
 Readers should be more permissive because v1 metadata files are allowed in v2 tables so that tables can be upgraded to v2 without rewriting the metadata tree.
For manifest list and manifest files, this table shows the expected v2 read behavior:

From 340447b6815bb96d36e85115a3eeab8e2fa69b8d Mon Sep 17 00:00:00 2001
From: Fokko Driesprong
Date: Fri, 27 Oct 2023 23:02:27 +0200
Subject: [PATCH 5/6] Retain formatting

---
 format/spec.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/format/spec.md b/format/spec.md
index 90e5348ae659..d19a6516b6d3 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -130,11 +130,11 @@ Tables do not require rename, except for tables that use atomic rename to implem
 Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata (including manifests files and manifest lists) files to a table with the given version.
-| Requirement | Write behavior |
-|-------------|-------------------------------------|
-| (blank) | The field should be omitted |
+| Requirement | Write behavior |
+|-------------|----------------|
+| (blank) | The field should be omitted |
 | _optional_ | The field can be written or omitted |
-| _required_ | The field must be written |
+| _required_ | The field must be written |
 Readers should be more permissive because v1 metadata files are allowed in v2 tables so that tables can be upgraded to v2 without rewriting the metadata tree.
For manifest list and manifest files, this table shows the expected v2 read behavior:

From 373f8b37c7fde7891ebcce8e1a31555d532df4cd Mon Sep 17 00:00:00 2001
From: Fokko Driesprong
Date: Tue, 23 Apr 2024 11:56:05 +0200
Subject: [PATCH 6/6] Thanks Steven

---
 format/spec.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/format/spec.md b/format/spec.md
index d19a6516b6d3..8bdd01460adf 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -128,7 +128,7 @@ Tables do not require rename, except for tables that use atomic rename to implem
 #### Writer requirements
-Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata (including manifests files and manifest lists) files to a table with the given version.
+Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata files (including manifests files and manifest lists) to a table with the given version.
 | Requirement | Write behavior |
 |-------------|----------------|