Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec: Minor modifications for v3 #10948

Merged
merged 5 commits into from
Aug 22, 2024
Merged

Conversation

rdblue
Copy link
Contributor

@rdblue rdblue commented Aug 15, 2024

This adds some minor changes to prepare for v3 spec work:

  • Adds a section to summarize the high-level v3 changes
  • Reduces some heading levels to leave room for better organization. The previous levels were set to reduce the size of the TOC in the Iceberg site 2 frameworks ago
  • Clarifies the v3 requirement for handling partition specs with unknown transforms. Writers cannot write and readers must ignore the fields

@rdblue rdblue requested review from Fokko and danielcweeks August 15, 2024 23:42
@github-actions github-actions bot added the Specification Issues that may introduce spec changes. label Aug 15, 2024
@rdblue rdblue requested a review from RussellSpitzer August 15, 2024 23:42
format/spec.md Outdated Show resolved Hide resolved
format/spec.md Outdated Show resolved Hide resolved
* Adding new supported type promotion cases, integer and long to string
* Adding default value support to struct fields
* Adding multi-argument transforms for partitioning and sorting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and specifies forwards compatibility for unknown transforms?

Copy link
Contributor

@Fokko Fokko Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a reader doesn't recognize the transform, it will ignore it during query planning. For writing, an exception will be raised. Since this is a high-level summary, should we include these details?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is worth including in the summary, but we could add that. To me, this is supposed to be high level and changes like that are a bit too in depth.


#### Terms
### Terms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so style here is to skip "##" to avoid a table of contents entry? I wonder if it is worth a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to avoid this being at the same level as things like Schemas and data types further down. These aren't major sections of the spec. That's why I thought h3 rather than h2 was the right choice. I think we'll probably need to adjust these when we see what the site generates for the TOC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I don't have a strong opinion, but we might want to leave a comment if possible so future readers don't try to correct what they might see as a mistake?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOC is not generated now for spec and its sub section. It only covers first H1 and its sub topics.
image

Without the table of contents for spec and its sub topics it is hard to navigate.
I have added a detailed description and fixed the same in #11067

@@ -193,16 +204,14 @@ Supported primitive types are defined in the table below. Primitive types added

Notes:

1. Decimal scale is fixed and cannot be changed by schema evolution. Precision can only be widened.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this being removed because we intend to change it in V3? Maybe this should be included in the summary?

Copy link
Contributor Author

@rdblue rdblue Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cleanup to make things more consistent. For decimal type promotion specifically, I removed this so that the section on schema evolution and type promotion is the source of truth. There shouldn't be two places to update so when I saw this was duplicative, I thought I should address it.

Similarly for timestamps, the type definition has the precision and there is no need to add it to the notes. I also clarified time for consistency.

format/spec.md Outdated Show resolved Hide resolved
Copy link
Contributor

@emkornfield emkornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor comments, and questions about process.

@@ -44,6 +44,15 @@ The primary change in version 2 adds delete files to encode rows that are delete

In addition to row-level deletes, version 2 makes some requirements stricter for writers. The full set of changes are listed in [Appendix E](#version-2).

#### Version 3: Extended Types and Capabilities
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is quite a bit of overlap with https://github.com/apache/iceberg/blob/main/format/spec.md#version-3. Looking at the questions that @emkornfield raised, it feels that this summary is too high level, maybe we should refer to Appendix E for details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main worry is that we'll have a summary with no actual information. I think this is about the right amount since it is a high-level addition. The appendix is intended to be much more detailed.

format/spec.md Outdated Show resolved Hide resolved
Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all good with these changes but I think it may be best to avoid having the summary list things which have not actually been merged into the spec at this moment. Ie for types just list Int96 for now and skip variant/unknown.

@rdblue rdblue force-pushed the spec-v3-minor-changes branch from 72fd08e to 7576939 Compare August 16, 2024 23:16
@rdblue
Copy link
Contributor Author

rdblue commented Aug 19, 2024

Thanks for reviewing, everyone. I've opened a thread on the dev list to merge this.

Copy link
Contributor

@emkornfield emkornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to add a comment on the jump in header levels

@rdblue rdblue merged commit ac0d206 into apache:main Aug 22, 2024
2 checks passed
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Specification Issues that may introduce spec changes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants