Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Don't reset snapshotLog in TableMetadata.removeRef method #11779

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ebyhr
Copy link
Contributor

@ebyhr ebyhr commented Dec 13, 2024

Fixes #11777

@github-actions github-actions bot added the core label Dec 13, 2024
@ebyhr ebyhr force-pushed the ebi/rest-replace-table-snapshot-log branch 3 times, most recently from 81647f9 to 131b64e Compare December 13, 2024 11:14
@ebyhr ebyhr force-pushed the ebi/rest-replace-table-snapshot-log branch 2 times, most recently from 57a1ec2 to d84eb2c Compare December 13, 2024 22:52
@Fokko Fokko added this to the Iceberg 1.7.2 milestone Dec 19, 2024
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr I think we may want to take a different approach.

Currently, we clear the snapshot log when TableMetadata#removeRef is called here . If you see, the non-REST case goes through resetMain which won't clear the snapshot log but for the REST changes that are produced, we produce a RemoveSnapshotRef update type which when applied server side clears the log.

Quite simply, I think we probably just want to remove the snapshotLog.clear() call in the removeRef API. In the past, I think there was concern that a user using that API may somehow rely on that behavior but really the only action on an Iceberg table that should remove history is snapshot expiration. removeRef(MAIN) should not really change history in anyway, this was probably just an oversight in the implementation done a long time ago.

I also discussed this issue with a few others @Fokko @nastra @rdblue @danielcweeks and I think we're on the same page there.

What are your thoughts? If you agree, would be happy to review the PR with the change to stop clearing the snapshot log, and some tests to verify that history is preserved during a REPLACE, and we can get a fix for this issue in!

@ebyhr ebyhr force-pushed the ebi/rest-replace-table-snapshot-log branch from d84eb2c to 9c2ca14 Compare December 23, 2024 02:28
@ebyhr ebyhr force-pushed the ebi/rest-replace-table-snapshot-log branch from 9c2ca14 to 4e55282 Compare December 23, 2024 02:28
@ebyhr ebyhr changed the title REST: Don't reset snapshotLog when replacing table Core: Don't reset snapshotLog in TableMetadata.removeRef method Dec 23, 2024
@ebyhr
Copy link
Contributor Author

ebyhr commented Dec 23, 2024

@amogh-jahagirdar There is no reason to disagree from my perspective. I was simply trying to keep the existing behavior as much as possible because I didn't know the historical reason. Reverted most changes and removed snapshotLog.clear() call with some tests.

@@ -2418,6 +2421,34 @@ public void testPaginationForListTables(int numberOfItems) {
eq(ListTablesResponse.class));
}

@Test
public void testReplaceTableKeepsSnapshotLog() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we put this test in CatalogTests? That'll cover both REST + non-REST cases (TestRESTCatalog extends CatalogTests).

table.refresh();
assertThat(((BaseTable) table).operations().current().snapshotLog())
.hasSize(2)
.containsAll(snapshotLogBeforeReplace);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we assert the exact snapshot log contents instead of just doing a containsAll check? We should have both the log before the replace and the new snapshot produced as part of the replace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REST catalog doesn't return old history if we execute CREATE OR REPLACE TABLE statement
4 participants