Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DeleteManifest] Making file validation optional #10535

Open
haizhou-zhao opened this issue Jun 18, 2024 · 10 comments
Open

[DeleteManifest] Making file validation optional #10535

haizhou-zhao opened this issue Jun 18, 2024 · 10 comments
Labels
improvement PR that improves existing functionality stale

Comments

@haizhou-zhao
Copy link
Contributor

Feature Request / Improvement

Background

This issue comes up when we are trying to fix a corrupted table where one manifest file (with active reference to data files) was (unintentionally) removed from the file system. At that point, our only way out was to remove the reference to that manifest file so that queries on the table does not fail with NotFoundException. This are the APIs we executed to achieve that:

table.rewriteManifests().deleteManifest(manifest).commit()

However, we found that Manifest rewrite implementation will force consistency on file counts, which means the above line of code will fail due to the following validation:

Feature Request

Although keeping file counts consistent with manifest rewrite serves most cases, in corner cases like fixing a corrupted table with manifest file deleted, user does intentionally want to lose all the data files referred by that missing manifest file (so that query against the table could start to work, and they can backfill the lost data later). We'd like to raise a feature request to make validateFilesCounts check optional on manifest rewrite, so that users have the choice to disable it by some configuration.

Query engine

None

@haizhou-zhao haizhou-zhao added the improvement PR that improves existing functionality label Jun 18, 2024
@haizhou-zhao
Copy link
Contributor Author

cc: @dramaticlly

@dramaticlly
Copy link
Contributor

@aokolnychyi @szehon-ho I am wondering what do you think.

Basically the problem we are trying to resolve is remove existing manifests from table and I hope to achieve this with extra configuration like forced() in table.rewriteManifests().deleteManifest(manifest).forced().commit() so that we can skip the validation on active file count.

The reason to remove existing manifest can vary, but for our use case we are trying to recover from failure in #8599 , where REST client prematurely removed manifests on network timeout but the commit actual did go through on catalog side. So table ends up in a corrupted state where manifest file is deleted on file system but tracked in iceberg metadata.

@szehon-ho
Copy link
Collaborator

I guess they initially made rewriteManifest as a safe API.

I wonder, does DeleteFiles API achieve this? The next commit will remove that manifest if all its contents are removed.

If we really need a specific API, it sounds like we need deleteManifest method there.

@szehon-ho
Copy link
Collaborator

btw, did you look at that case? I didnt follow but from #8599 supoosedly it only happens when its instance of CleanableException

@dramaticlly
Copy link
Contributor

thank you Szehon, I think DeleteFiles API only work with data files, delete files but not for manifest files. I tried to leverage delete by file path in DeleteFiles ->MergingSnapshotProducer but it does not work.

I agreed that deleteManifest sounds like a right way to do this instead of stub in rewriteManifests. I am aware of the fix 8599 above, it's just that some damage has occurred this time and we are looking forward to a way to fix corrupted table. There can be potentially other reasons for dropping a manifest from iceberg metadata, which is not relate to 8599.

@szehon-ho
Copy link
Collaborator

szehon-ho commented Jun 18, 2024

Sounds good. Im thinking adding deleteManifest method there is better, then.

I tried to leverage delete by file path in DeleteFiles ->MergingSnapshotProducer but it does not work.

Is it an issue we should fix, or expected behavior? Be good to fix that, if the former.

@szehon-ho
Copy link
Collaborator

szehon-ho commented Jun 18, 2024

Well, that was my thought as its parallel with appendManifest. But now I see #10396 try to deprecate that, so like to get @Fokko thought here. I think appendManifest/ deleteManifest are helpful, not sure the rationale to deprecate

@Fokko
Copy link
Contributor

Fokko commented Jun 25, 2024

@szehon-ho sorry for the late reply here. I closed my PR, and I've left a comment with an explanation: #10396 (comment) The main problem is keeping the snapshot summary up to date.

@szehon-ho
Copy link
Collaborator

Thanks @Fokko !

Copy link

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

@github-actions github-actions bot added the stale label Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement PR that improves existing functionality stale
Projects
None yet
Development

No branches or pull requests

4 participants