Core, Spark: Avoid manifest copies when importing data to V2 tables #8962

Merged 2 commits on Dec 13, 2023

aokolnychyi
Contributor

This PR extends the idea from #8928 to FastAppend and MergeAppend, which are used in data imports and migration.

@aokolnychyi aokolnychyi force-pushed the avoid-manifest-copies-append branch from a9ca63f to 5780b82 Compare November 1, 2023 18:54
@github-actions github-actions bot added the API label Nov 1, 2023
* <p>By default, the manifest will be rewritten to assign all entries this update's snapshot ID.
* In that case, it is always the responsibility of the caller to manage the lifecycle of the
* original manifest.
* <p>The manifest will be used directly if snapshot ID inheritance is enabled (all tables with
Contributor Author

Simply clarifies the notion of snapshot ID inheritance and that it is always on in V2 tables.
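To make the rule in the Javadoc above concrete, here is a minimal sketch of the decision it describes: use the manifest directly when snapshot ID inheritance is on, otherwise rewrite a copy. This is not the code from this PR and not Iceberg's API. The class `ManifestImportSketch`, the enum `Action`, and both method names are hypothetical, and treating format versions above 2 the same as V2 is an assumption.

```java
// Hypothetical model of the behavior described in the Javadoc; not Iceberg code.
final class ManifestImportSketch {

  // What an append would do with a manifest handed to it by the caller.
  public enum Action { USE_DIRECTLY, REWRITE_COPY }

  // Snapshot ID inheritance is always on for V2 tables (per the comment above).
  // Assumption: later format versions behave like V2, and V1 tables only
  // inherit when an explicit opt-in flag is set.
  public static boolean inheritsSnapshotId(int formatVersion, boolean inheritanceFlag) {
    return formatVersion >= 2 || inheritanceFlag;
  }

  // With inheritance, entries pick up the snapshot ID at read time, so the
  // manifest can be used as-is. Without it, the manifest is rewritten so that
  // every entry carries this update's snapshot ID, and the caller still owns
  // the original file.
  public static Action planAppend(int formatVersion, boolean inheritanceFlag) {
    return inheritsSnapshotId(formatVersion, inheritanceFlag)
        ? Action.USE_DIRECTLY
        : Action.REWRITE_COPY;
  }

  public static void main(String[] args) {
    System.out.println("V1, flag off: " + planAppend(1, false));
    System.out.println("V1, flag on:  " + planAppend(1, true));
    System.out.println("V2, flag off: " + planAppend(2, false));
  }
}
```

Under these assumptions, only a V1 table without the opt-in flag pays for the copy, which is why imports into V2 tables can skip it.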

@aokolnychyi aokolnychyi force-pushed the avoid-manifest-copies-append branch from 5780b82 to 6d9d78b Compare November 1, 2023 19:01
@github-actions github-actions bot added the docs label Nov 1, 2023
@aokolnychyi aokolnychyi force-pushed the avoid-manifest-copies-append branch 2 times, most recently from fea275d to 9b2b93a Compare November 1, 2023 19:29
Collaborator

@szehon-ho szehon-ho left a comment

Looks good to me, with one minor comment.

* the commit fails, the manifest will never be deleted and it is up to the caller whether to
* delete or reuse it.
* <p>If the manifest is rewritten, it is always the responsibility of the caller to manage the
* lifecycle of the original manifest. If manifest entries are allowed to inherit the snapshot ID
Collaborator

What do you think about simplifying 'if manifest entries are allowed to inherit the snapshot ID...' to 'if the manifest is used directly'?

Contributor Author

Yep, I liked it. Fixed.

@aokolnychyi aokolnychyi force-pushed the avoid-manifest-copies-append branch from 2b5eed9 to 7f6a798 Compare December 12, 2023 22:25
@aokolnychyi aokolnychyi merged commit d631e2c into apache:main Dec 13, 2023
46 checks passed
@aokolnychyi
Contributor Author

Thanks, @szehon-ho!

2 participants