-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark: Ensure that partition stats files are considered for GC procedures #9284
Conversation
*/ | ||
@Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't use the Predicate<StatisticsFile>
for partition stats. Hence, deprecated and introduced snapshot id based input like other methods in this class (manifest list, manifest etc)
@@ -475,18 +474,9 @@ public void testExpireSnapshotsWithStatisticFiles() throws Exception { | |||
Assertions.assertThat(output.get(0)[5]).as("should be 1 deleted statistics file").isEqualTo(1L); | |||
|
|||
table.refresh(); | |||
List<StatisticsFile> statsWithSnapshotId1 = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are checking containsExactly
at line 482, no need of these checks.
bf5a896
to
d3e75e0
Compare
@@ -148,12 +148,49 @@ public static List<String> statisticsFilesLocations(Table table) { | |||
* @param table table for which statistics files needs to be listed | |||
* @param predicate predicate for filtering the statistics files | |||
* @return the location of statistics files | |||
* @deprecated use the {@code statisticsFilesLocationsForSnapshots(table, snapshotIds)} instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we add a note on when it was deprecated (1.5.0) and when it will be removed (1.6.0)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added this already in previous commit.
* statistics files for all the snapshots will be returned. | ||
* @return the location of statistics files | ||
*/ | ||
public static List<String> statisticsFilesLocationsForSnapshots( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about calling it statisticsFilesLocations
to match manifestListLocations
that also accepts a set of snapshot IDs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deprecated method already has a name statisticsFilesLocations
. So, If I keep the same name, compiler says ambiguous error when I call statisticsFilesLocationsForSnapshots(table, null)
Hence, I had to keep different name.
@ajantha-bhat, this looks mostly good to me. Can we add the tests to all Spark versions to be safe? |
Sure. I didn't initially add to other spark version thinking if I get any comments for test, I have to rework in all the spark version. So, I thought of adding tests in a single follow up PR for other spark versions. Now that we don't have any specific comment on test, I can add it in this PR itself. |
@aokolnychyi: Thanks for the review. Build passed and PR is ready. |
ping @aokolnychyi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me, I had one optional comment. Let me know what you want to do there, @ajantha-bhat.
@aokolnychyi : Are you referring to #9284 (comment) comment? I have replied why I can't change the name due to compile time ambiguity of same signature. |
Oops, I probably forgot to post the comment. I was thinking about |
The new method |
Yep, that's fair. Let's keep it as is then. |
Thanks, @ajantha-bhat! |
Expire snapshots and remove orphan files action/procedure should also consider partition stats files.
Fixes #9336