Add Spark UI metrics from Iceberg scan metrics #8717

Merged 4 commits on Nov 3, 2023

karuppayya
Contributor

@github-actions github-actions bot added the spark label Oct 4, 2023
@aokolnychyi
Contributor

Let me take a look.

@aokolnychyi aokolnychyi left a comment

This looks almost good. I had some minor comments, and it seems we are missing the implementation and tests for a few total counters. @karuppayya, could you check whether I got everything right?


@Override
public String description() {
return "total delete file size";

Include (bytes) at the end like we do in TotalFileSize?
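
For illustration, here is a minimal sketch of a sum-style metric whose description carries the unit as a `(bytes)` suffix. The `CustomMetric` interface below is a hypothetical stand-in that mirrors the shape of Spark's `org.apache.spark.sql.connector.metric.CustomMetric` so the snippet compiles on its own; the class name and the `NAME` string are assumptions, not the code from this PR.

```java
// Hypothetical stand-in mirroring the shape of Spark's CustomMetric interface,
// declared here only so the sketch compiles without Spark on the classpath.
interface CustomMetric {
  String name();

  String description();

  String aggregateTaskMetrics(long[] taskMetrics);
}

// Illustrative metric class; the name string is an assumption.
class TotalDeleteFileSize implements CustomMetric {
  static final String NAME = "totalDeleteFileSize";

  @Override
  public String name() {
    return NAME;
  }

  @Override
  public String description() {
    // The unit goes in a "(bytes)" suffix, as the review comment requests.
    return "total delete file size (bytes)";
  }

  @Override
  public String aggregateTaskMetrics(long[] taskMetrics) {
    // Sum the per-task values and render the total as a string.
    long sum = 0L;
    for (long value : taskMetrics) {
      sum += value;
    }
    return String.valueOf(sum);
  }
}
```

Keeping the unit in the same position for every size metric makes the labels line up when they are shown side by side in the Spark UI.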

@aokolnychyi aokolnychyi left a comment

This looks almost ready to go. I had a few minor points that would be nice to fix.

@@ -200,12 +220,31 @@ public CustomTaskMetric[] reportDriverMetrics() {
}

List<CustomTaskMetric> driverMetrics = Lists.newArrayList();

Minor: Can we add an empty line after this one and before // common?

@@ -215,12 +254,32 @@ public CustomMetric[] supportedCustomMetrics() {
return new CustomMetric[] {
new NumSplits(),

Minor: Can we add // task metrics before this line?
Both NumSplits and NumDeletes are populated at the task level.
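
A rough sketch of the requested layout follows. The `CustomMetric` interface is a hypothetical, reduced stand-in for Spark's interface, the metric names and descriptions are assumptions, and the `// driver metrics` label is only an illustrative counterpart to the `// task metrics` comment being asked for.

```java
// Hypothetical stand-in for Spark's CustomMetric, reduced to the two methods used here.
interface CustomMetric {
  String name();

  String description();
}

// Populated per task (names and descriptions are assumptions).
final class NumSplits implements CustomMetric {
  public String name() { return "numSplits"; }
  public String description() { return "number of file splits read"; }
}

final class NumDeletes implements CustomMetric {
  public String name() { return "numDeletes"; }
  public String description() { return "number of row deletes applied"; }
}

// Populated on the driver from the scan report.
final class TotalFileSize implements CustomMetric {
  public String name() { return "totalFileSize"; }
  public String description() { return "total file size (bytes)"; }
}

final class ScanMetricsSketch {
  static CustomMetric[] supportedCustomMetrics() {
    return new CustomMetric[] {
      // task metrics
      new NumSplits(),
      new NumDeletes(),
      // driver metrics (illustrative label)
      new TotalFileSize()
    };
  }
}
```

Grouping the array by where each metric is populated lets a reader see at a glance which entries come from executors and which come from the driver.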

@@ -200,12 +220,31 @@ public CustomTaskMetric[] reportDriverMetrics() {
}

List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
// common
driverMetrics.add(TaskTotalFileSize.from(scanReport));

Could you confirm TaskTotalFileSize represents the total size of read data?

If so, shall we call these metrics TaskTotalDataFileSize and TotalDataFileSize? I know we follow the API from core, but it seems a bit confusing: I had to look up the code to understand what this metric means. If we decide to rename, let's move it to the data files block below.
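
To make the proposed rename concrete, here is a minimal sketch of what a `TaskTotalDataFileSize` with a `from(...)` factory could look like. `CustomTaskMetric` is a hypothetical stand-in mirroring the shape of Spark's interface (`name()` and `value()`), and `ScanReportStub` with its `totalDataFileSizeInBytes()` accessor is an invented simplification of the scan report in Iceberg core, not its real API.

```java
// Hypothetical stand-in for Spark's CustomTaskMetric (a name plus a long value).
interface CustomTaskMetric {
  String name();

  long value();
}

// Hypothetical, simplified scan report; the real report type in Iceberg core is richer.
final class ScanReportStub {
  private final Long totalDataFileSizeInBytes; // null when the counter was not recorded

  ScanReportStub(Long totalDataFileSizeInBytes) {
    this.totalDataFileSizeInBytes = totalDataFileSizeInBytes;
  }

  Long totalDataFileSizeInBytes() {
    return totalDataFileSizeInBytes;
  }
}

// The name says "data file", so it cannot be confused with delete file sizes.
final class TaskTotalDataFileSize implements CustomTaskMetric {
  static final String NAME = "totalDataFileSize"; // assumed metric name

  private final long value;

  private TaskTotalDataFileSize(long value) {
    this.value = value;
  }

  // Build the driver-side metric from the report, defaulting to 0 if the counter is absent.
  static TaskTotalDataFileSize from(ScanReportStub report) {
    Long size = report.totalDataFileSizeInBytes();
    return new TaskTotalDataFileSize(size != null ? size : 0L);
  }

  @Override
  public String name() {
    return NAME;
  }

  @Override
  public long value() {
    return value;
  }
}
```

With the data-file qualifier in the class name, the call site `driverMetrics.add(TaskTotalDataFileSize.from(scanReport))` would read unambiguously next to the delete file metrics.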


@Override
public String description() {
return "total delete file size in bytes";

Let's follow what we have in TotalFileSize, where we use ... (bytes) instead of ... in bytes.

@karuppayya
Contributor Author

Thanks @aokolnychyi for the review. I have addressed the latest comments, and this is ready for another round.

@aokolnychyi aokolnychyi merged commit a445925 into apache:main Nov 3, 2023
35 checks passed
@aokolnychyi
Contributor

Thanks, @karuppayya!

aokolnychyi pushed a commit that referenced this pull request Nov 9, 2023
This change cherry-picks PR #8717 to Spark 3.4.
2 participants