Add Spark UI metrics from Iceberg scan metrics #8717
Let me take a look.
This looks almost good. I had some minor comments, and it seems we are missing the implementation and tests for a few total counters. @karuppayya, could you check whether I got everything right?
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ResultDeleteFiles.java
Outdated
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TotalDataManifests.java
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TotalDeleteFileSize.java
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TotalDeleteFileSize.java
Outdated
  @Override
  public String description() {
    return "total delete file size";
Include `(bytes)` at the end like we do in `TotalFileSize`?
068b9ae to 0a4c2c1
This looks almost ready to go. I had a few minor points that would be nice to fix.
@@ -200,12 +220,31 @@ public CustomTaskMetric[] reportDriverMetrics() {
    }

    List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
Minor: Can we add an empty line after this one and before `// common`?
@@ -215,12 +254,32 @@ public CustomMetric[] supportedCustomMetrics() {
    return new CustomMetric[] {
      new NumSplits(),
Minor: Can we add `// task metrics` before this line?
Both `NumSplits` and `NumDeletes` are populated at the task level.
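To illustrate the grouping being asked for, here is a minimal sketch of how the array might read with section comments. It is not the PR's actual code: the `CustomMetric` interface below is a local stand-in for Spark's `org.apache.spark.sql.connector.metric.CustomMetric`, the name strings are assumed, and which driver-level metrics follow the task metrics is a hypothetical selection.

```java
// Entry point is declared first so the file can also be run directly with `java`.
class SupportedMetricsSketch {
  // Mirrors the shape of supportedCustomMetrics(); the grouping comments are the point.
  static CustomMetric[] supportedCustomMetrics() {
    return new CustomMetric[] {
      // task metrics
      new NumSplits(),
      new NumDeletes(),
      // driver metrics (hypothetical selection for illustration)
      new TotalFileSize(),
      new TotalDeleteFileSize()
    };
  }

  public static void main(String[] args) {
    for (CustomMetric metric : supportedCustomMetrics()) {
      System.out.println(metric.name());
    }
  }
}

// Local stand-in for Spark's CustomMetric, reduced to the one method the sketch needs.
interface CustomMetric {
  String name();
}

// The name strings below are assumptions, not taken from the PR.
class NumSplits implements CustomMetric {
  @Override
  public String name() { return "numSplits"; }
}

class NumDeletes implements CustomMetric {
  @Override
  public String name() { return "numDeletes"; }
}

class TotalFileSize implements CustomMetric {
  @Override
  public String name() { return "totalFileSize"; }
}

class TotalDeleteFileSize implements CustomMetric {
  @Override
  public String name() { return "totalDeleteFileSize"; }
}
```

Keeping the task-level entries together under one comment makes it easier to see at a glance which values come from executors and which are reported once by the driver.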
@@ -200,12 +220,31 @@ public CustomTaskMetric[] reportDriverMetrics() {
    }

    List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
    // common
    driverMetrics.add(TaskTotalFileSize.from(scanReport));
Could you confirm `TaskTotalFileSize` represents the total size of read data?
If so, shall we call these metrics `TaskTotalDataFileSize` and `TotalDataFileSize`? I know we follow the API from core, but it seems a bit confusing: I had to look up the code to understand what this metric means. If we decide to rename, let's move it to the data files block below.
  @Override
  public String description() {
    return "total delete file size in bytes";
Let's follow what we have in `TotalFileSize`, where we use `... (bytes)` instead of `... in bytes`.
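As a rough sketch of the requested convention, the metric could look like the following. This is not the PR's implementation: `CustomMetric` here is a local stand-in that copies the three methods of Spark's `org.apache.spark.sql.connector.metric.CustomMetric` so the example compiles without Spark, the `NAME` value is assumed, and the summing aggregation is written out by hand for illustration.

```java
// Entry point is declared first so the file can also be run directly with `java`.
class MetricDescriptionSketch {
  public static void main(String[] args) {
    CustomMetric metric = new TotalDeleteFileSize();
    // Print the label next to an aggregated value, roughly as a UI would show it.
    System.out.println(metric.description() + ": "
        + metric.aggregateTaskMetrics(new long[] {1024L, 2048L}));
  }
}

// Local stand-in for Spark's CustomMetric, declared here so no Spark dependency is needed.
interface CustomMetric {
  String name();

  String description();

  String aggregateTaskMetrics(long[] taskMetrics);
}

// Hypothetical shape of the metric under review; NAME is an assumed value.
class TotalDeleteFileSize implements CustomMetric {
  static final String NAME = "totalDeleteFileSize";

  @Override
  public String name() {
    return NAME;
  }

  @Override
  public String description() {
    // Unit goes in parentheses at the end, matching the TotalFileSize wording.
    return "total delete file size (bytes)";
  }

  @Override
  public String aggregateTaskMetrics(long[] taskMetrics) {
    // Sum the per-task values and render the total as text.
    long sum = 0L;
    for (long value : taskMetrics) {
      sum += value;
    }
    return String.valueOf(sum);
  }
}
```

Using the same `(bytes)` suffix across all size metrics keeps the labels visually consistent when they appear side by side.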
Thanks @aokolnychyi for the review. I have addressed the latest comments; this is ready for another round.
Thanks, @karuppayya!
This change cherry-picks PR #8717 to Spark 3.4.
This is a followup to #7447 (comment)
cc: @aokolnychyi @RussellSpitzer @szehon-ho