[FEAT] Iceberg partitioned writes #2842
Conversation
CodSpeed Performance Report: merging #2842 will degrade performance by 46.85%.
Benchmarks breakdown
Codecov Report. Attention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2842 +/- ##
==========================================
+ Coverage 66.01% 66.29% +0.28%
==========================================
Files 1001 1003 +2
Lines 113467 113616 +149
==========================================
+ Hits 74906 75325 +419
+ Misses 38561 38291 -270
Flags with carried forward coverage won't be shown.
read_back = daft.read_iceberg(table)
assert as_arrow == read_back.to_arrow()
assert all(op == "ADD" for op in as_dict["operation"]), as_dict["operation"]
assert sum(as_dict["rows"]) == 5, as_dict["rows"]
Also, is there an assertion we should make here on the number of files that were written?
Added
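The requested file-count assertion might look something like the sketch below. This is a hedged illustration only: the `as_dict` shape (one "ADD" operation row per written data file) is an assumption used here as a stand-in, not Daft's documented result schema.

```python
# Hypothetical sketch of a file-count assertion, assuming the write
# result exposes one "ADD" operation row per data file written.
# (This dict is a stand-in for the real write result, not Daft's API.)
as_dict = {"operation": ["ADD", "ADD"], "rows": [3, 2]}

# Count written files as the number of "ADD" operations recorded.
num_files_written = sum(1 for op in as_dict["operation"] if op == "ADD")
assert num_files_written == 2, as_dict["operation"]
assert sum(as_dict["rows"]) == 5, as_dict["rows"]
```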
This would actually deviate from Spark's behavior and cause incorrect Hive-style reads! Maybe we can add an alias when writing out partitioning columns.
I changed this back, actually. Sorry, I forgot to update the description. However, this is maybe an interesting insight: should our Iceberg partitioned writes write out paths with modified names such as …?
This also changes the behavior of some of the partitioning functions in ways that I would consider bug fixes. However, @samster25, maybe you should take a look at them to make sure their new behavior is correct. The changes: `{}_trunc` instead of `{}_truncate`, to match Spark behavior.
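For illustration, a minimal sketch of what the renamed partition field looks like in a hive-style path. The helper functions here are hypothetical (not Daft's implementation); the `{}_trunc` suffix is the Spark/Iceberg naming convention the comment refers to, and the string-truncation semantics below are an assumption for the string case only.

```python
# Hypothetical helpers showing a hive-style partition path for a
# truncate-transformed string column. The derived partition field is
# named "{col}_trunc" (Spark/Iceberg convention), not "{col}_truncate".
def truncate(value: str, width: int) -> str:
    # Truncate transform on strings: keep the first `width` characters.
    return value[:width]

def partition_dir(col: str, value: str, width: int) -> str:
    # Build the hive-style "key=value" directory segment.
    return f"{col}_trunc={truncate(value, width)}"

# e.g. partition_dir("category", "electronics", 4) -> "category_trunc=elec"
```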