-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators #7798
Conversation
|
||
/** | ||
* A struct of partition values. | ||
* | ||
* <p>Instances of this class can produce partition values from a data row passed to {@link | ||
* #partition(StructLike)}. | ||
*/ | ||
public class PartitionKey implements StructLike, Serializable { | ||
public class PartitionKey extends StructTransform { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of code has been moved to the non-public StructTransform
class so that it can be reused for the new SortKey
class
} | ||
|
||
private static class SortOrderComparator implements Comparator<StructLike> { | ||
private final SortKey leftKey; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
included SortOrderComparator
impl in this PR to demonstrate the usage of SortKey
3970b55
to
f93b012
Compare
f93b012
to
55123d4
Compare
55123d4
to
c824402
Compare
@stevenzwu: Extracting the partition key from the row is on the hot path of writes. Based on the code I do not see any obvious issues, but it might worth to check the performance implications of this change? WDYT? |
@pvary there is no change in the logic. simple refactor with a common base class extracted. so I don't expect performance implications |
Previous code had 1 less redirection/inheritance. Optimized loops can be a thing in this type of code paths, where the compiler could deduce that the different iterations of the loops are independent, and could be executed in batch. Adding more redirection/inheritance could break the optimization algorithm and instead of a parallel loop we can end up executing the loop sequentially. This could cause performance degradation. This is the only effect which might effect other users of the classes touched by this change, otherwise I think the change keeps the old behaviour correctly and introduces a new one, which we need. |
c824402
to
78606fc
Compare
import java.util.stream.Collectors; | ||
|
||
/** | ||
* A struct of flattened sort field values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I understand flattened in this context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's assume the table schema is like
Schema schema =
new Schema(
Types.NestedField.required(1, "id", Types.StringType.get()),
Types.NestedField.optional(
2,
"location",
Types.StructType.of(
Types.NestedField.required(11, "lat", Types.FloatType.get()),
Types.NestedField.required(12, "long", Types.FloatType.get()))));
if the SortKey is applied on id
and lat
, the StructTransform
only produces a flattened tuple. nested structure is not maintained anymore.
private final Object[] transformedTuple;
/** | ||
* A struct of flattened transformed values. | ||
* | ||
* <p>Instances of this class can produce sort values from a data row passed to {@link |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo, can produce "transformed values"?
78606fc
to
b671d27
Compare
this.sortOrder = toCopy.sortOrder; | ||
} | ||
|
||
public SortKey copy() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the old method but we aren't actually copying here. Schema and Sort Order are not copies, just noting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right it is not deep copy of everything. but it is a still copy/clone
implementation of the SortKey
that we deemed correct.
Transform<?, ?> transform = fieldTransforms.get(i).transform(); | ||
Accessor<StructLike> accessor = schema.accessorForField(sourceFieldId); | ||
Preconditions.checkArgument( | ||
accessor != null, "Cannot build accessor for field: " + schema.findField(sourceFieldId)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: checkArgs has a "condition, string, args" version so you don't have to do a string concat
|
||
/** | ||
* Simple POJO for source field id and transform function. {@code Pair} class is not usable here | ||
* in API module, as it has Avro dep and lives in core module. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as it has a Avro dep and is in the core module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx. will update to as it has an Avro dep and is in the core module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
… SortOrderComparators (apache#7798)
… SortOrderComparators (apache#7798)
see more context from dev ML thread: https://www.mail-archive.com/[email protected]/msg04424.html