[GLUTEN-6961][VL][feat] Add decimal write support for ArrowWritableColumnVector #6962
Conversation
    throw new UnsupportedOperationException();
  }

  // Arrow does not need setNotNull; setting the value is enough.
  void setNotNull(int rowId) {}
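For context, a minimal toy sketch (hypothetical classes, not the actual Arrow API) of why a standalone setNotNull can be a no-op for Arrow-backed vectors: Arrow-style vectors track nullness in a separate validity bitmap, and writing a value marks the slot valid in the same call.

```java
import java.math.BigDecimal;
import java.util.BitSet;

// Toy stand-in for an Arrow-backed vector. The point being illustrated:
// the value write itself flips the validity bit, so callers never need an
// explicit setNotNull before or after writing.
final class ToyDecimalVector {
    private final BigDecimal[] values;
    private final BitSet validity = new BitSet(); // one bit per row: set = non-null

    ToyDecimalVector(int capacity) {
        this.values = new BigDecimal[capacity];
    }

    // Writing a value implies "not null" -- no separate setNotNull call needed.
    void set(int rowId, BigDecimal value) {
        values[rowId] = value;
        validity.set(rowId);
    }

    boolean isNullAt(int rowId) {
        return !validity.get(rowId);
    }

    BigDecimal get(int rowId) {
        return isNullAt(rowId) ? null : values[rowId];
    }
}
```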
Any reason for the change?
If we use MutableProjection, it will call MutableColumnarBatchRow, and we can construct MutableColumnarBatchRow with an ArrowWritableColumnVector. https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/MutableColumnarRow.java#L286
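The delegation described above can be sketched with toy classes (WritableColumn, ToyMutableRow, and ToyDecimalColumn are hypothetical names, not Spark or Gluten code): a mutable row backed by column vectors forwards setDecimal(ordinal, ...) to putDecimal(rowId, ...) on the ordinal-th vector, so any vector implementing putDecimal (including an Arrow-backed one) can sit behind a MutableProjection-style write path.

```java
import java.math.BigDecimal;

// Minimal column-writer contract, standing in for the vector hierarchy.
interface WritableColumn {
    void putDecimal(int rowId, BigDecimal value, int precision);
    BigDecimal getDecimal(int rowId);
}

// Simple array-backed column, standing in for ArrowWritableColumnVector.
final class ToyDecimalColumn implements WritableColumn {
    private final BigDecimal[] values;
    ToyDecimalColumn(int capacity) { values = new BigDecimal[capacity]; }
    public void putDecimal(int rowId, BigDecimal value, int precision) {
        values[rowId] = value;
    }
    public BigDecimal getDecimal(int rowId) { return values[rowId]; }
}

// Mutable row view over the columns: the row-level setter just routes
// to the column vector's writer at the current rowId.
final class ToyMutableRow {
    private final WritableColumn[] columns;
    private int rowId;
    ToyMutableRow(WritableColumn[] columns) { this.columns = columns; }
    void pointTo(int rowId) { this.rowId = rowId; }
    void setDecimal(int ordinal, BigDecimal value, int precision) {
        columns[ordinal].putDecimal(rowId, value, precision);
    }
}
```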
The class is abstract and the method should be overridable by impl classes. See

Line 1913 in ed3d333:
writer.setIndexDefined(rowId);

Also, ArrowVectorWriter likely doesn't have to be abstract (needs more design). It could be made an interface, as it overrides almost nothing from the hierarchy.
Because it has the member ValueVector:

Line 1244 in ed3d333:
private final ValueVector vector;
The class should not be abstract, since its subclasses will not implement these functions.
Can we route putDecimal calls to writer.setDecimal? I am not that familiar with the design here, but that looks like a pattern we should follow.
The writer does not have a setDecimal function. I copied most of the code from https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java#L429, but it differs in putByteArray, because Arrow's DecimalVector does not support putting a byte-array range. Putting all the bytes is a final function in Spark:
public abstract int putByteArray(int rowId, byte[] value, int offset, int count);
public final int putByteArray(int rowId, byte[] value) {
return putByteArray(rowId, value, 0, value.length);
}
I will add a test for it.
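To illustrate the point about byte-array ranges, here is a hedged sketch (not the PR's code; DecimalBytes is a hypothetical name): a 128-bit decimal slot is exactly 16 bytes, so a decimal writer takes the value's full unscaled two's-complement bytes and sign-extends them to the fixed width, rather than copying an arbitrary (offset, count) range of a buffer.

```java
import java.math.BigDecimal;
import java.util.Arrays;

// Sketch of writing a BigDecimal into a fixed-width 128-bit decimal slot.
final class DecimalBytes {
    static final int WIDTH = 16; // one 128-bit decimal slot

    static byte[] toFixedWidth(BigDecimal value) {
        // Big-endian, minimal-length two's-complement representation.
        byte[] unscaled = value.unscaledValue().toByteArray();
        if (unscaled.length > WIDTH) {
            throw new ArithmeticException("decimal does not fit in 128 bits");
        }
        byte[] out = new byte[WIDTH];
        // Sign-extend: pad the leading bytes with 0x00 (positive) or 0xFF (negative).
        byte pad = (byte) (value.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(out, 0, WIDTH - unscaled.length, pad);
        System.arraycopy(unscaled, 0, out, WIDTH - unscaled.length, unscaled.length);
        return out;
    }
}
```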
I will override putByteArray but check that it puts all the bytes; I will refactor it.
def checkSchema(schema: StructType): Boolean = {
  try {
    SparkSchemaUtil.toArrowSchema(schema)
    true
  } catch {
    case _: Exception =>
      false
  }
}
What's the error here?
I found that an exception can be thrown here: https://github.com/apache/incubator-gluten/blob/main/gluten-data/src/main/scala/org/apache/spark/sql/utils/SparkArrowUtil.scala#L54
But execution never actually reaches this point, so I will drop the change.
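The try/catch pattern in checkSchema above can be sketched with hypothetical stand-ins (SchemaCheck and this toArrowSchema are illustrative, not Gluten code): a conversion failure is treated as "schema unsupported", for example when a decimal's precision exceeds the 38-digit limit of a 128-bit decimal.

```java
import java.util.List;

// Toy version of the support check: attempt the conversion and treat any
// exception as "unsupported".
final class SchemaCheck {
    // A 128-bit decimal holds at most 38 decimal digits.
    static final int MAX_DECIMAL128_PRECISION = 38;

    // Stand-in for SparkSchemaUtil.toArrowSchema: throws when a field's
    // decimal precision cannot be represented.
    static void toArrowSchema(List<Integer> decimalPrecisions) {
        for (int p : decimalPrecisions) {
            if (p > MAX_DECIMAL128_PRECISION) {
                throw new UnsupportedOperationException("precision " + p);
            }
        }
    }

    // Mirror of the Scala checkSchema: no exception means the schema is supported.
    static boolean checkSchema(List<Integer> decimalPrecisions) {
        try {
            toArrowSchema(decimalPrecisions);
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```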
Any more comments? @zhztheplayer
@jinchengchenghh Would you help fill in the PR description? Thanks.
Just some nits, I updated the PR to save some time. Thanks.
===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
No description provided.