[GLUTEN-6961][VL][feat] Add decimal write support for ArrowWritableColumnVector #6962
Conversation
    throw new UnsupportedOperationException();
  }

  // Arrow does not need setNotNull; setting the value is enough.
  void setNotNull(int rowId) {}
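For context, a minimal toy sketch (hypothetical classes, not the actual Arrow API) of why a standalone setNotNull can be a no-op for Arrow-backed vectors: Arrow-style vectors track nullness in a separate validity bitmap, and writing a value marks the slot valid in the same call.

```java
import java.math.BigDecimal;
import java.util.BitSet;

// Toy stand-in for an Arrow-backed vector. The point being illustrated:
// the value write itself flips the validity bit, so callers never need an
// explicit setNotNull before or after writing.
final class ToyDecimalVector {
    private final BigDecimal[] values;
    private final BitSet validity = new BitSet(); // one bit per row: set = non-null

    ToyDecimalVector(int capacity) {
        this.values = new BigDecimal[capacity];
    }

    // Writing a value implies "not null" -- no separate setNotNull call needed.
    void set(int rowId, BigDecimal value) {
        values[rowId] = value;
        validity.set(rowId);
    }

    boolean isNullAt(int rowId) {
        return !validity.get(rowId);
    }

    BigDecimal get(int rowId) {
        return isNullAt(rowId) ? null : values[rowId];
    }
}
```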
Any reason for the change?
If we use MutableProjection, it will call MutableColumnarBatchRow, and we can construct MutableColumnarBatchRow with an ArrowWritableColumnVector. https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/MutableColumnarRow.java#L286
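The delegation described above can be sketched with toy classes (WritableColumn, ToyMutableRow, and ToyDecimalColumn are hypothetical names, not Spark or Gluten code): a mutable row backed by column vectors forwards setDecimal(ordinal, ...) to putDecimal(rowId, ...) on the ordinal-th vector, so any vector implementing putDecimal (including an Arrow-backed one) can sit behind a MutableProjection-style write path.

```java
import java.math.BigDecimal;

// Minimal column-writer contract, standing in for the vector hierarchy.
interface WritableColumn {
    void putDecimal(int rowId, BigDecimal value, int precision);
    BigDecimal getDecimal(int rowId);
}

// Simple array-backed column, standing in for ArrowWritableColumnVector.
final class ToyDecimalColumn implements WritableColumn {
    private final BigDecimal[] values;
    ToyDecimalColumn(int capacity) { values = new BigDecimal[capacity]; }
    public void putDecimal(int rowId, BigDecimal value, int precision) {
        values[rowId] = value;
    }
    public BigDecimal getDecimal(int rowId) { return values[rowId]; }
}

// Mutable row view over the columns: the row-level setter just routes
// to the column vector's writer at the current rowId.
final class ToyMutableRow {
    private final WritableColumn[] columns;
    private int rowId;
    ToyMutableRow(WritableColumn[] columns) { this.columns = columns; }
    void pointTo(int rowId) { this.rowId = rowId; }
    void setDecimal(int ordinal, BigDecimal value, int precision) {
        columns[ordinal].putDecimal(rowId, value, precision);
    }
}
```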
The class is abstract and the method should be overridable by impl classes. See

Line 1913 in ed3d333:
writer.setIndexDefined(rowId);

Also, ArrowVectorWriter likely doesn't have to be abstract (needs more design). It could be made an interface, as it overrides almost nothing from the hierarchy.
Because it has the member ValueVector:

Line 1244 in ed3d333:
private final ValueVector vector;
The class should not be abstract, since its subclasses will not implement these functions.
Can we route putDecimal calls to writer.setDecimal? I am not that familiar with the design here, but that looks like a pattern we should follow.
The writer does not have a setDecimal function. I copied most of the code from https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java#L429, but it differs in putByteArray, because Arrow's DecimalVector does not support putting a byte-array range. Putting all the bytes is a final function in Spark:
public abstract int putByteArray(int rowId, byte[] value, int offset, int count);
public final int putByteArray(int rowId, byte[] value) {
return putByteArray(rowId, value, 0, value.length);
}
I will add a test for it.
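To illustrate the point about byte-array ranges, here is a hedged sketch (not the PR's code; DecimalBytes is a hypothetical name): a 128-bit decimal slot is exactly 16 bytes, so a decimal writer takes the value's full unscaled two's-complement bytes and sign-extends them to the fixed width, rather than copying an arbitrary (offset, count) range of a buffer.

```java
import java.math.BigDecimal;
import java.util.Arrays;

// Sketch of writing a BigDecimal into a fixed-width 128-bit decimal slot.
final class DecimalBytes {
    static final int WIDTH = 16; // one 128-bit decimal slot

    static byte[] toFixedWidth(BigDecimal value) {
        // Big-endian, minimal-length two's-complement representation.
        byte[] unscaled = value.unscaledValue().toByteArray();
        if (unscaled.length > WIDTH) {
            throw new ArithmeticException("decimal does not fit in 128 bits");
        }
        byte[] out = new byte[WIDTH];
        // Sign-extend: pad the leading bytes with 0x00 (positive) or 0xFF (negative).
        byte pad = (byte) (value.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(out, 0, WIDTH - unscaled.length, pad);
        System.arraycopy(unscaled, 0, out, WIDTH - unscaled.length, unscaled.length);
        return out;
    }
}
```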
I will override putByteArray but check that it puts all the bytes; I will refactor it.
def checkSchema(schema: StructType): Boolean = {
  try {
    SparkSchemaUtil.toArrowSchema(schema)
    true
  } catch {
    case _: Exception =>
      false
  }
}
What's the error here?
I found that an exception can be thrown here: https://github.com/apache/incubator-gluten/blob/main/gluten-data/src/main/scala/org/apache/spark/sql/utils/SparkArrowUtil.scala#L54
But execution never actually reaches this point, so I will drop the change.
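The try/catch pattern in checkSchema above can be sketched with hypothetical stand-ins (SchemaCheck and this toArrowSchema are illustrative, not Gluten code): a conversion failure is treated as "schema unsupported", for example when a decimal's precision exceeds the 38-digit limit of a 128-bit decimal.

```java
import java.util.List;

// Toy version of the support check: attempt the conversion and treat any
// exception as "unsupported".
final class SchemaCheck {
    // A 128-bit decimal holds at most 38 decimal digits.
    static final int MAX_DECIMAL128_PRECISION = 38;

    // Stand-in for SparkSchemaUtil.toArrowSchema: throws when a field's
    // decimal precision cannot be represented.
    static void toArrowSchema(List<Integer> decimalPrecisions) {
        for (int p : decimalPrecisions) {
            if (p > MAX_DECIMAL128_PRECISION) {
                throw new UnsupportedOperationException("precision " + p);
            }
        }
    }

    // Mirror of the Scala checkSchema: no exception means the schema is supported.
    static boolean checkSchema(List<Integer> decimalPrecisions) {
        try {
            toArrowSchema(decimalPrecisions);
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```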
Any more comments? @zhztheplayer
@jinchengchenghh Would you help fill in the PR description? Thanks.
Just some nits, I updated the PR to save some time. Thanks.
===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
No description provided.