[CH] Insert wrong result for struct field #6588
Comments
This does not seem to be related to #4317. I reverted that PR's code and the issue can still be reproduced.
@exmy
insert overwrite tbl partition (day)
select id as a,
       map('t1', 'a', 't2', 'b'),
       struct('1', null) as c,
       '2024-01-08' as day
from range(10)
In Spark 3.3, it can still be reproduced.
It can't be reproduced in Spark 3.5 with this branch: #6601
Let's dig into it.
It can be reproduced with the Spark configs below:
0: jdbc:hive2://localhost:10000/> select * from tbl;
+----+----------------------+-------+-------------+
| a  |          b           |   c   |     day     |
+----+----------------------+-------+-------------+
| 0  | {"t1":"a","t2":"b"}  | NULL  | 2024-01-08  |
+----+----------------------+-------+-------------+
1 row selected (0.279 seconds)
0: jdbc:hive2://localhost:10000/>
0: jdbc:hive2://localhost:10000/>
0: jdbc:hive2://localhost:10000/> set spark.gluten.enabled = false;
+-----------------------+--------+
|          key          | value  |
+-----------------------+--------+
| spark.gluten.enabled  | false  |
+-----------------------+--------+
1 row selected (0.026 seconds)
0: jdbc:hive2://localhost:10000/>
0: jdbc:hive2://localhost:10000/> select * from tbl;
+----+----------------------+----------------------+-------------+
| a  |          b           |          c           |     day     |
+----+----------------------+----------------------+-------------+
| 0  | {"t1":"a","t2":"b"}  | {"d":null,"e":null}  | 2024-01-08  |
+----+----------------------+----------------------+-------------+
1 row selected (0.541 seconds)
File contents:
$ orc-contents ./part-00000-e89c5aef-d240-4d78-80a1-d04ebaf5868d.c000.lz4.orc
{"a": 0, "b": [{"key": "t1", "value": "a"}, {"key": "t2", "value": "b"}], "c": {"1": "1", "2": null}}
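One way to read the file contents above: the struct column c was written with the field names 1 and 2, while the query output shows the table's fields as d and e. The sketch below is a toy model, not Spark or Gluten code. It assumes (this is not confirmed in the thread) that the reader resolves struct fields by name, and the helper read_struct_by_name is hypothetical. Under that assumption, the non-native read result {"d":null,"e":null} falls out directly.

```python
# Hypothetical model of name-based struct field resolution.
# This is an illustration only; it is not how Spark or Gluten is implemented.

def read_struct_by_name(file_struct, table_fields):
    """Project a struct stored in a file onto the table's field names.

    Fields are matched by name only. A table field that does not exist
    in the file is filled with None.
    """
    return {name: file_struct.get(name) for name in table_fields}


# Struct column "c" as stored in the ORC file (from the orc-contents output).
file_c = {"1": "1", "2": None}

# Field names of column "c" as seen in the query output with Gluten disabled.
table_c_fields = ["d", "e"]

# No file field is named "d" or "e", so every field resolves to None.
print(read_struct_by_name(file_c, table_c_fields))

# If the writer had used the table's field names, the value would survive.
renamed = dict(zip(table_c_fields, file_c.values()))
print(read_struct_by_name(renamed, table_c_fields))
```

The point of the sketch is only that field names written into the file matter as much as the values: a writer that takes struct field names from an intermediate schema rather than from the target table can produce a file whose data is present but unreachable by name.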
Clue 1: native and non-native inserts produce ORC files with different schemas.
Native insert:
Non-native insert:
Clue 2: native insert with
Cause:
old_header corresponds to the schema of the CH Block output by FakeRow, in which the type of field c is
Backend
CH (ClickHouse)
Bug description
expected result:
gluten result:
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
No response