Upgrade from 40 to 43 causes utf8 timestamp queries to fail #13625
Comments
@TheBuilderJR Cannot reproduce this problem on version 43.0.0. Maybe something is missing here.
let df = ctx.sql("create table logs(timestamp varchar);").await?;
let df = ctx
    .sql(r#"SELECT * FROM logs WHERE timestamp IS NOT NULL ORDER BY timestamp DESC LIMIT 3"#)
    .await?;
println!("{:?}", df.schema());
df.show().await?;
got results:
I wonder which query operation could have produced this error. In any case, here is a proposal to improve the error message: #13628
Thanks @holicc and @findepi. I think I looked at the wrong query in the logs. You can reproduce with something like this
which produces
Does the problem go away if you turn off the config setting datafusion.execution.parquet.schema_force_view_types (documented at https://datafusion.apache.org/user-guide/configs.html)? We are still working through some additional needed support:
@alamb yep that fixes it
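For readers hitting the same error, here is a minimal sketch of turning that setting off when building the session. It assumes the `datafusion` crate (version 43 or similar) and `tokio` as dependencies; the thread does not show how the reporter's context was actually created, so treat the setup as illustrative rather than as the reporter's code.

```rust
use datafusion::error::Result;
use datafusion::prelude::{SessionConfig, SessionContext};

#[tokio::main]
async fn main() -> Result<()> {
    // Disable the option named in the comment above. The key is taken
    // verbatim from the thread; everything else here is an assumed setup.
    let config = SessionConfig::new()
        .set_bool("datafusion.execution.parquet.schema_force_view_types", false);

    // Tables registered and queries run through this context use the
    // overridden setting.
    let _ctx = SessionContext::new_with_config(config);

    Ok(())
}
```

Setting the option on the `SessionConfig` keeps the workaround in one place, so it is easy to remove once the missing view-type support lands.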
I think one small frustration I've had with DataFusion is the number of backwards-incompatible changes. Is it fair to say that DataFusion isn't ready for production yet? Are there any active plans to add a more comprehensive test suite so users can feel more confident with the updates? Or perhaps are there any config settings I can opt into that favor stability over performance?
My guess is that you are being impacted by the issue being worked on in #13404. Edit: no, in fact I am unsure whether I've seen this issue before (string -> timestamp implicit casting failing).
As far as tests go, there is #13470; however, I don't believe that will cover a large portion of what is breaking between releases for people, at least not initially. As for being production ready, I suppose that depends on your definition of production ready. It has been used in production by quite a few companies for some time now, but upgrades do seem to have been a concern: #13525
So, I just ran the following against main as of a few days ago:
@TheBuilderJR can you please check whether the reproducer query that worked in DF 40 still fails on current main?
@findepi It was broken on a Dec 2 fork I maintain (https://github.com/TheBuilderJR/datafusion/tree/main-dec-2-fork). Is there a PR that you have that landed past that date?
I don't have any particular PR in mind, but if you have a reproducer that still doesn't work on main, then it is something we can help with.
@findepi if you can't repro on main anymore but you can from before Dec 2, then I'm happy to close it out. I'm not sure if the repro above is exactly the same, since I'm not sure if varchar gets converted into the utf8 type. I'm unblocked now thanks to @alamb's suggestion, so it's nbd either way. If you insist on closing this out I'm ok with that as well :)
Describe the bug
I have a column timestamp that is UTF8.
Previously, if I ran a query like
DataFusion would return with no errors, but after upgrading to 43 I get
To Reproduce
Create a similar looking schema and run a query against it
Expected behavior
No errors
Additional context
No response