Strange issue with strange characters on linux #86
Comments
Only happens on Linux. Could also be a bug with MSFT's ODBC Driver, but I guess MSFT does not do bugs 😉 |
This does not happen on Windows because on Windows So my first piece of advice would be to check whether your system locale is UTF-8. Other narrow ASCII encodings are currently not supported by |
Ok, I'll try. For completeness, here's the link to my failing gh action: https://github.com/bmsuisse/odbc2deltalake/actions/runs/8538422686 |
Looking at this, it is more likely that binary and character size are confused somewhere. I just do not know where. So the system locale is likely innocent. Would you kindly enable debug logging and execute again? At the beginning it should log the database column names and their types, even before conversion to Arrow. I would be interested in seeing that. |
You can log to standard error using: https://arrow-odbc.readthedocs.io/en/latest/arrow_odbc.html#arrow_odbc.log_to_stderr |
Here we go:
|
I could reproduce the Bug.
Actually MSFT has one of the more solid ODBC drivers, I would say. I also admire many of the things they achieved in terms of engineering. Yet this time, I think it is on them. It is not a configuration issue of the client locale. The umlaut Windows version of of Best, Markus |
As a workaround I'll give it a try to supply larger buffers than requested by the drivers ... |
To be fair to Microsoft, the function in question |
Hm, interesting. Would you think I can reproduce this behavior if I write it in C#/.Net Core using MSFT-only libs? |
Maybe not, it depends on whether ADO relies on ODBC. It also depends on how wasteful the authors of these libs are with memory. I could avoid a lot of trouble if I just allocated 4 KB of buffer for each column name you want to know. In that case the name would need to be very long and contain special characters. A minimal reproducing example in plain C, maybe based on an example for |
This formulation alone in the documents is problematic |
It confuses binary buffer length and character length. Very likely related to the bug in question. |
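To make the difference between the two lengths concrete, here is a minimal Python sketch. It only simulates the suspected behaviour, a copy capped at the character count rather than the byte count. `copy_name_buggy` is a hypothetical helper written for illustration, not the driver's actual code, and the column name is the one from the original report.

```python
def copy_name_buggy(name: str, encoding: str = "utf-8") -> str:
    """Hypothetical simulation: cap the copy at the *character* count,
    although the encoded name may need more *bytes* than that."""
    encoded = name.encode(encoding)
    char_count = len(name)             # length in characters (code points)
    truncated = encoded[:char_count]   # wrong: uses char count as byte count
    return truncated.decode(encoding, errors="replace")


name = "time_st\u00e4mp"               # "time_stämp", ä is U+00E4
print(len(name))                       # 10 characters
print(len(name.encode("utf-8")))       # 11 bytes, because ä takes two bytes
print(copy_name_buggy(name))           # time_stäm (last character lost)
```

With a pure ASCII name both lengths agree, which would explain why such a mix-up only shows up with special characters.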
Definitely this. If you want to see it fail on Windows too, you can use a character in the column name which is 4 bytes in UTF-16 instead of two. E.g. |
Sadly, allocating larger buffers is not a suitable workaround. The driver only copies as many bytes as the name has characters into the application-provided buffer, regardless of the buffer's size. The "good" news is that this makes the bug more relevant to fix. Sorry, I cannot help here. I really tried. |
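A short Python sketch of why UTF-16 is not immune: characters outside the Basic Multilingual Plane are stored as a surrogate pair, so they take two 16-bit code units (4 bytes). The helper `utf16_units` and the sample character U+1F600 are my own choices for illustration, not taken from the thread.

```python
def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


umlaut = "\u00e4"        # ä, inside the Basic Multilingual Plane
emoji = "\U0001F600"     # outside the BMP, needs a surrogate pair

for ch in (umlaut, emoji):
    # characters, UTF-16 code units, UTF-16 bytes
    print(len(ch), utf16_units(ch), len(ch.encode("utf-16-le")))
# 1 1 2
# 1 2 4
```

So for the umlaut the character count and the UTF-16 code unit count agree, while for the second character they do not, which is the same kind of mismatch as with UTF-8 on Linux.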
Thank you for all your efforts! I'll try creating a repo in C# and getting this to the correct guys at Redmond - it's an interesting bug, after all 🙂 |
Probably not even needed, just stating that |
Remark: Using |
with this code:
I get this output:
Please note the name of the [time_stämp] column in the schema.