There's an argument for the library to be more fussy, but a user-facing application should just try to do the least frustrating thing. We might also experiment with approaches here, and then port them back to OpenDP.
Use encoding='utf8-lossy', so we'll see � for any bad characters.
(Since we're not doing much string processing, and string columns are mostly used for grouping, this is not a bad option.)
Prompt the user for file encoding... but it's unlikely that they know.
DP Creator sniffs file, and includes encoding in generated code.
Generated code has a helper function that loads a LazyFrame from CSV with the given encoding.
OpenDP has a helper function that loads a LazyFrame from CSV with the given encoding.
DP Creator includes a sniffer function in generated code.
OpenDP has a sniffer function that DP Creator will use.
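As a rough illustration of what a sniffer might look like, here is a minimal standard-library sketch. The function name `sniff_encoding`, the candidate list, and the sample size are all assumptions made for illustration; nothing here is an existing DP Creator or OpenDP API. It can only say which encodings decode without error, not which one the author of the file actually used.

```python
import codecs

# Hypothetical candidate order. latin-1 accepts every byte sequence,
# so it has to come last as the fallback.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def sniff_encoding(path, sample_size=64 * 1024):
    """Guess a file's encoding from its first bytes (illustrative only)."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)

    # A byte-order mark is the most reliable signal, so check it first.
    if sample.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if sample.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # If we read fewer bytes than requested, we have the whole file and
    # can treat a dangling partial character as an error.
    at_end_of_file = len(sample) < sample_size

    for encoding in CANDIDATE_ENCODINGS:
        decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
        try:
            # final=False tolerates a multi-byte character cut off at the
            # end of the sample.
            decoder.decode(sample, final=at_end_of_file)
        except UnicodeDecodeError:
            continue
        return encoding

    raise ValueError(f"Could not guess an encoding for {path}")
```

The result could then be written into the generated code, or passed to whichever loader helper is chosen.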
Any approach that temporarily re-encodes the file as UTF-8 will be a little awkward: we'd want to write to a temporary file, but scan_csv is lazy, so how and when do we get rid of the re-encoded file (which may contain private data)?
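One possible answer to the cleanup question is to tie the temporary file's lifetime to a context manager. The sketch below uses only the standard library; `reencoded_as_utf8` is a hypothetical helper name, not something that exists in DP Creator or OpenDP. It assumes the caller finishes all work on the file, including collecting any lazy query, before leaving the `with` block.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def reencoded_as_utf8(path, encoding):
    """Yield the path of a temporary UTF-8 copy of `path` (illustrative only).

    The copy is deleted when the `with` block exits, even on error.
    """
    # mkstemp creates the file readable and writable only by the current
    # user, which matters if the CSV contains private data.
    fd, tmp_path = tempfile.mkstemp(suffix=".csv")
    try:
        # newline="" passes line endings through unchanged.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as dst, \
                open(path, "r", encoding=encoding, newline="") as src:
            shutil.copyfileobj(src, dst)
        yield tmp_path
    finally:
        os.remove(tmp_path)
```

The cost of this design is that a lazy frame built from the temporary path cannot outlive the block: once the block exits, the file is gone, so anything still lazy at that point would fail when it is eventually evaluated.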
See
Polars has a problem with non-utf8 CSVs.