Support multi line for CSV #1408
Comments
This one is interesting. We need to understand how they implement this. We tested the Spark CSV reader in the past and found its performance quite low compared with our version; it may be worth benchmarking again.
@yw-yang Before we add support for this, you can actually use Basically you can create a lib in your project and define a
Just FYI: they use a non-splittable stream reader (basically a binary file handler rather than the standard Hadoop text file reader) to stream the data. This means the entire CSV file has to be read by a single task, which is why this option is off by default.
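To illustrate why a line-oriented reader cannot be used here, below is a minimal sketch in plain Python (standard library only, not Spark or SMV code). The sample data and the helper names `split_by_line` and `parse_as_stream` are hypothetical. It only shows that cutting the input at physical line breaks splits a quoted field in two, while parsing the input as one continuous stream keeps the record whole.

```python
import csv
import io

# Hypothetical sample: the "comment" field of record 1 contains a line break
# inside double quotes.
data = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'


def split_by_line(text):
    """Treat every physical line as one record (what a line-based reader does)."""
    return [line.split(",") for line in text.splitlines()]


def parse_as_stream(text):
    """Parse the whole input as one stream, honoring quoted line breaks."""
    return list(csv.reader(io.StringIO(text)))


naive_records = split_by_line(data)
stream_records = parse_as_stream(data)

print(len(naive_records))   # physical lines, including the broken halves
print(len(stream_records))  # logical records: header plus two rows
print(stream_records[1])
```

Since a record boundary can only be identified by tracking quote state from the start of the file, the input cannot be safely cut at arbitrary line breaks, which matches the single-task limitation described above.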
In some cases, a column value may contain a carriage return within double quotes, e.g.
Spark has supported the multiLine option since 2.2 (see https://issues.apache.org/jira/browse/SPARK-19610). Should SMV also add it to csvAttributes?