Sometimes I receive .csv files from external data providers that include no quotes (presumably because the data manager forgot to click the right button when exporting them from SQL), which makes it really hard to read in files with free text comments that can contain newlines. Does anyone have a good way to deal with those files, particularly when they are >1GB?
I agree it would be easier to fix the file with a shell command first.
Another option with sed and some regex matching:
sed -E -e :x -e '/^([^,]*,){2}[^,]*$/N; s/\n/ /; tx' test.txt
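To make the sed loop easier to follow, here is a small self-contained sketch that runs the same expression on a made-up file. The sample rows and the variable names `sample` and `fixed` are hypothetical and not taken from the attached test.txt. It assumes four columns with the free text in the third, and that the comments themselves contain no commas, because the expression decides whether a row is complete by counting commas.

```shell
# Hypothetical sample: 4 columns (A,B,C,D); column C is unquoted free text
# that may contain newlines. The contents are made up for illustration.
sample="$(mktemp)"
printf '%s\n' \
  'A,B,C,D' \
  '1,a,first part' \
  'second part,1' \
  '2,b,one line,5' \
  '3,c,part one' \
  'part two' \
  'part three,9' > "$sample"

# :x                        define a label named x
# /^([^,]*,){2}[^,]*$/N     if the pattern space holds exactly 2 commas
#                           (an incomplete row), append the next input line
# s/\n/ /                   replace the embedded newline with a space
# tx                        if a substitution was made, jump back to x
fixed="$(sed -E -e :x -e '/^([^,]*,){2}[^,]*$/N; s/\n/ /; tx' "$sample")"
printf '%s\n' "$fixed"
rm -f "$sample"
```

Because sed processes the input as a stream, this approach should also cope with files larger than 1GB without loading them into memory. For a file with a different number of columns, the `{2}` would presumably need to be changed to the number of commas that precede the last field of an incomplete row.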
Also, data.table::fread() can pre-process the input file with a system command passed via its cmd argument, so if sed is installed on the OS we can do:
# write a temporary file from the uploaded txt file
file <- "https://github.com/ucl-ihi/CodeClub/files/3481300/test.txt"
tmp <- tempfile(fileext = ".txt")
httr::GET(url = file, httr::write_disk(tmp))
library(data.table)
# sed command: note that the EOL characters \r\\n may differ depending on the OS
command <- "sed -E -e :x -e '/^([^,]*,){2}[^,]*$/N; s/\r\\n/ /; tx'"
fread(cmd = paste(command, tmp))
#>    A B                                                 C D
#> 1: 1 a                      commment over multiple lines 1
#> 2: 2 b                                    only one lines 5
#> 3: 3 c                                            single 8
#> 4: 4 d again multiple lines that are not quoted properly 9
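Since the end-of-line characters may differ between systems, one possible way to avoid editing the sed expression per OS is to strip carriage returns first and then reuse the plain `\n` version. The sketch below is only an illustration: the sample contents and the names `crlf` and `normalised` are made up, and it assumes that no carriage return in the file is meaningful data, because `tr -d '\r'` removes all of them.

```shell
# Hypothetical sample with Windows-style CRLF line endings; contents made up.
crlf="$(mktemp)"
printf '1,a,first part\r\nsecond part,1\r\n2,b,one line,5\r\n' > "$crlf"

# Delete every carriage return, then join incomplete rows (exactly 2 commas)
# with the following line, using the same loop as the plain \n expression.
normalised="$(tr -d '\r' < "$crlf" | sed -E -e :x -e '/^([^,]*,){2}[^,]*$/N; s/\n/ /; tx')"
printf '%s\n' "$normalised"
rm -f "$crlf"
```

The same pipeline could in principle be handed to fread() through its cmd argument, although that combination is not tested here.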
An example:
test.txt