specify encoding when updating the dataset #6

n0542344 · 2021-05-26T09:17:34Z

Dear Rami Krispin!

Thanks for your awesome coronavirus package; it makes working with Covid-19 data in R very convenient!

Nevertheless, I ran into an issue: I wasn't able to update the dataset on my machine (running a Debian Stable-based OS and R 3.5.2) because I was getting the following error:

invalid multibyte string at '<f0><8a><cb><fa>'

Looking into the update_dataset() function (in the R/data_refresh.R file), I realized that this was probably due to the read.csv() function. When I ran rio::import() on the same target (which uses data.table::fread() by default), the issue disappeared. I assumed that the error had to do with the encoding, so I specified the additional option fileEncoding = "UTF-8" inside the read.csv() call, which solved the problem.
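A minimal sketch of the change described above, assuming the source file is UTF-8 encoded. The file contents and column names are made up for illustration and are not the package's data:

```r
# Illustrative data only: a small UTF-8 CSV written to a temporary file.
lines <- c("country,cases", "Cura\u00e7ao,5", "R\u00e9union,10")
tmp <- tempfile(fileext = ".csv")
writeLines(enc2utf8(lines), tmp, useBytes = TRUE)

# Declare the file's encoding explicitly instead of relying on the
# session locale to guess it.
df <- read.csv(tmp, fileEncoding = "UTF-8", stringsAsFactors = FALSE)

print(df)
```

Whether this is enough depends on the file really being UTF-8; if it was written in another encoding, the same argument would need that encoding's name instead.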

Since I assume that other people might have this issue as well, I'm submitting this pull request with only this single line added.

If you have any questions, don't hesitate to write to me.

Best wishes, Alex (life science PhD student from Vienna, Austria)

PS: an excerpt of my utils::sessionInfo() output looks as follows:

R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: PureOS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

RamiKrispin · 2021-05-28T01:04:49Z

Hi @n0542344,

Thanks for the PR!

There was a parsing issue. I tried adding the encoding as suggested in this PR, but it did not work. I think the main cause was that I was reading and writing the CSV files inconsistently: in some cases I used the write_csv and read_csv functions from the readr package, and in others I used the read.csv function. Changing all reads and writes to use the readr package solved the issue. Still testing it...
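For illustration, a minimal sketch of a consistent readr round trip of the kind described here. The data frame and its column names are made up and are not the package's actual refresh code:

```r
library(readr)

# Illustrative data only.
df <- data.frame(
  country = c("Cura\u00e7ao", "R\u00e9union"),
  cases = c(5, 10),
  stringsAsFactors = FALSE
)

tmp <- tempfile(fileext = ".csv")

# write_csv() always encodes strings as UTF-8 ...
write_csv(df, tmp)

# ... and read_csv() assumes UTF-8 by default, so writer and reader agree.
back <- read_csv(
  tmp,
  col_types = cols(country = col_character(), cases = col_double())
)

print(back)
```

Using one package for both directions avoids the situation where the writer emits one encoding and the reader assumes another.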

On a side note, this repo is an old version of the coronavirus package and is not active (I should probably remove it). The main repo is here:
https://github.com/RamiKrispin/coronavirus

RamiKrispin · 2021-05-28T01:11:16Z

In the meantime, I would recommend installing the master branch:

https://github.com/RamiKrispin/coronavirus

That seems to be working:

> x <- coronavirus::refresh_coronavirus_jhu()
Parsed with column specification:
cols(
  date = col_date(format = ""),
  province = col_character(),
  country = col_character(),
  lat = col_double(),
  long = col_double(),
  type = col_character(),
  cases = col_double()
)
Parsed with column specification:
cols(
  location = col_character(),
  location_code = col_character(),
  location_code_type = col_character()
)
> max(x$date)
[1] "2021-05-26"
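The first column specification printed above can also be passed explicitly through `col_types`, which pins the types instead of having readr guess them. A sketch with a made-up row; `jhu_spec` and the sample values are names and data introduced here for illustration, not part of the package:

```r
library(readr)

# Column specification copied from the message printed above.
jhu_spec <- cols(
  date = col_date(format = ""),
  province = col_character(),
  country = col_character(),
  lat = col_double(),
  long = col_double(),
  type = col_character(),
  cases = col_double()
)

# Made-up row for illustration; not real data.
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("date,province,country,lat,long,type,cases",
    "2021-05-26,,Austria,47.5,14.5,confirmed,1"),
  tmp
)

x <- read_csv(tmp, col_types = jhu_spec)
print(max(x$date))
```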

n0542344 · 2021-05-28T07:55:14Z

Hi Rami!

On Fri, 2021-05-28 (w21) 01:05:05, Rami Krispin wrote:
> Thanks for the PR!

I thank you for the awesome package!

> There was a parsing issue, I was trying to add the encoding as suggested on this PR but it did not work. I think that the main reason for this parsing issue was related to fact that I was reading/writing the csv files in some cases I used the `write_csv` and `read_csv` functions from the **readr** package and others I used the `read.csv` function. After I changed all read and write to use the **readr** package it solved the issue. Still testing it...

That sounds logical: the encoding option should only be necessary when using the `base::read.table()` function. I wasn't sure whether you wanted to include `readr` as a dependency (since it also works with `base::read.table()`), which is why I chose to just adjust the function already in use. But I do have to agree that using the `readr` package is far more convenient.

> On a side note, this repo is old version of the **coronavirus** and it is not active (I probably should remove it). The main repo is here: https://github.com/RamiKrispin/coronavirus

Ah, OK, thanks for the heads-up! I was already confused, since there were no changes on the other package for over a year... Could you maybe just change the README to point to the newer version? It would also be great to archive the repository (see also https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/archiving-a-github-repository/archiving-repositories) so that people automatically get pointed in the right direction.

Thanks again, best wishes, Alex

RamiKrispin · 2021-05-28T13:23:42Z

Yes, planning to archive this repo, thx!

RamiKrispin · 2021-05-30T03:05:47Z

I pushed the changes (read.csv -> read_csv) to CRAN. Please let me know if you have any issues refreshing the data.

https://cran.r-project.org/web/packages/coronavirus/index.html

n0542344 · 2021-05-30T09:24:01Z

Dear Rami!

On Sat, 2021-05-29 (w21) 20:05:59, Rami Krispin wrote:
> I pushed the changes (`read.csv` -> `read_csv`) to CRAN, please let me know if you have any issues to refresh the data.
> https://cran.r-project.org/web/packages/coronavirus/index.html

Awesome - now it works flawlessly! Thanks, best wishes, Alex

specify encoding when updating the dataset

a22fa87
