This repository has been archived by the owner on Jun 2, 2021. It is now read-only.

specify encoding when updating the dataset #6

Open
wants to merge 1 commit into
base: master
from


n0542344

Dear Rami Krispin!

Thanks for your awesome coronavirus package; it makes working with COVID-19 data in R very convenient!

Nevertheless, I ran into an issue: I wasn't able to update the dataset on my machine (running a Debian Stable-based OS and R 3.5.2) because I was getting the following error:

invalid multibyte string at '<f0><8a><cb><fa>'

When looking into the update_dataset() function (in the R/data_refresh.R file), I realized that this is probably due to the read.csv() function. When running rio::import() on the same target (which uses data.table::fread() by default), the issue disappeared. I assumed that the error had to do with the encoding, which is why I specified the additional option fileEncoding = "UTF-8" inside the read.csv() call, which solved the problem.
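For readers who want to try the idea in isolation, here is a minimal sketch of reading a UTF-8 file with an explicit fileEncoding argument. It is not the package's actual update_dataset() code: the temporary file and its two rows are hypothetical stand-ins for the downloaded dataset, and the sketch assumes the session runs in a UTF-8 locale such as the one shown in the session info further down.

```r
# Hypothetical stand-in for the downloaded dataset: a tiny CSV stored as UTF-8 bytes.
csv_lines <- c("country,cases", "Cura\u00e7ao,3", "R\u00e9union,10")
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
# enc2utf8() + useBytes = TRUE writes the UTF-8 bytes unchanged, whatever the locale.
writeLines(enc2utf8(csv_lines), con, useBytes = TRUE)
close(con)

# Declare the file's encoding explicitly instead of relying on the default.
df <- read.csv(tmp, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
str(df)

unlink(tmp)
```

With fileEncoding set, read.csv() re-encodes the input from UTF-8 to the session's native encoding while reading, so the non-ASCII country names should survive the import.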

Since I assume that other people might have this issue as well, I'm submitting this pull request with only this single line added.

If you have any questions, don't hesitate to write to me.

Best wishes, Alex (life science PhD-student from Vienna, Austria)


P.S.: An excerpt of my utils::sessionInfo() output looks as follows:

R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: PureOS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

@RamiKrispin
Member

Hi @n0542344 ,

Thanks for the PR!

There was a parsing issue. I tried adding the encoding as suggested in this PR, but it did not work. I think the main reason for the parsing issue was that I was reading and writing the CSV files inconsistently: in some cases I used the write_csv and read_csv functions from the readr package, and in others I used the read.csv function. After I changed all reads and writes to use the readr package, the issue was solved. Still testing it...
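As a rough illustration of the consistent readr round trip described above (a sketch only, not the package's actual refresh code), the snippet below writes and reads the same file with write_csv and read_csv. The data frame and the temporary file are hypothetical, and the sketch assumes the readr package is installed.

```r
library(readr)

# Hypothetical data with non-ASCII characters, standing in for the real dataset.
df_out <- data.frame(
  country = c("Cura\u00e7ao", "R\u00e9union"),
  cases = c(3, 10),
  stringsAsFactors = FALSE
)

tmp <- tempfile(fileext = ".csv")

# write_csv() always writes UTF-8, and read_csv() assumes UTF-8 by default,
# so writer and reader agree on the encoding without extra arguments.
write_csv(df_out, tmp)
df_in <- read_csv(
  tmp,
  col_types = cols(country = col_character(), cases = col_double())
)

unlink(tmp)
```

Keeping both directions in one package means the encoding assumptions of the writer and the reader match, which is the design choice the comment above describes.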

On a side note, this repo is an old version of the coronavirus package and is not active (I should probably remove it). The main repo is here:
https://github.com/RamiKrispin/coronavirus

@RamiKrispin
Member

In the meantime, I would recommend installing the master branch:

https://github.com/RamiKrispin/coronavirus

That seems to be working:

x <- coronavirus::refresh_coronavirus_jhu()
Parsed with column specification:
cols(
  date = col_date(format = ""),
  province = col_character(),
  country = col_character(),
  lat = col_double(),
  long = col_double(),
  type = col_character(),
  cases = col_double()
)
Parsed with column specification:
cols(
  location = col_character(),
  location_code = col_character(),
  location_code_type = col_character()
)
> max(x$date)
[1] "2021-05-26"

@n0542344
Author

n0542344 commented May 28, 2021 via email

@RamiKrispin
Member

Yes, planning to archive this repo, thx!

@RamiKrispin
Member

I pushed the changes (read.csv -> read_csv) to CRAN; please let me know if you have any issues refreshing the data.

https://cran.r-project.org/web/packages/coronavirus/index.html

@n0542344
Author

n0542344 commented May 30, 2021 via email


2 participants