Functions to import reported daily sub-national new cases and deaths from LMIC #5

ffinger · 2020-04-09T07:39:22Z

Description

A function for each country that accesses public data sources and makes the data accessible in R.
Good examples in this package: https://github.com/epiforecasts/NCoVUtils for a number of countries.

Functions already exist for most European and Asian countries and the US. We are currently looking for data and functions for LMICs, especially African countries.

I suggest we use this issue to keep track of:

1. Needs for data and R functions for specific countries
2. Available data sources for specific countries
3. Functions that import the data sources identified in 2.

See below for the Google Sheet tracking these.

Output

The output format of each function should be a long data frame containing the following columns:

country
admin_subdivision_level_1
admin_subdivision_level_2 (if available)
(more levels if available)
date
cases
deaths

where cases and deaths stand for the newly reported cases and deaths on that day.
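
As a rough illustration of the format above, here is a minimal sketch in base R. The helper name `check_import_output()` and the example rows are assumptions made for illustration only; they are not part of NCoVUtils, and the numbers are made up.

```r
# Hypothetical helper (not part of NCoVUtils): checks that a data frame
# has the columns described above, with sensible types.
check_import_output <- function(df) {
  required <- c("country", "admin_subdivision_level_1", "date", "cases", "deaths")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!inherits(df$date, "Date")) {
    stop("`date` must be of class Date")
  }
  if (!is.numeric(df$cases) || !is.numeric(df$deaths)) {
    stop("`cases` and `deaths` must be numeric")
  }
  invisible(df)
}

# Made-up rows, only to show the long layout: one row per subdivision and day,
# with newly reported (not cumulative) counts.
example_output <- data.frame(
  country = "Exampleland",
  admin_subdivision_level_1 = c("Region A", "Region A", "Region B"),
  date = as.Date(c("2020-04-01", "2020-04-02", "2020-04-01")),
  cases = c(5, 3, 1),
  deaths = c(0, 1, 0),
  stringsAsFactors = FALSE
)

check_import_output(example_output)
```

A check like this could be run at the end of each country function so that all functions return the same shape, whichever package they end up in.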

Functions can either be added to https://github.com/epiforecasts/NCoVUtils via pull request, or we can start our own package that wraps NCoVUtils and other solutions for the already implemented countries.

Countries already covered:

https://github.com/epiforecasts/NCoVUtils covers the following countries so far:

Belgium
Canada
France
Germany
Italy
Spain
United Kingdom
United States
Japan
Korea
Afghanistan

Countries to be done

Spreadsheet to track requested countries, data sources and implementations:
https://docs.google.com/spreadsheets/d/1uvg07BAmwKqLqhKvkejhkX7uvXiGCre4sz11Au3pz9Q/edit?usp=sharing

My own priority list:

Burkina Faso
Iraq
Democratic Republic of the Congo
Syria

Links

A few places where data sources are indexed:

https://data.humdata.org/event/covid-19
https://coronavirustechhandbook.com/home
https://www.europeandataportal.eu/data/datasets?locale=en&categories=heal&page=1&query=covid

ffinger · 2020-04-09T07:51:33Z

@patrickbarks @PaulC91 @ntncmch @scottyaz @sbfnk @epiforcasts @seabbs @hamishgibbs @jhellewell14

scottyaz · 2020-04-09T08:52:27Z

Would be good to start a google spreadsheet (if it doesn't exist) with sources for each country. For example http://covid19.health.gov.mw is a good source for Malawi.

seabbs · 2020-04-09T08:57:25Z

We are keen to have contributions to our package but also happy for this to be a separate project if that makes sense. We've been talking about how/if we want to support it as a more widely known data resource and that is seeming to make more and more sense.

ffinger · 2020-04-09T09:56:56Z

Google spreadsheet to track requests for countries, data sources and implementations:
https://docs.google.com/spreadsheets/d/1uvg07BAmwKqLqhKvkejhkX7uvXiGCre4sz11Au3pz9Q/edit?usp=sharing

ffinger · 2020-04-09T10:00:55Z

@seabbs, happy to contribute to NCoVUtils

xt-21 · 2020-04-10T19:39:45Z

Would like to work on Burkina Faso

ColinFay · 2020-04-12T12:29:13Z

Hey,

Can you explain the process for contributing?

Do you want us to PR {NCovUtils}?

Also, there is: https://www.worldometers.info/coronavirus/

Here's a function to get today's and yesterday's data frames:

get_worldmeter_df <- function(){
  url <- xml2::read_html(
    "https://www.worldometers.info/coronavirus/"
  )
  tbls <- rvest::html_table(url)
  tbls[[1]] <- tbls[[1]][8:nrow(tbls[[1]]),]
  tbls[[2]] <- tbls[[2]][8:nrow(tbls[[2]]),]
  list(
    today = tbls[[1]], 
    yesterday = tbls[[2]]
  )
}
get_worldmeter_df()

It has Burkina Faso, Congo and Syria; "Irak" returns no row below, presumably because the site spells it "Iraq":

tod <- get_worldmeter_df()$today
tod[
  tod$`Country,Other` %in% c("Burkina Faso", "Irak", "Congo", "Syria"),
]
    Country,Other TotalCases NewCases TotalDeaths NewDeaths
99   Burkina Faso        484                   27          
147         Congo         60                    5          
166         Syria         25                    2          
    TotalRecovered ActiveCases Serious,Critical
99             155         302                 
147              5          50                 
166              5          18                 
    Tot Cases/1M pop Deaths/1M pop TotalTests Tests/1M pop
99                23             1                        
147               11           0.9                        
166                1           0.1                        
    Continent
99     Africa
147    Africa
166      Asia

ffinger · 2020-04-12T12:37:08Z

Hi @ColinFay,
yes, the best is to PR NCovUtils.

I haven't seen any sub-national data (by region, province or similar) on worldometers; am I missing something?

I think it would still be a good additional resource alongside the existing functions that get national data from ECDC, WHO, JHU or similar, especially since there seems to be data on testing.
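
For the national data, a sketch of how the scraped table could be mapped to the `country` / `date` / `cases` / `deaths` columns from the issue description. The helper names `parse_count()` and `worldometer_to_long()` are hypothetical, the toy table is made up, and the assumption that counts arrive as strings such as "+1,234" or as empty cells is mine and would need checking against the live page.

```r
# Hypothetical helper: turn strings such as "+1,234" into numbers.
# Empty cells become NA rather than 0, because a blank may mean
# "not reported" as well as "no new cases".
parse_count <- function(x) {
  digits <- gsub("[^0-9]", "", as.character(x))
  out <- rep(NA_real_, length(digits))
  has_digits <- nzchar(digits)
  out[has_digits] <- as.numeric(digits[has_digits])
  out
}

# Hypothetical wrapper: keep only the columns needed for the common format.
worldometer_to_long <- function(tbl, date = Sys.Date()) {
  data.frame(
    country = tbl[["Country,Other"]],
    date = rep(as.Date(date), nrow(tbl)),
    cases = parse_count(tbl[["NewCases"]]),
    deaths = parse_count(tbl[["NewDeaths"]]),
    stringsAsFactors = FALSE
  )
}

# Made-up rows using the column names shown in the output above
toy <- data.frame(
  `Country,Other` = c("Country A", "Country B", "Country C"),
  NewCases = c("+12", "", "+1,234"),
  NewDeaths = c("+1", "", "+20"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
long <- worldometer_to_long(toy, date = "2020-04-12")
```

This only covers national counts; it does not help with the sub-national data this issue is about.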

ColinFay · 2020-04-12T12:38:25Z

@ffinger not that I know of

ColinFay · 2020-04-12T12:38:33Z

Another possible source for Burkina Faso:
https://www.humanitarianresponse.info/en/op%C3%A9rations/burkina-faso/documents/table/themes/covid-19

The PDF(s) would need to be scraped.

ffinger · 2020-04-12T12:40:16Z

Thanks for this!
Does anyone have time to implement the scraping?

ffinger · 2020-04-12T12:44:50Z

There is a figure here too; its sources are probably the previously linked sitreps:
https://fr.wikipedia.org/wiki/Pand%C3%A9mie_de_Covid-19_au_Burkina_Faso

It is probably possible to scrape, since the data seem to be in the source code of the figure (click on "modify code" to see it).

ColinFay · 2020-04-12T12:51:19Z

Here's the code to download all the PDFs:

dir.create("burkina_covid")
# Listing pages of the COVID-19 document table: the first page plus pages 1 to 3
base_url <- "https://www.humanitarianresponse.info/en/op%C3%A9rations/burkina-faso/documents/table/themes/covid-19"
pages <- c(base_url, paste0(base_url, "?page=", 1:3))
for (page in pages) {
  html <- xml2::read_html(page)
  # The download links are the anchors inside the dropdown menus
  links <- rvest::html_nodes(html, ".dropdown-menu a")
  for (href in rvest::html_attr(links, "href")) {
    download.file(
      href,
      file.path("burkina_covid", basename(href)),
      mode = "wb" # binary mode, so the files are not corrupted on Windows
    )
  }
}

> fs::dir_tree("burkina_covid/")
burkina_covid/
├── covidresponseplanremarks-french.docx
├── ghrp-covid19-en.pdf
├── ghrp-covid19-fr.pdf
├── integration_du_covid-19_dans_la_reponse_humanitaire.pdf
├── plan_de_riposte_covid19-revise_def.pdf
├── reach_bfa_suivi_situation_humanitaire_resultats_pertinents_covid19_region_centre_nord_fevrier_2020-1.pdf
├── reach_bfa_suivi_situation_humanitaire_resultats_pertinents_covid19_region_centre_nord_fevrier_2020-1_1.pdf
├── reach_bfa_suivi_situation_humanitaire_resultats_pertinents_covid19_region_sahel_fevrier_2020.pdf
├── sitrep_n27_du_24_03_20.pdf
├── sitrep_n_29_0.pdf
├── sitrep_n_32_covid-19_du_29_mars_2020_0.pdf
├── sitrep_n_33.pdf
├── sitrep_ndeg17_du_14_03_20.pdf
├── sitrep_ndeg21.pdf
├── sitrep_ndeg24_du_21_03_20.pdf
├── sitrep_ndeg25.pdf
├── sitrep_ndeg28.pdf
├── sitrep_ndeg35_0.pdf
├── sitrep_ndeg_20_du_17_03_20.pdf
├── sitrep_ndeg_22_du_19_03_20.pdf
├── sitrep_ndeg_26_du_23_03_20.pdf
├── sitrep_ndeg_31.pdf
├── sitrep_ndeg_34.pdf
├── sitrep_ndeg_36.pdf
├── sitrep_ndeg_37.pdf
├── sitrep_ndeg_38_covid_bfa_au_04_04_2020.pdf
├── sitrep_ndeg_39_1.pdf
├── sitrep_ndeg_40_0.pdf
├── sitrep_ndeg_41_au_7_avril_2020_1.pdf
├── sitrep_ndeg_42_covid-19_burkina_faso.pdf
├── sitrep_ndeg_43.pdf
└── sitrep_ndeg_44.pdf
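
The file names above are not consistent (`sitrep_n27`, `sitrep_ndeg17`, `sitrep_n_32`, `sitrep_ndeg_44`), so picking out the sitreps and ordering them takes a little care. A minimal sketch in base R, assuming the report number is always the first number after the `sitrep_n` or `sitrep_ndeg` prefix; the helper name `sitrep_number()` is hypothetical.

```r
# Hypothetical helper: extract the report number from a sitrep file name.
# Returns NA for files that are not sitreps.
sitrep_number <- function(files) {
  matches <- regmatches(files, regexec("^sitrep_n(deg)?_?([0-9]+)", files))
  vapply(
    matches,
    function(m) if (length(m) == 3) as.integer(m[3]) else NA_integer_,
    integer(1)
  )
}

# A few file names taken from the listing above
files <- c(
  "sitrep_n27_du_24_03_20.pdf",
  "sitrep_ndeg_44.pdf",
  "sitrep_ndeg17_du_14_03_20.pdf",
  "sitrep_n_32_covid-19_du_29_mars_2020_0.pdf",
  "ghrp-covid19-en.pdf"
)

nums <- sitrep_number(files)
sitreps <- files[!is.na(nums)]                  # drop non-sitrep documents
sitreps <- sitreps[order(nums[!is.na(nums)])]   # oldest report first
latest <- files[which.max(nums)]                # highest report number
```

In practice `files` would come from `list.files("burkina_covid")`.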

ColinFay · 2020-04-12T13:10:05Z

Here's a piece of code to extract data from the latest PDF:

library(tabulizer)
# Read the whole report as text and split it into lines
res <- tabulizer::extract_text("burkina_covid/sitrep_ndeg_44.pdf")
res <- strsplit(res, "\n")[[1]]
# For each line matching `txt`, keep only the digits that follow ": "
num_extr <- function(
  res, txt
){
  gsub(
    "[^:]*: ([0-9]*).*", 
    "\\1", 
    grep(txt, res, value = TRUE)
  )
}

cont <- c(
  "Cumul personnes contacts listées",
  "Contacts confirmés COVID-19 depuis le début", 
  "Nbre de contacts sortis de suivi ce jours", 
  "Cumul de contacts sortis après 14 jours de suivis", 
  "Nombre de contacts à suivre", 
  "Nombre de contacts vus", 
  "Nombre de contacts non vus", 
  "Nombre de contacts devenus suspects", 
  "Nombre de nouveaux contacts"
)

x <- sapply(
  cont, function(x){
    num_extr(res, x)
  }
) 

tibble::rownames_to_column(
  as.data.frame(x), 
  "type"
)
                                               type    x
1                  Cumul personnes contacts listées 2409
2       Contacts confirmés COVID-19 depuis le début  272
3         Nbre de contacts sortis de suivi ce jours   31
4 Cumul de contacts sortis après 14 jours de suivis 1076
5                       Nombre de contacts à suivre 1061
6                            Nombre de contacts vus    1
7                        Nombre de contacts non vus   19
8               Nombre de contacts devenus suspects   10
9                       Nombre de nouveaux contacts   99

I'm French, so these seemed to me to be the interesting parts, but as I'm no expert in the field it would be great to have someone with domain knowledge point me to the relevant parts of the PDF.
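
One caveat with the extraction above: the pattern stops at the first non-digit, so a number written with a space as thousands separator (for example "1 042") would be cut to "1". The value 1 for "Nombre de contacts vus" may be such a case, though that is only a guess. Here is a sketch of a more forgiving variant in base R. The name `extract_count()` is hypothetical and the input lines are made up, since the exact line layout of the PDF text is not shown here.

```r
# Hypothetical variant of num_extr(): returns one integer per label,
# or NA when the label or the number is not found.
extract_count <- function(lines, label) {
  hit <- grep(label, lines, value = TRUE, fixed = TRUE)
  if (length(hit) == 0) {
    return(NA_integer_)
  }
  # Digits after the first colon, allowing spaces between digit groups.
  # Caveat: a separate number right after the count would be merged into it.
  m <- regmatches(hit[1], regexec(": *([0-9][0-9 ]*)", hit[1]))[[1]]
  if (length(m) < 2) {
    return(NA_integer_)
  }
  as.integer(gsub(" ", "", m[2]))
}

# Made-up lines in the assumed "label : number" layout
lines <- c(
  "Nombre de contacts vus : 1 042",
  "Nombre de nouveaux contacts : 99",
  "Autre ligne sans nombre"
)

seen <- extract_count(lines, "Nombre de contacts vus")
new_contacts <- extract_count(lines, "Nombre de nouveaux contacts")
absent <- extract_count(lines, "Nombre de contacts devenus suspects")
```

Returning NA instead of an empty vector also keeps `sapply()` from silently returning a list when a label is missing from a report.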

PaulC91 · 2020-04-12T13:42:40Z

New confirmed cases and deaths by district would be great, but there doesn't seem to be any pattern in the way this information is given in the PDF (unlike the contact-tracing section above), so I'm guessing it would be difficult to scrape consistently.

ColinFay · 2020-04-12T13:44:44Z

Here's an attempt at a package to download and scrape the data: https://github.com/ColinFay/covidbf

Let me know if you want me to work more on this.

ffinger · 2020-04-12T15:01:34Z

@ColinFay thanks a lot.
As mentioned by @PaulC91, the information you are scraping is the contact-tracing section of the reports. The new cases per region or per district are buried in the text and, it seems, not reported consistently. There is also the map at the beginning that gives new cases by district, but I believe it would be very hard to scrape.

ColinFay · 2020-04-12T18:42:42Z

Just to check, have you tried contacting the people listed at the bottom of the pdf? They might be willing to share the data

ColinFay · 2020-04-12T19:12:42Z

Oh and, what's LMIC?

xt-21 · 2020-04-12T19:17:11Z

Low- and middle-income country


ffinger · 2020-04-12T20:15:18Z

@ColinFay yes, we are in contact with authorities.

ffinger · 2020-04-28T16:31:36Z

I added some new countries and potential data sources to the spreadsheet.

See here for details:
epiforecasts/NCoVUtils#72 (comment)

ffinger added the high_priority (Urgent for COVID19 analytics), medium_complexity (Can be completed by 1 person in a few (<5) days), enhancement (Add new features to an existing package) and new_package (Create a new R package) labels on Apr 9, 2020

ffinger removed the new_package (Create a new R package) label on Apr 9, 2020

ffinger mentioned this issue Apr 9, 2020

Data for additional countries. epiforecasts/NCoVUtils#72
