Functions to import reported daily sub-national new cases and deaths from LMIC #5
Comments
@patrickbarks @PaulC91 @ntncmch @scottyaz @sbfnk @epiforcasts @seabbs @hamishgibbs @jhellewell14 |
It would be good to start a Google spreadsheet (if one doesn't exist already) with sources for each country. For example, http://covid19.health.gov.mw is a good source for Malawi. |
We are keen to have contributions to our package, but also happy for this to be a separate project if that makes sense. We've been talking about how, and whether, we want to support it as a more widely known data resource, and that seems to make more and more sense. |
Google spreadsheet to track requests for countries, data sources and implementations: |
@seabbs, happy to contribute to NCoVUtils |
Would like to work on Burkina Faso |
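Hey, can you pitch in on the process of contributing? Do you want us to open a PR?
Also, there is: https://www.worldometers.info/coronavirus/
Here's a function to get the data frames for today and yesterday:
get_worldmeter_df <- function(){
  # Parse the page and pull out every HTML table
  url <- xml2::read_html(
    "https://www.worldometers.info/coronavirus/"
  )
  tbls <- rvest::html_table(url)
  # Keep rows 8 onwards of the first two tables (row offset as in the original code)
  tbls[[1]] <- tbls[[1]][8:nrow(tbls[[1]]),]
  tbls[[2]] <- tbls[[2]][8:nrow(tbls[[2]]),]
  list(
    today = tbls[[1]],
    yesterday = tbls[[2]]
  )
}
get_worldmeter_df()
It has Burkina Faso, Iraq, Congo and Syria:
# Store today's table first so it can be used inside the row filter
tod <- get_worldmeter_df()$today
# "Irak" is kept as originally typed; the output below has no matching row
tod[
  tod$`Country,Other` %in% c("Burkina Faso", "Irak", "Congo", "Syria"),
]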
Country,Other TotalCases NewCases TotalDeaths NewDeaths
99 Burkina Faso 484 27
147 Congo 60 5
166 Syria 25 2
TotalRecovered ActiveCases Serious,Critical
99 155 302
147 5 50
166 5 18
Tot Cases/1M pop Deaths/1M pop TotalTests Tests/1M pop
99 23 1
147 11 0.9
166 1 0.1
Continent
99 Africa
147 Africa
166 Asia |
Hi @ColinFay, I haven't seen any sub-national data (by region, province or similar) on Worldometers; am I missing something? I think it would still be a good addition to the existing functions that get national data from ECDC, WHO, JHU or similar, especially since there seems to be data on testing. |
@ffinger not that I know of |
Possible other source for Burkina Faso : Need to scrape the pdf(s) |
Thanks for this! |
There is a figure here too, sources are probably the previously linked sitreps: Probably possible to scrape since the data seems to be in the code of the figure (click on |
Here's the code to download all the PDFs:
dir.create("burkina_covid")
# Loop over the listing pages of the COVID-19 document table
for (
  i in c(
    "https://www.humanitarianresponse.info/en/op%C3%A9rations/burkina-faso/documents/table/themes/covid-19",
    "https://www.humanitarianresponse.info/en/op%C3%A9rations/burkina-faso/documents/table/themes/covid-19?page=1",
    "https://www.humanitarianresponse.info/en/op%C3%A9rations/burkina-faso/documents/table/themes/covid-19?page=2",
    "https://www.humanitarianresponse.info/en/op%C3%A9rations/burkina-faso/documents/table/themes/covid-19?page=3"
  )
){
  # Parse the listing page
  url <- xml2::read_html(i)
  # The download links are the anchors inside the dropdown menus
  but <- rvest::html_nodes(url, ".dropdown-menu a")
  lapply(
    rvest::html_attr(but, "href"),
    function(x){
      download.file(
        x,
        file.path("burkina_covid", basename(x)),
        mode = "wb" # write binary so the PDFs are not corrupted on Windows
      )
    }
  )
}
> fs::dir_tree("burkina_covid/")
burkina_covid/
├── covidresponseplanremarks-french.docx
├── ghrp-covid19-en.pdf
├── ghrp-covid19-fr.pdf
├── integration_du_covid-19_dans_la_reponse_humanitaire.pdf
├── plan_de_riposte_covid19-revise_def.pdf
├── reach_bfa_suivi_situation_humanitaire_resultats_pertinents_covid19_region_centre_nord_fevrier_2020-1.pdf
├── reach_bfa_suivi_situation_humanitaire_resultats_pertinents_covid19_region_centre_nord_fevrier_2020-1_1.pdf
├── reach_bfa_suivi_situation_humanitaire_resultats_pertinents_covid19_region_sahel_fevrier_2020.pdf
├── sitrep_n27_du_24_03_20.pdf
├── sitrep_n_29_0.pdf
├── sitrep_n_32_covid-19_du_29_mars_2020_0.pdf
├── sitrep_n_33.pdf
├── sitrep_ndeg17_du_14_03_20.pdf
├── sitrep_ndeg21.pdf
├── sitrep_ndeg24_du_21_03_20.pdf
├── sitrep_ndeg25.pdf
├── sitrep_ndeg28.pdf
├── sitrep_ndeg35_0.pdf
├── sitrep_ndeg_20_du_17_03_20.pdf
├── sitrep_ndeg_22_du_19_03_20.pdf
├── sitrep_ndeg_26_du_23_03_20.pdf
├── sitrep_ndeg_31.pdf
├── sitrep_ndeg_34.pdf
├── sitrep_ndeg_36.pdf
├── sitrep_ndeg_37.pdf
├── sitrep_ndeg_38_covid_bfa_au_04_04_2020.pdf
├── sitrep_ndeg_39_1.pdf
├── sitrep_ndeg_40_0.pdf
├── sitrep_ndeg_41_au_7_avril_2020_1.pdf
├── sitrep_ndeg_42_covid-19_burkina_faso.pdf
├── sitrep_ndeg_43.pdf
└── sitrep_ndeg_44.pdf
|
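Since the next step works on the latest sitrep, here is a minimal sketch of how the newest report could be picked from the downloaded files by its number. The helper name `sitrep_number` is hypothetical, and the regular expression only assumes the naming variants visible in the listing above (`sitrep_n27`, `sitrep_n_33`, `sitrep_ndeg17`, `sitrep_ndeg_44`); other naming schemes would need their own pattern.

```r
# A few file names copied from the directory listing above.
files <- c(
  "ghrp-covid19-en.pdf",
  "sitrep_n27_du_24_03_20.pdf",
  "sitrep_n_33.pdf",
  "sitrep_ndeg_38_covid_bfa_au_04_04_2020.pdf",
  "sitrep_ndeg_43.pdf",
  "sitrep_ndeg_44.pdf"
)

# Hypothetical helper: return the report number that follows
# "sitrep_n", "sitrep_n_", "sitrep_ndeg" or "sitrep_ndeg_",
# and NA for files that are not sitreps.
sitrep_number <- function(f) {
  is_sitrep <- grepl("^sitrep_n(deg)?_?[0-9]+", f)
  num <- rep(NA_integer_, length(f))
  num[is_sitrep] <- as.integer(
    sub("^sitrep_n(deg)?_?([0-9]+).*$", "\\2", f[is_sitrep])
  )
  num
}

nums <- sitrep_number(files)
# which.max() skips the NA entries of non-sitrep files
latest <- files[which.max(nums)]
latest
```

With a real download, `files` would come from `list.files("burkina_covid")` instead of the hard-coded vector.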
Here's a piece of code to extract data from the latest PDF:
library(tabulizer)
# Read the PDF as one string, then split it into lines
res <- tabulizer::extract_text("burkina_covid/sitrep_ndeg_44.pdf")
res <- strsplit(res, "\n")[[1]]
# For each line matching `txt`, return the first run of digits after ": "
num_extr <- function(
  res, txt
){
  gsub(
    "[^:]*: ([0-9]*).*",
    "\\1",
    grep(txt, res, value = TRUE)
  )
}
# Labels of the contact-tracing indicators, as written in the sitrep
cont <- c(
  "Cumul personnes contacts listées",
  "Contacts confirmés COVID-19 depuis le début",
  "Nbre de contacts sortis de suivi ce jours",
  "Cumul de contacts sortis après 14 jours de suivis",
  "Nombre de contacts à suivre",
  "Nombre de contacts vus",
  "Nombre de contacts non vus",
  "Nombre de contacts devenus suspects",
  "Nombre de nouveaux contacts"
)
x <- sapply(
  cont, function(x){
    num_extr(res, x)
  }
)
# One row per indicator, with the label moved into a `type` column
tibble::rownames_to_column(
  as.data.frame(x),
  "type"
)
type x
1 Cumul personnes contacts listées 2409
2 Contacts confirmés COVID-19 depuis le début 272
3 Nbre de contacts sortis de suivi ce jours 31
4 Cumul de contacts sortis après 14 jours de suivis 1076
5 Nombre de contacts à suivre 1061
6 Nombre de contacts vus 1
7 Nombre de contacts non vus 19
8 Nombre de contacts devenus suspects 10
9 Nombre de nouveaux contacts 99
I'm French, so these seem to me to be the interesting parts, but as I'm no expert in the field, it would be great to have someone with domain knowledge point me to the interesting parts of the PDF. |
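To make the behaviour of the `num_extr()` pattern easier to follow, here is a small worked example on made-up lines. The "label : value" layout and the value `1 042` are assumptions for illustration only (the raw PDF text is not shown in this thread), and `num_extr_int` is a hypothetical variant, not part of the code above. The point it shows is that `([0-9]*)` stops at the first non-digit, so a number written with a space as thousands separator would be cut short. The thread does not confirm whether that explains the `1` reported for "Nombre de contacts vus".

```r
# Made-up lines imitating an assumed "label : value" layout of the PDF text.
res <- c(
  "Nombre de nouveaux contacts : 99",
  "Nombre de contacts vus : 1 042",
  "Nombre de contacts non vus : 19"
)

# Same pattern as in the comment above.
num_extr <- function(res, txt) {
  gsub("[^:]*: ([0-9]*).*", "\\1", grep(txt, res, value = TRUE))
}

num_extr(res, "Nombre de nouveaux contacts")  # whole number is captured
num_extr(res, "Nombre de contacts vus")       # capture stops at the space

# Hypothetical variant: also accept groups of three digits separated by a
# plain space, then drop the spaces and convert to integer.
num_extr_int <- function(res, txt) {
  line <- grep(txt, res, value = TRUE, fixed = TRUE)
  value <- sub("^[^:]*:[[:space:]]*([0-9]+( [0-9]{3})*).*$", "\\1", line)
  as.integer(gsub(" ", "", value))
}

num_extr_int(res, "Nombre de contacts vus")
```

The variant only handles a plain space as separator; a PDF that uses non-breaking spaces or dots would need the pattern adjusted.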
New confirmed cases and deaths by district would be great, but there doesn't seem to be any pattern in the way this information is given in the PDF (unlike the contact-tracing section above), so I'm guessing it would be difficult to scrape consistently. |
Here's an attempt at a package to download and scrape the data: https://github.com/ColinFay/covidbf Let me know if you want me to work more on this. |
@ColinFay thanks a lot. |
Just to check, have you tried contacting the people listed at the bottom of the pdf? They might be willing to share the data |
Oh and, what's LMIC? |
Low to middle income country
|
@ColinFay yes, we are in contact with authorities. |
I added some new countries and potential data sources to the spreadsheet. See here for details: |
Description
A function for each country that accesses public data sources and makes the data accessible in R.
Good examples in this package: https://github.com/epiforecasts/NCoVUtils for a number of countries.
Functions already exist for most European and Asian countries and the US. At the moment we are looking for data and functions for LMICs (low- and middle-income countries), especially African countries.
I suggest we use this issue to keep track of
See below for google sheet tracking those.
Output
The output format of each function should be a long data frame containing the following columns:
where cases and deaths stand for the newly reported cases and deaths on that day.
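Many sources report running totals rather than daily counts, so a function may need to convert them before returning its output. A minimal sketch in base R follows. The column list of the issue did not survive in this copy, so the names `date`, `region`, `total_cases` and `total_deaths`, the helper `to_daily`, and all numbers are placeholders for illustration, not the agreed format.

```r
# Hypothetical cumulative totals per region (made-up numbers).
cumulative <- data.frame(
  date = as.Date(c("2020-04-01", "2020-04-02", "2020-04-03",
                   "2020-04-01", "2020-04-02", "2020-04-03")),
  region = rep(c("Region A", "Region B"), each = 3),
  total_cases = c(10, 14, 20, 3, 3, 7),
  total_deaths = c(0, 1, 1, 0, 0, 2)
)

# Hypothetical helper: turn cumulative totals into newly reported
# counts per day, computed separately within each region.
to_daily <- function(df) {
  df <- df[order(df$region, df$date), ]
  new_counts <- function(total, group) {
    # First day keeps its total; later days are day-to-day differences
    ave(total, group, FUN = function(x) c(x[1], diff(x)))
  }
  data.frame(
    date = df$date,
    region = df$region,
    cases = new_counts(df$total_cases, df$region),
    deaths = new_counts(df$total_deaths, df$region)
  )
}

daily <- to_daily(cumulative)
daily
```

One design choice to be aware of: the first date of each region carries the whole total reported up to that day, which overstates that day's new cases if reporting started late. Downward corrections in a source would also show up as negative daily counts.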
Functions can either be added to https://github.com/epiforecasts/NCoVUtils via pull request, or we can start our own package that wraps NCoVUtils and other solutions for the already implemented countries.
Countries already covered:
https://github.com/epiforecasts/NCoVUtils covers the following countries so far:
Countries to be done
Spreadsheet to track requested countries, data sources and implementations:
https://docs.google.com/spreadsheets/d/1uvg07BAmwKqLqhKvkejhkX7uvXiGCre4sz11Au3pz9Q/edit?usp=sharing
my own priority list:
Links
A few places where data sources are indexed:
https://data.humdata.org/event/covid-19
https://coronavirustechhandbook.com/home
https://www.europeandataportal.eu/data/datasets?locale=en&categories=heal&page=1&query=covid