Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package: bed occupancy forecasting #3

Open
thibautjombart opened this issue Apr 6, 2020 · 11 comments
Open

Package: bed occupancy forecasting #3

thibautjombart opened this issue Apr 6, 2020 · 11 comments
Labels
high_priority Urgent for COVID19 analytics medium_complexity Can be completed by 1 person in a few (<5) days. new_package Create a new R package

Comments

@thibautjombart
Copy link
Contributor

thibautjombart commented Apr 6, 2020

Package: bed occupancy forecasting

Description

Package existing code to implement forecasting of daily bed occupancy from input on daily admissions and the distribution of duration of stay.

The package should include documentation, simple reproducible examples, unit tests.

Use cases / specs

  1. predict bed occupancy from existing admission data
    • input dates of admissions as Date
    • input numbers of admissions as integer
    • input distribution of the length of stay (LoS) as a distcrete object (preferably) or as a function returning integer numbers along the lines of rgeom()
  2. do the above, but with several admission trajectories (e.g. from different models)
    • input on admissions as a projections object
    • input LoS as in 1.
  3. outputs should be projections objects

Relevant related pacakges

Starting point

A starting point is provided in the two following gists:

Impact

Some of the main questions for COVID19 response relate to hospital bed capacity. For this, multpile modelling groups are making forecastings of bed occupancy, e.g. our app. However there is no consistent way of doing this, leaving room for potential errors and consistency across results. Having a reliable, computer-efficient, well documented and tested way of forecasting bed occupancy will improve the quality of, and the ease for providing, these predictions.

Proposed timeline

  • first viable product: as soon as possible; end of May?
  • first stable version (CRAN release): less urgent, the main thing is to have this package, tested, documented, and publicly accessible

Focal point

@thibautjombart

@thibautjombart thibautjombart added high_complexity Requires multiple people and/or specialists to complete. high_priority Urgent for COVID19 analytics new_package Create a new R package medium_complexity Can be completed by 1 person in a few (<5) days. and removed high_complexity Requires multiple people and/or specialists to complete. labels Apr 6, 2020
@TimTaylor
Copy link
Member

If this is still required I've started packaging it up over at https://github.com/tjtnew/bedoc. A little hectic today but can probably get something reasonable finished by c.o.p tomorrow if it's still required.

@thibautjombart
Copy link
Contributor Author

Yes, this is still needed :) I'll remove my totally unrealistic deadline, but this is still very topical and timely. Many thanks for getting started on this!

@TimTaylor
Copy link
Member

A few questions:

  • What are you trying to avoid with the sanity check n_admissions[1] >= 1. A user may want to illustrate the ramp up from zero cases over a particular interval. Perhaps a warning instead to highlight if they start with zero cases in case they didn't mean to?
  • Are you wanting both simulate_occupancy and bed_forecasts to be exposed to the user?
  • If simulate_occupancy is supposed to be exposed to the user then the code restriction to only predictions dates <= the maximum provided admission date is a little confusing. The reason I say this is that "use case 1", "predict bed occupancy from existing admission data" would seem to make this redundant otherwise. Should we instead allow the user to specify the end date (but default to the maximum admission date). This point can be ignored if simulate_occupancy is only an internal function to be used only by bed_forecasts.
    Hope these q make sense!

@TimTaylor
Copy link
Member

I've thought some more in regards to the second and third bullets above. I think it would make sense to have simulate_occupancy as an internal function and only export bed_forecasts. The output of simulate_occupancy can be recovered from bed_forecasts by only inputting a single projection so I think this is the cleaner option. We can then also move the sanity checks in to bed_forecasts and just test that functionality. If you agree I'll modify later and then start on a more detailed use case for a vignette.

@thibautjombart
Copy link
Contributor Author

Sorry about the slow response on this.

  • What are you trying to avoid with the sanity check n_admissions[1] >= 1. A user may want to illustrate the ramp up from zero cases over a particular interval. Perhaps a warning instead to highlight if they start with zero cases in case they didn't mean to?

That's a mistake, having it's origin in a technicality, from memory: the function was throwing an error if there was a day of zero admissions due to using rep (maybe?). At any rate, it should be all(n_admissions >= 1) rather than the current test. Note that the corner case where all data in n_admissions is 0 may have to be handled separately: in this case the model will not predict a single bed.

  • Are you wanting both simulate_occupancy and bed_forecasts to be exposed to the user?

I named these poorly. I think ideally this should perhaps be a generic with specific S3 methods for different classes of objects? And to be consistent with projections, maybe we could settle for project_beds(). What do you think?

  • If simulate_occupancy is supposed to be exposed to the user then the code restriction to only predictions dates <= the maximum provided admission date is a little confusing. The reason I say this is that "use case 1", "predict bed occupancy from existing admission data" would seem to make this redundant otherwise. Should we instead allow the user to specify the end date (but default to the maximum admission date).

Yes (to the last suggestion).

This point can be ignored if simulate_occupancy is only an internal function to be used only by bed_forecasts.
Hope these q make sense!

It does. Thank you so much for your help - very appreciated!

@thibautjombart
Copy link
Contributor Author

I've thought some more in regards to the second and third bullets above. I think it would make sense to have simulate_occupancy as an internal function and only export bed_forecasts. The output of simulate_occupancy can be recovered from bed_forecasts by only inputting a single projection so I think this is the cleaner option. We can then also move the sanity checks in to bed_forecasts and just test that functionality. If you agree I'll modify later and then start on a more detailed use case for a vignette.

OK, it would make sense, but maybe a middle group, using S3 generic / method, would be to provid a method for incidence objects too, so that users can:

  • forecast bed occupancy from existing admission data (stored as incidence)
  • forecast bed occupancy from projected admissions (stored as projections)

An easy way to go about this would be adding a as_projections.incidence method (or as(..., "projections")) in projections to convert incidence objects to projections. This is what is done effectively at:
https://gist.github.com/thibautjombart/7dacaee1e1ec3e5bf27e522d9312be76#file-simulate_occupancy-r-L87-L89

@TimTaylor
Copy link
Member

TimTaylor commented Apr 16, 2020

OK, it would make sense, but maybe a middle group, using S3 generic / method, would be to provid a method for incidence objects too, so that users can:

* forecast bed occupancy from existing admission data (stored as `incidence`)

* forecast bed occupancy from projected admissions (stored as `projections`)

That sounds good.

In regards to a vignette is any of the data/projections used in your paper available in the open? I'm guessing not, but it would make a nice example if they were.

@thibautjombart
Copy link
Contributor Author

thibautjombart commented Apr 17, 2020

Unfortunately no, no publicly available data for this specific example. But we can try to make a decent example using realistic epi parameters. I paste below code for a length-of-stay distribution publicly available, with some random admissions as incidence:

## Data from across 23 countries with “many of the included cases from the
## UK”. 290 patients, of which 163 with recorded outcomes. 52 admitted to
## ICU. Data collected on patients with data collection commenced on or before
## 13 March 2020. Report issued 27 March 2020.
## https://media.tghn.org/medialibrary/2020/03/ISARIC_Data_Platform_COVID-19_Report_27MAR20.zip

## Mean hospital LOS to death or discharge (n=163) 5.3 days (SD 4.25)
## Roughly: median 5 days, QIR: 3-7
los_isaric <- distcrete::distcrete("weibull", shape = 2, scale = 6, w = 0.5, interval = 1)

## make toy admission data
library(incidence)
library(magrittr)
admissions <- sample(Sys.Date() - c(0,7), 40, replace = TRUE) %>% 
  incidence()

## project bed occupancy for current admissions 

## project future admissions
[...]

## project bed occupancy for current + future admissions
[...]

Note that the missing bit, beyond the bed occupancy part, is a serial interval distribution (si in projections::project). This can be obtained by:

  1. getting published parameters e.g. from MIDAS
  2. converting, if needed, these parameters to build a distribution (things like epitrix::gamma_mucv2shapescale may help to convert parameters specified in the literature into something we can build a gamma with)
  3. using distcrete to build a discretised distribution

I will ask what serial interval is currently used for UK data.

@TimTaylor
Copy link
Member

TimTaylor commented Apr 17, 2020

project_beds is now a generic with methods for projections and incidence objects. I've changed the code a bit from your initial gists so it's probably worth you kicking the tyres a bit.

I've included a few simple integrations tests and we have good coverage save for the sanity checks which we could easily check although perhaps a little unnecessary.

CI is set up with GitHub Actions which checks the build on mac (release + devel), windows (release) and linux (release). It also builds the basic pkgdown site and integrates coverage with codecov. Currently these run as separate jobs so this could be streamlined somewhat.

From your last comment I take it you want to implement a predict_admissions function. Is it worth me starting with a simple exponential growth method to begin with (the doubling approach in the app) and we can build on this going forward.

Can we move future discussion over to https://github.com/tjtnew/bedoc/issues as it will be good to keep the discussion tied to the repo. On a similar note let me know if you want to shift the repo over to Reconhub and for us continue development there.

Have a good weekend.

@thibautjombart
Copy link
Contributor Author

Impressive! Great, I will take the conversation there.

@TimTaylor
Copy link
Member

Package now at https://github.com/reconhub/occupancy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high_priority Urgent for COVID19 analytics medium_complexity Can be completed by 1 person in a few (<5) days. new_package Create a new R package
Projects
None yet
Development

No branches or pull requests

2 participants