Mortality modelling using scalable Bayesian hierarchical models.
This code is used in:
- Rashid, T., Bennett, J.E. et al. (2021). Life expectancy and risk of death in 6791 communities in England from 2002 to 2019: high-resolution spatiotemporal analysis of civil registration data. The Lancet Public Health.
- Bennett, J.E., Rashid, T. et al. (2023). Changes in life expectancy and house prices in London from 2002 to 2019: hyper-resolution spatiotemporal analysis of death registration and real estate data. The Lancet Regional Health Europe.
- Rashid, T., Bennett, J.E. et al. (2023). Mortality from leading cancers in districts of England from 2002 to 2019: a population-based, spatiotemporal study. The Lancet Oncology.
This project has been developed further in sparklabnyc/bayesian-envhealth-models.
file | paper | likelihood | terms | spatial effects |
---|---|---|---|---|
nested.bug | Rashid 2021 | gamma-Poisson | nested | |
BYM.bug | Rashid 2021 | gamma-Poisson | BYM | |
nested_bb.bug | Bennett 2023 | beta-binomial | nested | |
nested.py | - | binomial | nested | |
car.py | Rashid 2023 | binomial | ICAR | |
The models are fitted using nimble. For ease of reading, and to help users more familiar with other MCMC software, I've also added the model structure as BUGS code.
We model the death rate per person in each stratum defined by spatial unit, year and age group. What varies between models is how this death rate is specified. The following spatial effects are used:
- The nested model is designed for a three-level nested spatial hierarchy. In our case, it follows the ONS' hierarchical output area geographies: each Lower layer Super Output Area (LSOA) lies within a Middle layer Super Output Area (MSOA), which in turn lies within a Local Authority District (LAD). The model can also be used with MSOA, LAD and region as the three levels. The spatial effects are modelled as IID.
- The BYM (Besag-York-Mollié) model shares information between each spatial unit and its nearest neighbours.
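As a rough illustration of the nested structure, the sketch below simulates IID effects on a toy three-level hierarchy with numpy, centring each lower-level effect on its parent unit, and then draws deaths from a binomial likelihood. The lookup arrays, standard deviations, intercept and population sizes are all made-up values for illustration, year and age group effects are omitted, and the parameterisation in the repository's model files may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy hierarchy: 2 LADs, 4 MSOAs, 8 LSOAs.
# msoa_to_lad[j] is the LAD containing MSOA j;
# lsoa_to_msoa[i] is the MSOA containing LSOA i.
msoa_to_lad = np.array([0, 0, 1, 1])
lsoa_to_msoa = np.array([0, 0, 1, 1, 2, 2, 3, 3])
n_lad = int(msoa_to_lad.max()) + 1

# Assumed standard deviations for each level (not estimates from the papers).
sigma_lad, sigma_msoa, sigma_lsoa = 0.3, 0.2, 0.1

# IID effects at the top level; each lower level is centred on its parent.
lad_effect = rng.normal(0.0, sigma_lad, size=n_lad)
msoa_effect = rng.normal(lad_effect[msoa_to_lad], sigma_msoa)
lsoa_effect = rng.normal(msoa_effect[lsoa_to_msoa], sigma_lsoa)

# Map to a death probability per person with an assumed intercept on the
# logit scale, then simulate deaths for one year and one age group.
intercept = -5.0
death_prob = 1.0 / (1.0 + np.exp(-(intercept + lsoa_effect)))
population = np.full(lsoa_effect.shape, 1500)
deaths = rng.binomial(population, death_prob)
```

Because the effects are indexed through lookup arrays, LSOAs in the same MSOA share that MSOA's effect, and MSOAs in the same LAD share that LAD's effect, which is how information is pooled within the hierarchy in this sketch.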
An example invocation of the code from the command line is as follows:
Rscript run_model.R MSOA nested 1 10000 5000 --num_chains=4
For the full explanation of the options available, run
Rscript run_model.R --help
By porting the model to numpyro, I have seen large speedups in both run time and effective samples per second. This is thanks to numpyro's JAX backend, which allows sampling on a GPU (beneficial for large models), and to using NUTS rather than nimble's conjugate Gibbs and random-walk Metropolis-Hastings (RWMH) samplers.
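To make the "effective samples per second" metric concrete, here is a minimal numpy sketch. It uses a crude single-chain estimator (truncating the autocorrelation sum at the first non-positive lag), which is an assumption made for illustration and not the estimator used in the papers; libraries such as ArviZ provide more robust multi-chain versions. The function names and the `runtime_seconds` argument are hypothetical.

```python
import numpy as np

def effective_sample_size(chain):
    """Crude single-chain ESS: n / (1 + 2 * sum of leading positive autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    # Autocovariance at lags 0..n-1 (O(n^2), fine for short illustrative chains).
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / acov[0]
    # Truncate the sum at the first non-positive autocorrelation.
    nonpos = np.where(rho[1:] <= 0)[0]
    cutoff = nonpos[0] + 1 if nonpos.size else n
    tau = 1.0 + 2.0 * rho[1:cutoff].sum()
    return n / tau

def ess_per_second(chain, runtime_seconds):
    """Effective samples per second for a chain that took runtime_seconds to draw."""
    return effective_sample_size(chain) / runtime_seconds
```

A strongly autocorrelated chain, as random-walk samplers often produce, yields far fewer effective samples than its nominal length, so two samplers with the same wall-clock time can differ substantially on this metric.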
A simplified version of the model has been contributed as an example for the numpyro documentation.
The car.py model approximates the ICAR distribution for spatial effects by setting the correlation parameter of the CAR distribution to 0.99. The model can be run as:
poetry run python car.py --region="MSOA" --sex=1 --num_samples=10000 --num_warmup=5000 --num_chains=4 --device="cpu"
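The effect of fixing the CAR correlation just below 1 can be sketched with numpy, assuming the common parameterisation in which the precision matrix is `tau * (D - alpha * W)`, where `W` is the adjacency matrix and `D` is the diagonal matrix of neighbour counts. The four-area adjacency matrix and the helper `car_precision` are hypothetical and not taken from car.py.

```python
import numpy as np

# Hypothetical adjacency matrix for four areas arranged in a line: 0-1-2-3.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
D = np.diag(W.sum(axis=1))  # number of neighbours of each area

def car_precision(alpha, tau=1.0):
    """Precision matrix of a CAR prior under the assumed form tau * (D - alpha * W)."""
    return tau * (D - alpha * W)

Q_icar = car_precision(1.0)   # ICAR limit: singular (improper) precision
Q_car = car_precision(0.99)   # proper CAR close to the ICAR limit

min_eig_icar = np.linalg.eigvalsh(Q_icar).min()
min_eig_car = np.linalg.eigvalsh(Q_car).min()
```

With `alpha = 1` the rows of the precision matrix sum to zero and its smallest eigenvalue is zero, so the prior is improper. With `alpha = 0.99` the matrix is strictly diagonally dominant and therefore positive definite, giving a proper distribution that still smooths strongly across neighbouring areas.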
This repo also includes the linear mixed effects model for estimating house prices, implemented in house_price.R.
The model has effects for space, time, housing type, ownership type, new or old, season, and number of bedrooms.
See Appendix Table 5 of Bennett 2023 for more information.
Data used in the analysis are controlled by the Small Area Health Statistics Unit, which does not have permission to release data to third parties. Individual mortality data can be requested through the Office for National Statistics. If you would like a file containing simulated numbers that allows you to test the code, please contact [email protected].