Vulnerability data processing

Approach

Vulnerability measurement through index aggregation followed the hierarchical methodology described by Tate¹ and used in the English Indices of Multiple Deprivation (IMD) technical report².

Metrics were selected from the literature, considered for inclusion by an expert panel, and translated into data reliably available at Lower Layer Super Output Area (LSOA) level. Census data was used predominantly; where no census-derived proxy was available, other sources such as the IMD were considered. Where data was not published down to LSOA level, data from the next administrative level, Middle Layer Super Output Area (MSOA), was used instead.

A total of 33 metrics were then selected for inclusion in the index (the data analysis is described below) and grouped into nine sub-domains and four domains:

adaptive capacity,
health,
sensitivity,
living environment.

To demonstrate how an overall risk analysis might proceed, LSOA scores for vulnerability and for exposure (from the hazard analysis) were each split into tertiles and plotted on a 3x3 risk matrix for visualisation. The risk matrix score combining vulnerability with heat exposure was mapped as a "heat vulnerability" layer, and the score combining vulnerability with flooding was mapped as a "flooding vulnerability" layer.
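The tertile split can be sketched as follows. This is a minimal illustration, not the code used in Overall_risk_assessment.ipynb: the function names and the example scores are hypothetical, ties are broken by input order, and each LSOA is simply assigned a (vulnerability tertile, exposure tertile) cell without assuming how the cells are scored.

```python
def tertile_labels(scores):
    """Label each score 1, 2 or 3 (lowest, middle or highest third by rank)."""
    n = len(scores)
    # Indices of the scores in ascending order; ties keep their input order.
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for position, i in enumerate(order):
        labels[i] = position * 3 // n + 1
    return labels


def risk_matrix_cells(vulnerability, exposure):
    """Return one (vulnerability tertile, exposure tertile) pair per LSOA."""
    return list(zip(tertile_labels(vulnerability), tertile_labels(exposure)))


# Hypothetical scores for six LSOAs (not real data).
vulnerability = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2]
heat_exposure = [10, 30, 20, 60, 50, 40]

cells = risk_matrix_cells(vulnerability, heat_exposure)
print(cells)
```

An LSOA in cell (3, 3) falls in the highest third for both vulnerability and exposure, which is the corner of the 3x3 matrix of most concern.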

Methods

The numerical stages of the methodology were:

Combination of individual metrics - Individual metrics measuring different aspects or facets of the same underlying element of vulnerability were first added together if this could be done without overlap or double counting. For example, in the education metric, the categories "No qualifications", "Level 1" and "Level 2" could simply be added as there is no overlap between them. In the case of the income metric from IMD data, this summation had already been performed. Shrinkage was not applied.
Transformation - The majority of combined metrics were counts of people or households in each LSOA. To allow LSOAs to be compared fairly, these were converted to percentages using an appropriate denominator that related to the total number of people or households 'at risk'. For example, for "No cars or vans in household" the appropriate denominator was the total number of households in the LSOA, whereas for the education metric it was the number of residents aged 16 years and over. The only metric that was not converted to a percentage was income, which was provided as a rank.
Normalisation - For each sub-domain (or in the case of health, domain) the remaining metrics were then placed on a common and dimensionless scale by ranking and transforming to a standard normal distribution based on their ranks.
Weighting and aggregation - The normalised metrics were then combined to produce sub-domain scores (or in the case of health, domain scores). As no sub-domain had more than two constituent metrics, a factor analysis was not used to determine weightings and the metrics were simply summed using a weight of +1 or −1 such that larger scores could be attributed to higher vulnerability.
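The four stages above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the project's implementation: the function names and counts are hypothetical, tied values share their mean rank, and ranks are mapped to normal quantiles using the plotting position (rank - 0.5) / n, which the text does not specify.

```python
from statistics import NormalDist


def to_percentage(counts, denominators):
    """Transformation: convert counts to percentages of the 'at risk' total."""
    return [100.0 * c / d for c, d in zip(counts, denominators)]


def average_ranks(values):
    """Rank from 1 (smallest) to n (largest); tied values share their mean rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def normal_scores(values):
    """Normalisation: map ranks to standard normal quantiles.

    Assumption: plotting position (rank - 0.5) / n.
    """
    n = len(values)
    dist = NormalDist()
    return [dist.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


def sub_domain_score(metrics, signs):
    """Weighting and aggregation: sum normalised metrics with weights +1 or -1."""
    normalised = [normal_scores(m) for m in metrics]
    n = len(metrics[0])
    return [sum(s * z[i] for s, z in zip(signs, normalised)) for i in range(n)]


# Hypothetical counts for four LSOAs (not real data).
no_qualifications = [120, 80, 200, 50]
level_1 = [60, 40, 90, 30]
level_2 = [70, 60, 110, 20]
residents_16_plus = [1000, 1200, 1000, 800]

# Combination: the three categories do not overlap, so they can be added.
combined = [a + b + c for a, b, c in zip(no_qualifications, level_1, level_2)]
education_pct = to_percentage(combined, residents_16_plus)
scores = sub_domain_score([education_pct], signs=[+1])
print(education_pct, scores)
```

A sign of −1 would be passed for a metric where larger values indicate lower vulnerability, so that higher sub-domain scores always mean higher vulnerability.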

The sub-domain scores were then converted to a truncated exponential distribution according to Appendix F of the IMD technical report, by ranking and then using the equation:

$$X = -23 \ln\left(1 - R\left(1 - \exp(-100/23)\right)\right)$$

where $X$ is the converted sub-domain score and $R$ is the rank, scaled to the range $(0, 1]$. The conversion to a truncated exponential was performed so that if an LSOA scored highly in one sub-domain (or domain), this could not be cancelled out by a low score in another. As such, vulnerability in one sub-domain generally rendered the LSOA as vulnerable overall.
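As a minimal sketch of this conversion (the function name and example scores are hypothetical, and ties are broken by input order as a simplification), the rank of each score is scaled to $(0, 1]$ and passed through the equation:

```python
import math


def exponential_transform(scores, scale=23.0, maximum=100.0):
    """Rank the scores, scale ranks to (0, 1], and apply
    X = -scale * ln(1 - R * (1 - exp(-maximum / scale)))."""
    n = len(scores)
    # Indices in ascending score order; ties keep their input order.
    order = sorted(range(n), key=lambda i: scores[i])
    transformed = [0.0] * n
    for position, i in enumerate(order, start=1):
        r = position / n  # R = 1/n for the lowest score, 1 for the highest
        transformed[i] = -scale * math.log(
            1.0 - r * (1.0 - math.exp(-maximum / scale))
        )
    return transformed


# Hypothetical sub-domain scores for four LSOAs (not real data).
sub_domain = [0.2, -1.3, 1.5, 0.0]
print(exponential_transform(sub_domain))
```

The highest-ranked LSOA receives a score of 100 and the remaining scores fall away quickly, which is what stretches out the most vulnerable end of the distribution.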

From the IMD technical report²:

The scaling constant (23) was used in order to produce the objective of achieving roughly 10 per cent cancellation. This means that in the extreme case, a Lower-layer Super Output Area which was ranked most deprived on one domain but least deprived on another would overall be ranked at the 90th percentile in terms of deprivation (if the two domains were equally weighted). This compares to the 50th percentile if the untransformed ranks or a normal distribution had been used instead.
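The quoted 90th percentile can be checked with a rough calculation. This is an illustrative derivation, not taken from the report, and it makes the simplifying assumption that the averaged score is read back on the same transformed scale as a single domain. An LSOA ranked most deprived on one domain has $R = 1$ and so $X = 100$; ranked least deprived on the other it has $R \approx 0$ and so $X \approx 0$. With equal weights the average is $X = 50$, and inverting the transformation gives:

$$R = \frac{1 - \exp(-50/23)}{1 - \exp(-100/23)} \approx \frac{0.886}{0.987} \approx 0.90$$

That is, the combined score sits at roughly the 90th percentile, so only about 10 per cent of the high score is cancelled by the low one.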

The sub-domains were summed with equal weighting to yield the domain scores.

The domain scores were again converted to a truncated exponential distribution to reduce cancellation effects. The domain scores were then summed with equal weighting (in the absence of a strong justification to apply weighting) to yield the overall vulnerability index, which was ranked and converted to deciles.
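The final aggregation step can be sketched as below. This is a hypothetical illustration, not the code in Developing_vulnerability_indicators.ipynb: the function names and domain scores are invented, ties are broken by input order, and the decile direction (10 = highest index) is an assumption because the text does not state which end is numbered 1.

```python
import math


def exponential_transform(scores, scale=23.0, maximum=100.0):
    """Rank-based truncated exponential transform (R scaled to (0, 1])."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    out = [0.0] * n
    for position, i in enumerate(order, start=1):
        r = position / n
        out[i] = -scale * math.log(1.0 - r * (1.0 - math.exp(-maximum / scale)))
    return out


def overall_index(domain_scores):
    """Transform each domain, then sum across domains with equal weighting."""
    transformed = [exponential_transform(s) for s in domain_scores.values()]
    return [sum(values) for values in zip(*transformed)]


def deciles(scores):
    """Assign deciles by rank. Assumption: 1 = lowest tenth, 10 = highest."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    result = [0] * n
    for position, i in enumerate(order):
        result[i] = position * 10 // n + 1
    return result


# Hypothetical domain scores for ten LSOAs (not real data).
domains = {
    "adaptive capacity": [5, 3, 9, 1, 7, 2, 8, 4, 6, 10],
    "health": [4, 2, 10, 1, 6, 3, 7, 5, 8, 9],
    "sensitivity": [6, 1, 9, 2, 5, 3, 8, 4, 7, 10],
    "living environment": [5, 2, 8, 1, 7, 3, 9, 4, 6, 10],
}

index = overall_index(domains)
index_deciles = deciles(index)
print(index_deciles)
```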
Index aggregation - The top-level domains are:
- Adaptive capacity,
- Health,
- Sensitivity,
- Living environment.
The sub-domains are:
- Adaptive capacity: income,
- Adaptive capacity: language,
- Adaptive capacity: helping others,
- Sensitivity: younger people,
- Sensitivity: older people,
- Living environment: short-term adaptation,
- Living environment: longer-term adaptation,
- Living environment: condition.
(health does not have any sub-domains)

Key files and directories

This directory contains code and data for the vulnerability analysis.

data_sources.yaml - Defines all the data sources that need to be downloaded and defines how they should be processed.
code/
- The initial stages 1-5 of the analysis:
- Common configuration for the analysis, including data directories:
  - config.py
notebooks/
- Developing_vulnerability_indicators.ipynb - Calculates domain and sub-domain scores as well as a combined vulnerability index (stage 6 of the analysis).
- Overall_risk_assessment.ipynb - An example of an overall risk assessment, for demonstration purposes (stage 7 of the analysis).
- Exploratory_data_analysis.ipynb - Initial data exploration (not used in the analysis pipeline).
data/
- 01_raw/ - The raw data automatically downloaded from the internet (in the original file format).
- 02_subset/ - The data for our area of interest (South Gloucestershire) in either csv or Shapefile format.
- 03_combined/ - Data for the same geographic level combined into the same file: 2021 Census LSOA, 2021 Census MSOA and 2011 Census LSOA (the latter used in the IMD data).
- 04_indicators/ - Metrics (a.k.a. indicator variables) calculated from the combined data, using the formulae specified in data_sources.yaml.
- 05_resampled/ - Data converted to 2021 LSOA-level and combined into one file.
- 06_index/ - Calculation of a combined vulnerability index.
- 07_overall/ - Calculation of an overall risk assessment, for demonstration purposes.

Footnotes

1. Tate E. Social vulnerability indices: a comparative assessment using uncertainty and sensitivity analysis. Natural Hazards. 2012;63(2):325-47.
2. McLennan D, Noble S, Noble M, Plunkett E, Wright G, Gutacker N. The English indices of deprivation 2019: technical report. 2019.
