10_disease-risk-modeling.Rmd

# Disease risk modeling

**Learning objectives:**

- how to model disease risk
- how to map disease risk


## Introduction

- Bayesian spatial models have been used to understand geographic patterns and risk factors of childhood overweight and obesity prevalence in Costa Rica (Gómez et al. 2023), and mosquito-borne diseases in Brazil (Pavani, Bastos, and Moraga 2023). 

- Spatial methods can also be extended to analyze areal data that are both spatially and temporally referenced. 


## Modeling of lung cancer risk in Pennsylvania

Bayesian spatial model to estimate the risk of lung cancer and assess its relationship with smoking in Pennsylvania, USA, in 2002.

```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(SpatialEpi)
data(pennLC)
names(pennLC)
```


```{r message=FALSE, warning=FALSE}
head(pennLC$data)
```

```{r}
library(sf)
map <- st_as_sf(pennLC$spatial.polygon)
countynames <- sapply(slot(pennLC$spatial.polygon, "polygons"),
                      function(x){slot(x, "ID")})
map$county <- countynames
head(map)
```

```{r}
d <- group_by(pennLC$data, county) %>% 
  summarize(Y = sum(cases))
head(d)
```
```{r}
p1 <- map %>%
  right_join(d, by = c("county" = "county")) %>%
  ggplot()+
  geom_sf(aes(fill=Y), color="navy")+
  scale_fill_continuous(low="white", high="navy")+
  geom_sf_text(aes(label=county,alpha=Y,color="orange"), size=2)+
  coord_sf()+
  labs(title = "Lung cancer cases",
       subtitle = "Pennsylvania 1990-1994",
       caption = "Source: PennLC data",
       fill = "Cases")+
  theme_minimal()
p1
```


### Expected cases

$$
E_i = \sum_{j=1}^m r_j^{(s)} n_j^{(s)}
$$

where $r_j^{(s)}$ is the rate of disease in stratum $j$ and $n_j^{(s)}$ is the population in stratum $j$.


```{r}
pennLC$data <- pennLC$data[order(pennLC$data$county,
                                 pennLC$data$race, 
                                 pennLC$data$gender, 
                                 pennLC$data$age), ]

E <- expected(population = pennLC$data$population,
              cases = pennLC$data$cases, 
 # 2 races, 2 genders, and 4 age groups (2×2×4 = 16)
              n.strata = 16)
d$E <- E
head(d)
```

Let's visualize the difference between observed and expected cases.
```{r message=FALSE, warning=FALSE}
map %>%
  right_join(d, by = c("county" = "county")) %>%
  pivot_longer(cols = c("Y", "E"), 
               names_to = "type", 
               values_to = "value") %>%
  mutate(value=round(value))%>%
  ggplot()+
  geom_sf(aes(fill=value), color="navy")+
  scale_fill_continuous(low="white", high="navy")+
  geom_sf_text(aes(label=value,
                   alpha=value,
                   color="orange"), 
               fontface="bold",
               size=2)+
  coord_sf()+
  facet_wrap(~type)+
  guides(fill = guide_colorbar(title = "Cases"),alpha="none",color="none")+
  labs(title = "Lung cancer Observed and Expected cases",
       subtitle = "Pennsylvania 1990-1994",
       caption = "Source: PennLC data",
       fill = "Type")+
  theme_minimal()+
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        legend.position = "bottom")
```

### Standardized Mortality Ratios

$$
SMR_i = \frac{Y_i}{E_i}
$$

if $SMR_i > 1$, then the county has more cases than expected, and if $SMR_i < 1$, then the county has fewer cases than expected.

```{r}
# add the smoking data
d <- dplyr::left_join(d, pennLC$smoking, by = "county")
d$SMR <- d$Y/d$E
head(d)
```

```{r message=FALSE, warning=FALSE}
ggplot(d, aes(x=smoking, y=SMR))+
  geom_point(aes(color=smoking))+
  geom_smooth(se=FALSE)+
  labs(title = "Lung cancer SMR vs Smoking",
       subtitle = "Pennsylvania 1990-1994",
       caption = "Source: PennLC data",
       x = "Smoking",
       y = "SMR")+
  theme_minimal()
```

```{r}
map <- dplyr::left_join(map, d, by = "county")
```

```{r message=FALSE, warning=FALSE}
map %>%
  ggplot()+
  geom_sf(aes(fill=SMR), color="navy")+
  scale_fill_continuous(low="white", high="navy")+
  geom_sf_text(aes(label=county,
                   alpha=SMR,
                   color="orange"), 
               size=2)+
  coord_sf()+
  guides(fill = guide_colorbar(title = "Cases"),
         alpha="none",
         color="none")+
  labs(title = "Lung cancer SMR",
       subtitle = "Pennsylvania 1990-1994",
       caption = "Source: PennLC data",
       fill = "SMR")+
  theme_minimal()
```


## Modeling diseased risk


$$
Y_i| \theta_i \sim Poisson(E_i \times \theta_i)
$$

where 

- $\theta_i$ is the **relative risk** for county $i$,

- $u_i$ is the **structured random effect** for county $i$ modeled with an intrinsic conditional autoregressive (ICAR) model, 

$$
u_i|u_{-i} \sim N(\bar{u}_{\delta_i} \frac{1}{\tau_u n_{\delta_i}})
$$


- $v_i$ is the random effect for stratum $i$

$$
v_i \sim N(0, \frac{1}{\tau_v})
$$


### Neighborhood structure


```{r message=FALSE, warning=FALSE}
library(spdep)
library(INLA)
nb <- poly2nb(map)
nb2INLA("map.adj", nb)
g <- inla.read.graph(filename = "map.adj")

```

### Model

```{r}
map$re_u <- 1:nrow(map)
map$re_v <- 1:nrow(map)
```


```{r}
formula <- Y ~ smoking +
  f(re_u, 
    model = "besag", 
    graph = g, 
    scale.model = TRUE) +
  f(re_v, model = "iid")
```


```{r eval=FALSE}
res <- inla(formula, 
            family = "poisson", 
            data = map, 
            E = E,
            control.predictor = list(compute = TRUE),
            control.compute = list(return.marginals.predictor = TRUE))
```


     res$summary.fixed
                    mean     sd 0.025quant 0.5quant
     (Intercept) -0.3235 0.1498   -0.61925  -0.3233
     smoking      1.1546 0.6226   -0.07569   1.1560
                 0.975quant    mode       kld
     (Intercept)   -0.02877 -0.3234 3.534e-08
     smoking        2.37845  1.1563 3.545e-08


### Relative Risk

     res$summary.fitted.values[1:3, ]
                           mean      sd 0.025quant 0.5quant
     fitted.Predictor.01 0.8781 0.05808     0.7648   0.8778
     fitted.Predictor.02 1.0597 0.02750     1.0072   1.0592
     fitted.Predictor.03 0.9646 0.05089     0.8604   0.9657
                         0.975quant   mode
     fitted.Predictor.01     0.9936 0.8778
     fitted.Predictor.02     1.1150 1.0582
     fitted.Predictor.03     1.0622 0.9681
     
     
     # relative risk
     map$RR <- res$summary.fitted.values[, "mean"]
     
     
     # lower and upper limits 95% CI
     map$LL <- res$summary.fitted.values[, "0.025quant"]
     map$UL <- res$summary.fitted.values[, "0.975quant"]
     
     
See the map here:     

<https://www.paulamoraga.com/book-spatial/disease-risk-modeling.html#mapping-smr>


## Meeting Videos {-}

### Cohort 1 {-}

`r knitr::include_url("https://www.youtube.com/embed/URL")`

<details>
<summary> Meeting chat log </summary>

```
LOG
```
</details>