README.Rmd

---
output: github_document
bibliography: inst/references.bib
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  dpi = 300
)
```

# did2s

<!-- badges: start -->
<!-- badges: end -->

The goal of did2s is to estimate TWFE models without running into the problem of staggered treatment adoption. 

For common issues, see this issue: [https://github.com/kylebutts/did2s/issues/12](https://github.com/kylebutts/did2s/issues/12)

## Installation

You can install did2s from CRAN with:

``` r
install.packages("did2s")
```

To install the development version, run the following:

```{r, eval = FALSE}
devtools::install_github("kylebutts/did2s")
```

## Two-stage Difference-in-differences [@Gardner_2021]

For details on the methodology, view this [vignette](https://kylebutts.github.io/did2s/articles/Two-Stage-Difference-in-Differences.html)

To view the documentation, type `?did2s` into the console.


The main function is `did2s` which estimates the two-stage did procedure. This function requires the following options:

- `yname`: the outcome variable
- `first_stage`: formula for first stage, can include fixed effects and covariates, but do not include treatment variable(s)!
- `second_stage`: This should be the treatment variable or in the case of event studies, treatment variables.
- `treatment`: This has to be the 0/1 treatment variable that marks when treatment turns on for a unit. If you suspect anticipation, see note above for accounting for this.
- `cluster_var`: Which variables to cluster on

Optional options:

- `weights`: Optional variable to run a weighted first- and second-stage regressions
- `bootstrap`: Should standard errors be bootstrapped instead? Default is False.
- `n_bootstraps`: How many clustered bootstraps to perform for standard errors. Default is 250.

did2s returns a list with two objects:

1. fixest estimate for the second stage with corrected standard errors.

### TWFE vs. Two-Stage DID Example

I will load example data from the package and plot the average outcome among the groups.

```{r load-data}
# Automatically loads fixest
library(did2s)

# Load Data from R package
data("df_het", package = "did2s")
df_het = as.data.frame(df_het)
```

Here is a plot of the average outcome variable for each of the groups:

```{r plot-df-het, fig.width=8, fig.height=4, dpi=300, fig.cap="Example data with heterogeneous treatment effects"}
# Mean for treatment group-year
agg <- aggregate(df_het$dep_var, by = list(g = df_het$g, year = df_het$year), FUN = mean)

agg$g <- as.character(agg$g)
agg$g <- ifelse(agg$g == "0", "Never Treated", agg$g)

never <- agg[agg$g == "Never Treated", ]
g1 <- agg[agg$g == "2000", ]
g2 <- agg[agg$g == "2010", ]


plot(0, 0,
  xlim = c(1990, 2020), ylim = c(3.5, 7.2), type = "n",
  main = "Data-generating Process", ylab = "Outcome", xlab = "Year"
)
abline(v = c(1999.5, 2009.5), lty = 2)
lines(never$year, never$x, col = "#8e549f", type = "b", pch = 15)
lines(g1$year, g1$x, col = "#497eb3", type = "b", pch = 17)
lines(g2$year, g2$x, col = "#d2382c", type = "b", pch = 16)
legend(
  x = 1990, y = 7.1, col = c("#8e549f", "#497eb3", "#d2382c"),
  pch = c(15, 17, 16),
  legend = c("Never Treated", "2000", "2010")
)
```


### Estimate Two-stage Difference-in-Differences 

First, lets estimate a static did. There are two things to note here. First, note that I can use `fixest::feols` formula including the `|` for specifying fixed effects and `fixest::i` for improved factor variable support. Second, note that `did2s` returns a `fixest` estimate object, so `fixest::etable`, `fixest::coefplot`, and `fixest::iplot` all work as expected.

```{r static}
# Static
static <- did2s(
  df_het,
  yname = "dep_var", first_stage = ~ 0 | state + year,
  second_stage = ~ i(treat, ref = FALSE), treatment = "treat",
  cluster_var = "state"
)

fixest::etable(static)
```

This is very close to the true treatment effect of ~2.23.

Then, let's estimate an event study did. Note that relative year has a value of `Inf` for never treated, so I put this as a reference in the second stage formula.

```{r event-study}
# Event Study
es <- did2s(df_het,
  yname = "dep_var", first_stage = ~ 0 | state + year,
  second_stage = ~ i(rel_year, ref = Inf), treatment = "treat",
  cluster_var = "state"
)
```

And plot the results:

```{r plot-es, fig.cap="Event-study plot with example data", fig.width=8, fig.height=5, dpi=300}
fixest::iplot(es, main = "Event study: Staggered treatment", xlab = "Relative time to treatment", col = "steelblue", ref.line = -0.5, drop = "Inf")

# Add the (mean) true effects
true_effects <- head(tapply((df_het$te + df_het$te_dynamic), df_het$rel_year, mean), -1)
points(-20:20, true_effects, pch = 20, col = "black")

# Legend
legend(
  x = -20, y = 3, col = c("steelblue", "black"), pch = c(20, 20),
  legend = c("Two-stage estimate", "True effect")
)
```


### Comparison to TWFE

```{r plot-compare, fig.cap="TWFE and Two-Stage estimates of Event-Study", fig.width=8, fig.height=5, dpi=300}
twfe <- feols(dep_var ~ i(rel_year, ref = c(Inf, -1)) | unit + year, data = df_het)

fixest::iplot(
  list(es, twfe),
  sep = 0.2, ref.line = -0.5,
  col = c("steelblue", "#82b446"), pt.pch = c(20, 18),
  xlab = "Relative time to treatment",
  main = "Event study: Staggered treatment (comparison)",
  drop = "Inf"
)

# Legend
legend(
  x = -20, y = 3, col = c("steelblue", "#82b446"), pch = c(20, 18),
  legend = c("Two-stage estimate", "TWFE")
)
```


### Honest DID

In version 1.1.0, we added support for computing a sensitivity analysis using the approach of Rambachan and Roth (2021). 

Here's an example using data from [here](https://github.com/Mixtape-Sessions/Advanced-DID/tree/main/Exercises/Exercise-1). The provided dataset `ehec_data.dta` contains a state-level panel dataset on health insurance coverage and Medicaid expansion. The variable `dins` shows the share of low-income childless adults with health insurance in the state. The variable `yexp2` gives the year that a state expanded Medicaid coverage under the Affordable Care Act, and is missing if the state never expanded.

```{r ehec-data-est, fig.cap="Estimates of the effect of Medicaid expansion on health insurance coverage", fig.width=8, fig.height=5, dpi=300}
library(HonestDiD)
library(ggplot2)

df <- haven::read_dta("https://raw.githubusercontent.com/Mixtape-Sessions/Advanced-DID/main/Exercises/Data/ehec_data.dta")

df$treated <- ifelse(is.na(df$yexp2), 0, 1 * (df$year >= df$yexp2))
df$rel_year <- ifelse(is.na(df$yexp2), -100, df$year - df$yexp2)

# Estimate did2s
es_did2s <- did2s(
  df,
  yname = "dins",
  first_stage = ~ 0 | stfips + year,
  second_stage = ~ 0 + i(rel_year, ref = -100),
  treatment = "treated",
  cluster_var = "stfips"
)

iplot(es_did2s, drop = "-100")
```

```{r sensitivity, fig.cap="Sensitivity analysis for the example data", fig.width=8, fig.height=5, dpi=300}
# Relative Magnitude sensitivity analysis
sensitivity_results <- es_did2s |> 
  # Take fixest obj and convert for `honest_did_did2s`
  get_honestdid_obj_did2s(coef_name = "rel_year") |>
  # Run sensitivity analysis
  honest_did_did2s(
    e = 0,
    type = "relative_magnitude",
    Mbarvec = seq(from = 0.5, to = 2, by = 0.5)
  )

# Create plot
HonestDiD::createSensitivityPlot_relativeMagnitudes(
  sensitivity_results$robust_ci,
  sensitivity_results$orig_ci
)
```


## Event-study plot

```{r, fig.cap="Multiple event-study estimators", fig.width=12, fig.height=8.5, dpi=300}
library(tidyverse)
data(df_het)
df = df_het
multiple_ests = did2s::event_study(
  data = df |> mutate(g = ifelse(g == Inf, NA, g)) |> as.data.frame(),
  gname = "g",
  idname = "unit",
  tname = "year",
  yname = "dep_var",
  estimator = "all"
)
did2s::plot_event_study(multiple_ests)
```

# Citation

If you use this package to produce scientific or commercial publications, please cite according to:

```{r}
citation(package = "did2s")
```


# References