The unitdid
package provides a set of functions for the analysis of
the unit-level difference-in-differences (Arkhangelsky, Yanagimoto, and
Zohar 2024).
You can install the development version of unitdid from R-universe with:
install.packages('unitdid',
repos = 'https://kazuyanagimoto.r-universe.dev')
This is a basic example with the simulated base_heterocp
data set:
library(unitdid)
library(dplyr)
library(ggplot2)
base_heterocp |>
head()
#> # A tibble: 6 × 5
#> id year byear cyear y
#> <int> <int> <int> <int> <dbl>
#> 1 1 1999 1955 1985 -0.848
#> 2 1 2000 1955 1985 0.759
#> 3 1 2001 1955 1985 -1.03
#> 4 1 2002 1955 1985 0.858
#> 5 1 2003 1955 1985 -0.866
#> 6 1 2004 1955 1985 -0.651
Individual-level child penalties are estimated by unitdid()
:
mdl_base <- base_heterocp |>
unitdid(yname = "y",
iname = "id",
tname = "year",
ename = "cyear",
bname = "byear")
# Estimated individual-level child penalties (y_tilde)
get_unitdid(mdl_base)
#> # A tibble: 32,257 × 6
#> id year byear cyear y y_tilde
#> <int> <int> <int> <int> <dbl> <dbl>
#> 1 705 2000 1957 2000 0.138 -0.0287
#> 2 997 2000 1958 2000 0.138 0.0849
#> 3 998 2000 1958 2000 0.119 -0.104
#> 4 1013 2000 1958 2000 0.115 -0.0000709
#> 5 1082 2000 1958 2000 0.0362 0.00549
#> 6 1127 2000 1958 2000 0.386 0.125
#> 7 1225 2001 1959 2001 0.158 -0.118
#> 8 1228 2000 1959 2000 0.241 -0.0937
#> 9 1228 2001 1959 2000 0.443 0.0226
#> 10 1230 2000 1959 2000 0.143 -0.0266
#> # ℹ 32,247 more rows
They can be aggregated to the full
, event
(year at event
(treatment). Mainly for staggered DiD design), event_age
(age at
event. Mainly for child penalties) levels:
summary(mdl_base) # default agg = "full"
#> # A tibble: 6 × 3
#> rel_time mean n
#> <int> <dbl> <dbl>
#> 1 0 -0.0630 6332
#> 2 1 -0.187 5987
#> 3 2 -0.296 5593
#> 4 3 -0.308 5214
#> 5 4 -0.346 4774
#> 6 5 -0.349 4357
The only_full_horizon
option restricts the summary to the units that
have the full horizon (k_min
, … , k_max
) for the estimates:
sum_eage <- summary(mdl_base, agg = "event_age",
only_full_horizon = TRUE)
sum_eage |>
filter(rel_time == 0) |>
mutate(rel_time = -1,
mean = 0) |>
bind_rows(sum_eage) |>
filter(between(event_age, 25, 34)) |>
mutate(lbl_facet = paste0("Age ", event_age)) |>
ggplot(aes(x = rel_time, y = mean)) +
geom_point() +
geom_line() +
geom_vline(xintercept = -1, linetype = "dashed") +
geom_hline(yintercept = 0) +
facet_wrap(~lbl_facet, ncol = 5) +
labs(x = "Time to First Childbirth",
y = "Child Penalties on y") +
theme_minimal() +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank())
Since the individual-level child penalties are estimated with
measurement errors, the variance of the y_tilde
is not equal to the
variance of the individual-level child penalties.
The compute_varcov = "var"
option of the unitdid
estimates the
variance of the measurement errors and the variance of the
individual-level child penalties by subtracting the variance of the
measurement errors from the variance of y_tilde
mdl_base <- base_heterocp |>
unitdid(yname = "y",
iname = "id",
tname = "year",
ename = "cyear",
bname = "byear",
compute_varcov = "var")
sum_eage <- summary(mdl_base, agg = "event_age", only_full_horizon = TRUE)
sum_eage |>
filter(rel_time == 0) |>
mutate(rel_time = -1,
var = 0) |>
bind_rows(sum_eage) |>
filter(between(event_age, 25, 34)) |>
mutate(lbl_facet = paste0("Age ", event_age)) |>
ggplot(aes(x = rel_time, y = sqrt(var))) +
geom_point() +
geom_line() +
geom_vline(xintercept = -1, linetype = "dashed") +
geom_hline(yintercept = 0) +
facet_wrap(~lbl_facet, ncol = 5) +
labs(x = "Time to First Childbirth",
y = "S.D. of Child Penalties") +
theme_minimal() +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank())
Arkhangelsky, Dmitry, Kazuharu Yanagimoto, and Tom Zohar. 2024. “Flexible Analysis of Individual Heterogeneity in Event Studies: Application to the Child Penalty.” arXiv. https://arxiv.org/abs/2403.19563.