- For nearest neighbor matching, optimal full matching, and genetic matching, calipers can now be negative, which forces paired units to be further away from each other on the given variables.
- Fixed a bug when matching with a nonzero `ratio` where subclass membership was incorrectly calculated. Thanks to Simon Loewe (@simon-lowe) for originally pointing it out. (#207, #208)
- Fixed a bug with printing.
- Documentation fixes.
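
How the sign of a caliper changes pair eligibility can be sketched in base R. This is only an illustration, not the package's internal code: the helper `within_caliper()` is hypothetical, and the exact handling of the boundary (strict versus non-strict inequality) is an assumption.

```r
# Hypothetical helper: is a treated-control pair allowed under a caliper?
# A positive caliper is read as a maximum allowed distance; a negative
# caliper is read as a minimum required distance (boundary handling assumed).
within_caliper <- function(x_treated, x_control, caliper) {
  d <- abs(x_treated - x_control)
  if (caliper >= 0) d <= caliper else d > abs(caliper)
}

within_caliper(1.0, 1.3, caliper = 0.5)   # units are close enough to pair
within_caliper(1.0, 1.3, caliper = -0.5)  # units are too close to pair
within_caliper(1.0, 2.0, caliper = -0.5)  # units are far enough apart to pair
```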
Most improvements are related to performance. Some of these dramatically improve speeds for large datasets. Most come from improvements to `Rcpp` code.
- When using `method = "nearest"`, `m.order` can now be set to `"farthest"` to prioritize hard-to-match treated units. Note this does not implement "far matching" but simply changes the order in which the closest matches are selected.
- Speed improvements to `method = "nearest"`, especially when matching on a propensity score.
- Speed improvements to `summary()` when `pair.dist = TRUE` and a `match.matrix` component is not included in the output (e.g., for `method = "full"` or `method = "quick"`).
- Speed improvements to `method = "subclass"` with `min.n` greater than 0.
- A new `normalize` argument has been added to `matchit()`. When set to `TRUE` (the default, which used to be the only option), the nonzero weights in each treatment group are rescaled to have an average of 1. When `FALSE`, the weights generated directly by the matching are returned instead.
- When using `method = "nearest"` with `m.order = "closest"`, the full distance matrix is no longer computed, which increases support for larger samples. This uses an adaptation of an algorithm described by Rassen et al. (2012).
- When using `method = "nearest"` with `verbose = TRUE`, the progress bar now displays an estimate of how much time remains.
- When using `method = "nearest"` with `m.order = "closest"` and `ratio` greater than 1, all eligible units will receive their first match before any receive their second, etc. Previously, the closest pairs would be matched regardless of whether other units had been matched. This ensures consistency with other `m.order` arguments.
- Speed and memory improvements to `method = "cem"` with many covariates and a large sample size. Previous versions used a Cartesian expansion of all levels of factor variables, which could easily explode.
- When using `method = "cem"` with `k2k = TRUE`, `m.order` can be set to select the matching order. Allowable options include `"data"` (the default), `"closest"`, `"farthest"`, and `"random"`. `"closest"` is recommended, but `"data"` is the default for now to remain consistent with previous versions.
- Documentation updates.
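
The rescaling described for `normalize = TRUE` can be sketched in base R. The function `normalize_weights()` and the toy data are hypothetical; the sketch only illustrates rescaling the nonzero weights in each treatment group to average 1 and is not the package's own implementation.

```r
# Hypothetical helper: rescale nonzero weights to mean 1 within each group
normalize_weights <- function(w, treat) {
  for (g in unique(treat)) {
    idx <- treat == g & w > 0            # nonzero weights in this group
    if (any(idx)) w[idx] <- w[idx] / mean(w[idx])
  }
  w
}

treat  <- c(1, 1, 0, 0, 0, 0)
w      <- c(1, 1, 3, 1, 0.5, 0)          # the last control unit is unmatched
w_norm <- normalize_weights(w, treat)

# Average of the nonzero weights in each treatment group after rescaling
tapply(w_norm[w_norm > 0], treat[w_norm > 0], mean)
```

Unmatched units keep a weight of 0 because only nonzero weights are rescaled.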
- Fixed a bug when using `method = "optimal"` or `method = "full"` with `discard` specified and `data` given as a tibble (`tbl_df` object). (#185)
- Fixed a bug when using `method = "cardinality"` with a single covariate. (#194)
- When using `method = "cardinality"`, a new solver, HiGHS, can be requested by setting `solver = "highs"`, which relies on the `highs` package. This is much faster and more reliable than GLPK and is free and easy to install as a regular R package with no additional requirements.
- Fixed a bug when using `method = "optimal"` with `discard` and `exact` specified. Thanks to @NikNakk for the issue and fix. (#171)
- With `method = "nearest"`, `m.order` can now be set to `"closest"` to request that the closest potential pairs are matched first. This can be used whether a propensity score is used or not.
- Fixed bugs when `distance = NULL` and no covariates are specified in `matchit()`.
- Changed "empirical cumulative density function" to "empirical cumulative distribution function" in documentation. (#166)
- Fixed a bug where calipers would not work properly on some systems. Thanks to Bill Dunlap for the solution. (#163)
- Fixed a bug when `.` was present in formulas. Thanks to @dmolitor. (#167)
- Fixed a bug when nearest neighbor matching for the ATC with `distance` supplied as a numeric distance matrix.
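
The idea behind `m.order = "closest"`, matching the closest potential pairs first, can be sketched in base R for 1:1 matching without replacement on a single score. The function `match_closest_first()` is hypothetical, and this naive version builds the full distance matrix, so it illustrates the ordering only and should not be read as the algorithm the package uses.

```r
# Hypothetical sketch: greedy 1:1 matching that always takes the closest
# remaining treated-control pair (no replacement).
match_closest_first <- function(ps_treated, ps_control) {
  d <- abs(outer(ps_treated, ps_control, "-"))   # treated x control distances
  n_pairs <- min(dim(d))
  pairs <- matrix(NA_integer_, nrow = n_pairs, ncol = 2,
                  dimnames = list(NULL, c("treated", "control")))
  for (k in seq_len(n_pairs)) {
    best <- which(d == min(d), arr.ind = TRUE)[1, ]  # closest remaining pair
    pairs[k, ] <- best
    d[best[1], ] <- Inf   # this treated unit is now matched
    d[, best[2]] <- Inf   # this control unit cannot be used again
  }
  pairs
}

ps_treated <- c(0.80, 0.55, 0.30)
ps_control <- c(0.52, 0.36, 0.10, 0.70)
match_closest_first(ps_treated, ps_control)
```

Rows of the result are in the order the pairs were formed, so the first row is the closest pair overall.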
- Error messages have been improved using `chk` and `rlang`, which are now dependencies.
- Fixed a bug when using `method = "nearest"` with `replace = TRUE` and `ratio` greater than 1. Thanks to Julia Kretschmann. (#159)
- Fixed a bug when using `method = "nearest"` with `exact` and `ratio` greater than 1. Thanks to Sarah Conner.
- Fixed a bug that would occur due to numerical imprecision in `plot.matchit()`. Thanks to @hkmztrk. (#158)
- Fixed bugs when using `method = "cem"` where a covariate was to be omitted from coarsening. Thanks to @jfhelmer. (#160)
- Fixed some typos in the vignettes. Thanks to @fBedecarrats. (#156)
- Updated vignettes to use `marginaleffects` v0.11.0 syntax.
- Fixed a bug when using `method = "quick"` with `exact` specified. Thanks to @m-marquis. (#149)
- Improved performance and fixed some bugs when using `exact` in cases where some strata contain units from only one treatment group. Thanks to @m-marquis and others for pointing these out. (#151)
- Nearest neighbor matching now uses a much faster algorithm (up to 6 times faster) when `distance` is a propensity score and `mahvars` is not specified. Differences in sort order might cause results to differ from previous versions if there are units with identical propensity scores.
- Template matching has been renamed profile matching in all documentation.
- After cardinality or profile matching using `method = "cardinality"` with `ratio` set to a whole number, it is possible to perform optimal Mahalanobis distance matching in the matched sample by supplying the desired matching variables to `mahvars`. Previously, the user had to run a separate pairing step.
- Fixed some typos in the vignettes.
- Fixed a bug where character variables would be flagged as non-finite. Thanks to @isfraser. (#138)
- Added alt text to images in README and vignettes. (#134)
- Generalized full matching, as described by Sävje, Higgins, and Sekhon (2021), can now be implemented by setting `method = "quick"` in `matchit()`. It is a dramatically faster alternative to optimal full matching that can support much larger datasets and otherwise has similar balancing performance. See `?method_quick` and `vignette("matching-methods")` for more information. This functionality relies on the `quickmatch` package.
- The package structure has been updated, including the use of Roxygen for documentation. This should not affect use, but the source code will look different from that of previous versions.
- When `method = "subclass"` and `min.n = 0` (which is not the default), any units not placed into a subclass are now considered "unmatched" and given weights of 0. Previously they were left in.
- When `method = "genetic"`, the default `distance.tolerance` is now 0. In previous versions, this argument was ignored; now it is not.
- For `plot.matchit()`, the `which.xs` argument can be specified as a one-sided formula. A new `data` argument is allowed if the variables in that formula are not among the original covariates.
- When a factor variable is supplied to `plot.matchit()` with `type = "density"`, the plot now displays all factor levels in the same plot instead of in separate plots for each level, similar to `cobalt::bal.plot()`.
- The "Estimating Effects" vignette (`vignette("estimating-effects")`) has been rewritten to be much shorter (and hopefully clearer) and to use the `marginaleffects` package, which is now a Suggested package. The new vignette focuses on using g-computation to estimate treatment effects using a single workflow with slight modifications for different situations.
- The error message when covariates have missing or non-finite values is now clearer, identifying which variables are afflicted. This fixes a bug mentioned in #115.
- Fixed a bug when using `matchit()` with `method = "cem"`, `k2k = TRUE`, and `k2k.method = NULL`. Thanks to Florian B. Mayr.
- Fixed a bug when using `method = "optimal"` and `method = "full"` with `exact` and `antiexact` specified, wherein a warning would occur about the `drop` argument in subsetting.
- Fixed a bug where `antiexact` would not work correctly with `method = "nearest"`. Thanks to @gli-1. (#119)
- Fixed typos in the documentation and vignettes.
- Calculating pair distances in `summary()` with `pair.dist = TRUE` is now faster.
- Improved printing of balance results when no covariates are supplied.
- Updates to the Estimating Effects vignette that dramatically increase the speed of the cluster bootstrap for average marginal effects after matching. Thanks to Yohei Hashimoto for pointing out the inefficiency.
- Updates to the Assessing Balance vignette to fix errors.
- All vignettes and help files are better protected against Suggested packages not available on CRAN.
- `optmatch` has returned to CRAN, now with an open-source license! A new `solver` argument can be passed to `matchit()` with `method = "full"` and `method = "optimal"` to control the solver used to perform the optimization used in the matching. Note that using the default (open source) solver LEMON may yield results different from those obtained prior to `optmatch` 0.10.0. For reproducibility questions, please contact the `optmatch` maintainers.
- New functions have been added to compute the Euclidean distance (`euclidean_dist()`), scaled Euclidean distance (`scaled_euclidean_dist()`), Mahalanobis distance (`mahalanobis_dist()`), and robust Mahalanobis distance (`robust_mahalanobis_dist()`). They produce distance matrices that can be supplied to the `distance` argument of `matchit()`, but see below.
- New distance options are available for `matchit()` based on the distance functions above: `"robust_mahalanobis"`, `"euclidean"`, and `"scaled_euclidean"`, which complement `"mahalanobis"`. Similar to `"mahalanobis"`, these do not involve estimating a propensity score but rather operate on the covariates directly. These can be used for nearest neighbor matching, optimal matching, full matching, and coarsened exact matching with `k2k = TRUE`.
- The Mahalanobis distance is now computed using the pooled within-group covariance matrix (computed by treatment group-mean centering each covariate before computing the covariance in the full sample), in line with how it is computed in `optmatch` and recommended by Rubin (1980) among others. This will cause results to differ between this version and prior versions of `MatchIt` that used the Mahalanobis distance computed ignoring group membership.
- Added the `unit.id` argument to `matchit()` with `method = "nearest"`, which defines unit IDs so that if a control observation with a given unit ID has been matched to a treated unit, no other control units with the same ID can be used as future matches, ensuring each unit ID is used no more than once. This is useful when, e.g., multiple rows correspond to the same control firm but you only want each control firm to be matched once, in which case firm ID would be supplied to `unit.id`. See here for an example use case.
- In `summary.matchit()`, `improvement` is now set to `FALSE` by default to hide the percentage improvement in balance. Set to `TRUE` to recover prior behavior.
- Added clearer errors when required packages are missing for certain `distance` methods.
- Fixed a bug when using `matchit()` with `method = "nearest"`, `ratio` greater than 1, and `reuse.max` specified. The bug allowed a previously matched control unit to be matched to the same treatment unit, thereby essentially ignoring the `ratio` argument. It now works as intended.
- Fixed a bug in `matchit()` with `method = "nearest"` when `distance` was supplied as a matrix and `Inf` values were present.
- Fixed a bug when using exact matching that caused an infinite loop when variable levels contained commas. Thanks to @bking124. (#111)
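
The pooled within-group covariance used for the Mahalanobis distance, as described above, can be sketched in base R. The function `pooled_mahalanobis()` is hypothetical and follows the description literally (group-mean centering, then the covariance of the full centered sample); the package's `mahalanobis_dist()` may differ in details such as the covariance denominator or the use of a generalized inverse.

```r
# Hypothetical sketch: treated-by-control Mahalanobis distances using a
# covariance matrix computed after centering each covariate by group mean.
pooled_mahalanobis <- function(X, treat) {
  X <- as.matrix(X)
  # Subtract the treatment-group mean from each covariate
  X_centered <- X - apply(X, 2, function(x) ave(x, treat))
  S_inv <- solve(cov(X_centered))   # assumes the covariance is invertible
  X_t <- X[treat == 1, , drop = FALSE]
  X_c <- X[treat == 0, , drop = FALSE]
  d <- matrix(0, nrow(X_t), nrow(X_c))
  for (i in seq_len(nrow(X_t))) {
    diffs <- sweep(X_c, 2, X_t[i, ])               # control rows minus treated row i
    d[i, ] <- sqrt(rowSums((diffs %*% S_inv) * diffs))
  }
  d
}

set.seed(1)
X <- cbind(x1 = rnorm(20), x2 = rnorm(20))
treat <- rep(c(1, 0), each = 10)
d <- pooled_mahalanobis(X, treat)
dim(d)   # one row per treated unit, one column per control unit
```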
- Fixed a bug introduced by `optmatch` version 0.10.3.
- Documentation updates.
- Updated the logo, thanks to Ben Stillerman.
- `optmatch` has been removed from CRAN. Instructions on installing it are in `?method_optimal` and `?method_full`.
- When `s.weights` are supplied with `distance = "randomforest"`, the weights are supplied to `randomForest::randomForest()`.
- Improved conditional use of packages, especially `optmatch`. This may mean that certain examples fail to run in the vignettes.
- Fixed a bug where `rbind.matchdata()` would produce datasets twice their expected length. Thanks to @sconti555. (#98)
- Fixed a bug where the `q.cut` component of the `matchit` object was not included when `method = "subclass"`. Now it is. Thanks to @aldencabajar. (#92)
- The `nn` and `qn` components of the `matchit` object have been removed. They are now computed by `summary.matchit()` and included in the `summary.matchit` object.
- Removed the code to disable compiler checks to satisfy CRAN requirements.
- Added the `reuse.max` argument to `matchit()` with `method = "nearest"`. This controls the maximum number of times each control unit can be used as a match. Setting `reuse.max = 1` is equivalent to matching without replacement (i.e., like setting `replace = FALSE`), and setting `reuse.max = Inf` is equivalent to matching with replacement with no restriction on the reuse of controls (i.e., like setting `replace = TRUE`). Values in between restrict how many times each control unit can be used as a match. Higher values will tend to improve balance but decrease precision.
- Mahalanobis distance matching with `method = "nearest"` is now a bit faster.
- Fixed a bug where `method = "full"` would fail when some exact matching strata contained exactly one treated unit and exactly one control unit. (#88)
- Fixed a bug introduced in 4.3.0 where the inclusion of character variables would cause the error `"Non-finite values are not allowed in the covariates."` Thanks to Moaath Mustafa.
- Documentation updates.
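
The effect of `reuse.max` can be sketched in base R as a greedy nearest neighbor loop that tracks how often each control unit has been used. The function `match_with_reuse()` is hypothetical and matches treated units in data order on a single score; it is meant only to show how `reuse.max = 1` and `reuse.max = Inf` bracket the behavior described above.

```r
# Hypothetical sketch: greedy nearest neighbor matching with a cap on how
# many times each control unit may be used.
match_with_reuse <- function(ps_treated, ps_control, reuse.max = 1) {
  uses <- integer(length(ps_control))        # times each control has been used
  matched <- rep(NA_integer_, length(ps_treated))
  for (i in seq_along(ps_treated)) {
    d <- abs(ps_treated[i] - ps_control)
    d[uses >= reuse.max] <- Inf              # exhausted controls are unavailable
    if (all(is.infinite(d))) next            # no control left for this unit
    matched[i] <- which.min(d)
    uses[matched[i]] <- uses[matched[i]] + 1L
  }
  matched
}

ps_treated <- c(0.50, 0.52, 0.90)
ps_control <- c(0.51, 0.80)
match_with_reuse(ps_treated, ps_control, reuse.max = 1)    # like replace = FALSE
match_with_reuse(ps_treated, ps_control, reuse.max = Inf)  # like replace = TRUE
```

With `reuse.max = 1` the third treated unit goes unmatched because both controls are already used; with `reuse.max = Inf` the first control is used twice.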
- Cardinality and template matching can now be used by setting `method = "cardinality"` in `matchit()`. These methods use mixed integer programming to directly select a matched subsample, without pairing or stratifying units, that satisfies user-supplied balance constraints. Their results can be dramatically improved when using the Gurobi optimizer. See `?method_cardinality` and `vignette("matching-methods")` for more information.
- Added `"lasso"`, `"ridge"`, and `"elasticnet"` as options for `distance`. These estimate propensity scores using lasso, ridge, or elastic net regression, respectively, as implemented in the `glmnet` package.
- Added `"gbm"` as an option for `distance`. This estimates propensity scores using generalized boosted models as implemented in the `gbm` package. This implementation differs from that in `twang` by using cross-validation or out-of-bag error to choose the tuning parameter as opposed to balance.
- A new argument, `include.obj`, has been added to `matchit()`. When `TRUE`, the intermediate matching object created internally will be included in the output in the `obj` component. See the individual methods pages for information on what is included in each output. This is ignored for some methods.
- Density plots can now be requested using `plot.matchit()` by setting `type = "density"`. These display the density of each covariate in the treatment groups before and after matching and are similar to the plots created by `cobalt::bal.plot()`. Density plots can be easier to interpret than eCDF plots. `vignette("assessing-balance")` has been updated with this addition.
- A clearer error is now produced when the treatment variable is omitted from the `formula` argument to `matchit()`.
- Improvements in how `match.data()` finds the original dataset. It's still always safer to supply an argument to `data`, but now `match.data()` will look in the environment of the `matchit` formula, then the calling environment of `match.data()`, then the `model` component of the `matchit` object. A clearer error message is now printed when a valid dataset cannot be found in these places.
- Fixed a bug that would occur when using `summary.matchit()` with just one covariate.
- When `verbose = TRUE` and a propensity score is estimated (i.e., using the `distance` argument), a message saying so will be displayed.
- Fixed a bug in `print.matchit()` where it would indicate that the propensity score was used in a caliper if any caliper was specified, even if not on the propensity score. Now, it will only indicate that the propensity score was used in a caliper if it actually was.
- Fixed a bug in `plot.matchit()` that would occur when a level of a factor had no values.
- Speed improvements for `method = "full"` with `exact` specified. These changes can make current results differ slightly from past results when the `tol` value is high. It is recommended to always use a low value of `tol`.
- Typo fixes in documentation and vignettes.
- Fixed a bug where supplying a "GAM" string to the `distance` argument (i.e., using the syntax prior to version 4.0.0) would ignore the link supplied.
- When an incompatible argument is supplied to `matchit()` (e.g., `reestimate` with `distance = "mahalanobis"`), an error or warning will only be produced when that argument has been set to a value other than its default (e.g., so setting `reestimate = FALSE` will no longer throw an error). This fixes an issue brought up by Vu Ng when using `MatchThem`.
- A clearer error is produced when non-finite values are present in the covariates.
- `distance` can now be supplied as a distance matrix containing pairwise distances with nearest neighbor, optimal, and full matching. This means users can create a distance matrix outside `MatchIt` (e.g., using `optmatch::match_on()` or `dist()`) and `matchit()` will use those distances in the matching. See `?distance` for details.
- Added `rbind.matchdata()` method for `matchdata` and `getmatches` objects (the output of `match.data()` and `get_matches()`, respectively) to avoid subclass conflicts when combining matched samples after matching within subgroups.
- Added a section in `vignette("estimating-effects")` on moderation analysis with matching, making use of the new `rbind()` method.
- Added `antiexact` argument to perform anti-exact matching, i.e., matching that ensures treated and control units have different values of certain variables. See here and here for examples where this feature was requested and might be useful. Anti-exact matching works with nearest neighbor, optimal, full, and genetic matching. The argument to `antiexact` should be similar to an argument to `exact`: either a string or a one-sided `formula` containing the names of the anti-exact matching variables.
- Slight speed improvements for nearest neighbor matching, especially with `exact` specified.
- With `method = "nearest"`, `verbose = TRUE`, and `exact` specified, separate messages and progress bars will be shown for each subgroup of the `exact` variable(s).
- A spurious warning that would appear when using a large `ratio` with `replace = TRUE` and `method = "nearest"` no longer appears.
- Fixed a bug when trying to supply `distance` as a labeled numeric vector (e.g., resulting from `haven`).
- Fixed some typos in the documentation and vignettes.
- Coarsened exact matching (i.e., `matchit()` with `method = "cem"`) has been completely rewritten and no longer involves the `cem` package, eliminating some spurious warning messages and fixing some bugs. All the same arguments can still be used, so old code will run, though some results will differ slightly. Additional options are available for matching and performance has improved. See `?method_cem` for details on the differences between the implementation in the current version of `MatchIt` and that in `cem` and older versions of `MatchIt`. In general, these changes make coarsened exact matching function as one would expect it to, circumventing some peculiarities and bugs in the `cem` package.
- Variable ratio matching is now compatible with `method = "optimal"` in the same way it is with `method = "nearest"`, i.e., by using the `min.controls` and `max.controls` arguments.
- With `method = "full"` and `method = "optimal"`, the maximum problem size has been set to unlimited, so that larger datasets can be used with these methods without error. They may take a long time to run, though.
- Processing improvements with `method = "optimal"` due to rewriting some functions in `Rcpp`.
- Using `method = "optimal"` runs more smoothly when combining it with exact matching through the `exact` argument.
- When using `ratio` different from 1 with `method = "nearest"` and `method = "optimal"` and with exact matching, errors and warnings about the number of units that will be matched are clearer. Certain `ratio`s that would produce errors now only produce warnings.
- Fixed a bug when no argument was supplied to `data` in `matchit()`.
- Improvements to vignettes and documentation.
- Restored `cem` functionality after it had been taken down and re-uploaded.
- Added `pkgdown` website.
- Computing matching weights after matching with replacement is faster due to programming in `Rcpp`.
- Fixed issues with `Rcpp` code that required C++11. C++11 has been added to SystemRequirements in DESCRIPTION, and `MatchIt` now requires R version 3.1.0 or later.
- `match.data()`, which is used to create matched datasets, has a few new arguments. The `data` argument can be supplied with a dataset that will have the matching weights and subclasses added. If not supplied, `match.data()` will try to figure out the appropriate dataset like it did in the past. The `drop.unmatched` argument controls whether unmatched units are dropped from the output. The default is `TRUE`, consistent with past behavior. Warnings are now more informative.
- `get_matches()`, which seems to have been rarely used since it performed a similar function to `match.data()`, has been revamped. It creates a dataset with one row per unit per matched pair. If a unit is part of two separate pairs (e.g., as a result of matching with replacement), it will get two rows in the output dataset. The goal here was to be able to implement standard error estimators that rely both on repeated use of the same unit and subclass/pair membership, e.g., Austin & Cafri (2020). Otherwise, it functions similarly to `match.data()`. NOTE: the changes to `get_matches()` are breaking changes! Legacy code will not work with the new syntax!
- `print.matchit()` has completely changed and now prints information about the matching type and specifications. `summary.matchit()` contains all the information that was in the old `print` method.
- A new function, `add_s.weights()`, adds sampling weights to `matchit` objects for use in balance checking and effect estimation. Sampling weights can also be directly supplied to `matchit()` through the new `s.weights` argument. A new vignette describing how to use `MatchIt` with sampling weights is available at `vignette("sampling-weights")`.
- The included dataset, `lalonde`, now uses a `race` variable instead of separate `black` and `hispan` variables. This makes it easier to see how character variables are treated by `MatchIt` functions.
- Added extensive documentation for every function, matching method, and distance specification. Documentation no longer links to `gking.harvard.edu/matchit` as it now stands alone.
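
The "one row per unit per matched pair" layout described for `get_matches()` can be illustrated in base R with made-up data. The unit labels and column names below are hypothetical and the real output contains more columns; the point is only that a control unit reused across pairs appears once for each pair it belongs to.

```r
# Hypothetical matched pairs from 1:1 matching with replacement:
# control unit C1 is matched to both T1 and T2.
pairs <- data.frame(treated = c("T1", "T2"),
                    control = c("C1", "C1"),
                    stringsAsFactors = FALSE)

# One row per unit per pair; `subclass` records pair membership
long <- data.frame(
  id       = c(rbind(pairs$treated, pairs$control)),
  subclass = rep(seq_len(nrow(pairs)), each = 2)
)
long
table(long$id)   # C1 is counted twice, once per pair
```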
- An argument to `data` is no longer required if the variables in `formula` are present in the environment.
- When missing values are present in the dataset but not in the treatment or matching variables, the error that used to appear no longer does.
- The `exact` argument can be supplied either as a character vector of names of variables in `data` or as a one-sided formula. A full cross of all included variables will be used to create bins within which matching will take place.
- The `mahvars` argument can also be supplied either as a character vector of names of variables in `data` or as a one-sided formula. Mahalanobis distance matching will occur on the variables in the formula, processed by `model.matrix()`. Use this when performing Mahalanobis distance matching on some variables within a caliper defined by the propensity scores estimated from the variables in the main `formula` using the argument to `distance`. For regular Mahalanobis distance matching (without a propensity score caliper), supply the variables in the main `formula` and set `distance = "mahalanobis"`.
- The `caliper` argument can now be specified as a numeric vector with a caliper for each variable named in it. This means you can separately impose calipers on individual variables as well as or instead of the propensity score. For example, to require that units within pairs must be no more than .2 standard deviations of `X1` away from each other, one could specify `caliper = c(X1 = .2)`. A new option `std.caliper` allows the choice of whether the caliper is in standard deviation units or not, and one value per entry in `caliper` can be supplied. An unnamed entry to `caliper` applies the caliper to the propensity score and the default of `std.caliper` is `TRUE`, so this doesn't change the behavior of old code. These options only apply to the methods that accept calipers, namely `"nearest"`, `"genetic"`, and `"full"`.
- A new `estimand` argument can be supplied to specify the target estimand of the analysis. For all methods, the ATT and ATC are available with the ATT as the default, consistent with prior behavior. For some methods, the ATE is additionally available. Note that setting the estimand doesn't actually mean that estimand is being targeted; if calipers, common support, or other restrictions are applied, the target population will shift from that requested. `estimand` just triggers the choice of which level of the treatment is focal and what formula should be used to compute weights from subclasses.
- In methods that accept it, `m.order` can be set to `"data"`, which matches in the order the data appear. With `distance = "mahalanobis"`, `m.order` can be `"random"` or `"data"`, with `"data"` as the default. Otherwise, `m.order` can be `"largest"`, `"smallest"`, `"random"`, or `"data"`, with `"largest"` as the default (consistent with prior behavior).
- The output to `matchit()` has changed slightly; the component `X` is now a data frame, the result of a call to `model.frame()` with the formula provided. If `exact` or `mahvars` are specified, their variables are included as well, if not already present. It is included for all methods and is the same for all methods. In the past, it was the result of a call to `model.matrix()` and was only included for some methods.
- When key arguments are supplied to methods that don't accept them, a warning will be thrown.
- `method` can be set to `NULL` to not perform matching but create a `matchit` object, possibly with a propensity score estimated using `distance` or with a common support restriction using `discard`, for the purpose of supplying to `summary.matchit()` to assess balance prior to matching.
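
The "full cross" of `exact` variables mentioned above can be sketched in base R with `interaction()`. The toy data frame is hypothetical, and the sketch only shows how crossing the variables defines the bins; it does not reproduce how the package handles them internally.

```r
# Hypothetical data: exact matching on `married` and `race`
df <- data.frame(
  treat   = c(1, 0, 1, 0, 0, 1),
  married = c(1, 1, 0, 0, 1, 0),
  race    = c("black", "black", "white", "white", "hispan", "white")
)

# The full cross of the exact-matching variables defines the bins
df$stratum <- interaction(df$married, df$race, drop = TRUE)

# Only bins that contain both treated and control units can yield matches
has_both <- tapply(df$treat, df$stratum,
                   function(t) any(t == 1) && any(t == 0))
usable <- names(has_both)[has_both]
usable
```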
- Matching is much faster due to re-programming with `Rcpp`.
- With `method = "nearest"`, a `subclass` component containing pair membership is now included in the output when `replace = FALSE` (the default), as it has been with optimal and full matching.
- When using `method = "nearest"` with `distance = "mahalanobis"`, factor variables can now be included in the main `formula`. The design matrix no longer has to be full rank because a generalized inverse is used to compute the Mahalanobis distance.
- Unless `m.order = "random"`, results will be identical across runs. Previously, several random choices would occur to break ties. Ties are broken based on the order of the data; shuffling the order of the data may therefore yield different matches.
- When using `method = "nearest"` with a caliper specified, the nearest control unit will be matched to the treated unit if one is available. Previously, a random control unit within the caliper would be selected. This eliminates the need for the `calclosest` argument, which has been removed.
- Variable ratio extremal matching as described by Ming & Rosenbaum (2000) can be implemented using the new `min.controls` and `max.controls` arguments.
- Added ability to display a progress bar during matching, which can be activated by setting `verbose = TRUE`.
- Fixed bug in `method = "optimal"`, which produced results that did not match `optmatch`. Now they do.
- Added support for optimal and full Mahalanobis distance matching by setting `distance = "mahalanobis"` with `method = "optimal"` and `method = "full"`. Previously, both methods would perform a random match if `distance` was set to `"mahalanobis"`. Now they use the native support in `optmatch::pairmatch()` and `optmatch::fullmatch()` for Mahalanobis distance matching.
- Added support for exact matching with `method = "optimal"` and `method = "full"`. As with `method = "nearest"`, the names of the variables for which exact matches are required should be supplied to the `exact` argument. This relies on `optmatch::exactMatch()`.
- The warning that used to occur about the order of the match not guaranteed to be the same as the original data no longer occurs.
- For `method = "full"`, the `estimand` argument can be set to `"ATT"`, `"ATC"`, or `"ATE"` to compute matching weights that correspond to the given estimand. See `?matchit` for details on how weights are computed for each `estimand`.
- Fixed a bug with `method = "genetic"` that caused an error with some `ratio` greater than 1.
- The default of `replace` in `method = "genetic"` is now `FALSE`, as it is with `method = "nearest"`.
- When `verbose = FALSE`, the default, no output is printed with `method = "genetic"`. With `verbose = TRUE`, the printed output of `Matching::GenMatch()` with `print.level = 2` is displayed.
- The `exact` argument now correctly functions with `method = "genetic"`. Previously, it would have to be specified in accordance with its use in `Matching::GenMatch()`.
- Different ways to match on variables are now allowed with `method = "genetic"`, similar to how they are with `method = "nearest"`. If `distance = "mahalanobis"`, no propensity score will be computed, and genetic matching will be performed just on the variables supplied to `formula`. If `mahvars` is specified, genetic matching will be performed on the variables supplied to `mahvars`, but balance will be optimized on all covariates supplied to `formula`. Otherwise, genetic matching will be performed on the variables supplied to `formula` and the propensity score. Previously, `mahvars` was ignored. Balance is now always optimized on the variables included in `formula` and never on the propensity score, whereas in the past the propensity score was always included in the balance optimization.
- The `caliper` argument now works as it does with `method = "nearest"` and other methods rather than needing to be supplied in a way that `Matching::Match()` would accept.
- A `subclass` component is now included in the output when `replace = FALSE` (the default), as it has been with optimal and full matching.
- With `method = "cem"`, the `k2k` argument is now recognized. Previously it was ignored unless an argument to `k2k.method` was supplied.
- The `estimand` argument can be set to `"ATT"`, `"ATC"`, or `"ATE"` to compute matching weights that correspond to the given estimand. Previously only ATT weights were computed. See `?matchit` for details on how weights are computed for each `estimand`.
- Performance improvements.
- A new argument, `min.n`, can be supplied, which controls the minimum size a treatment group can be in each subclass. When any estimated subclass doesn't have enough members from a treatment group, units from other subclasses are pulled to fill it so that every subclass will have at least `min.n` units from each treatment group. This uses the same mechanism as is used in `WeightIt`. The default `min.n` is 1 to ensure there is at least one treated and one control unit in each subclass.
- Rather than producing warnings and just using the default number of subclasses (6), when an inappropriate argument is supplied to `subclass`, an error will occur.
- The new `subclass` argument to `summary()` can be used to control whether subclass balance statistics are computed; it can be `TRUE` (display balance for all subclasses), `FALSE` (display balance for no subclasses), or a vector of subclass indices on which to assess balance. The default is `FALSE`.
- With `summary()`, balance aggregating across subclasses is now computed using subclass weights instead of by combining the subclass-specific balance statistics.
- The `sub.by` argument has been replaced with `estimand`, which can be set to `"ATT"`, `"ATC"`, or `"ATE"` to replace the `sub.by` inputs of `"treat"`, `"control"`, and `"all"`, respectively. Previously, weights for `sub.by` that wasn't `"treat"` were incorrect; they are now correctly computed for all inputs to `estimand`.
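
How subclass membership can be turned into weights for each `estimand` can be sketched in base R. The function `subclass_weights()` is hypothetical and uses the usual stratum-proportion formulas (with `p` the proportion of treated units in a subclass); whether these match the package's computation exactly, including any later rescaling, is an assumption to check against `?matchit`.

```r
# Hypothetical sketch: weights from subclass membership for a 0/1 treatment
subclass_weights <- function(treat, subclass, estimand = "ATT") {
  p <- ave(treat, subclass)   # proportion treated within each subclass
  switch(estimand,
    ATT = ifelse(treat == 1, 1, p / (1 - p)),
    ATC = ifelse(treat == 1, (1 - p) / p, 1),
    ATE = ifelse(treat == 1, 1 / p, 1 / (1 - p)),
    stop("estimand must be \"ATT\", \"ATC\", or \"ATE\"")
  )
}

treat    <- c(1, 0, 0, 1, 1, 0)
subclass <- c(1, 1, 1, 2, 2, 2)
subclass_weights(treat, subclass, "ATT")
subclass_weights(treat, subclass, "ATE")
```

The sketch assumes every subclass contains both treated and control units, which the default `min.n` of 1 is described as ensuring.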
-
The allowable options to distance have changed slightly. The input should be "mahalanobis" for Mahalanobis distance matching (without a propensity score caliper), a numeric vector of distance values (i.e., values whose absolute pairwise differences form the distances), or one of the allowable options. The new allowable values include "glm" for propensity scores estimated with glm(), "gam" for propensity scores estimated with mgcv::gam(), "rpart" for propensity scores estimated with rpart::rpart(), "nnet" for propensity scores estimated with nnet::nnet(), "cbps" for propensity scores estimated with CBPS::CBPS(), or "bart" for propensity scores estimated with dbarts::bart2(). To specify a link (e.g., for probit regression), specify an argument to the new link parameter. For linear versions of the propensity score, specify link as "linear.{link}". For example, for linear probit regression propensity scores, one should specify distance = "glm", link = "linear.probit". The default distance is "glm" and the default link is "logit", so these can be omitted if either is desired. Not all methods accept a link, and for those that don't, it will be ignored. If an old-style distance is supplied, it will be converted to an appropriate specification with a warning (except for distance = "logit", which will be converted without a warning).
-
Added "cbps" as an option for distance. This estimates propensity scores using the covariate balancing propensity score (CBPS) algorithm as implemented in the CBPS package. Set link = "linear" to use a linear version of the CBPS.
-
Added "bart" as an option for distance. This estimates propensity scores using Bayesian Additive Regression Trees (BART) as implemented in the dbarts package.
-
Added "randomforest" as an option for distance. This estimates propensity scores using random forests as implemented in the randomForest package.
-
Bugs in distance = "rpart" have been fixed.
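The three kinds of distance input described above can be sketched as follows. The example assumes the lalonde dataset bundled with MatchIt and an arbitrary covariate set, and it sticks to options that need no packages beyond MatchIt itself.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# 1. A named option plus a link: linear probit propensity scores.
m_probit <- matchit(treat ~ age + educ + married + re74, data = lalonde,
                    method = "nearest", distance = "glm",
                    link = "linear.probit")

# 2. Mahalanobis distance matching on the covariates.
m_maha <- matchit(treat ~ age + educ + married + re74, data = lalonde,
                  method = "nearest", distance = "mahalanobis")

# 3. A user-supplied numeric vector, here propensity scores estimated
#    outside of matchit() with an ordinary logistic regression.
ps <- as.numeric(fitted(glm(treat ~ age + educ + married + re74,
                            data = lalonde, family = binomial)))
m_vec <- matchit(treat ~ age + educ + married + re74, data = lalonde,
                 method = "nearest", distance = ps)
```

In the third call the formula still names the covariates so that balance can be assessed on them, while the matching itself uses the supplied vector.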
-
When interactions = TRUE, interactions are no longer computed with the distance measure or between dummy variables of the same factor. Variable names are cleaned up and easier to read.
-
The argument to addlvariables can be specified as a data frame or matrix of covariates, a formula with the additional covariates (and transformations) on the right side, or a character vector containing the names of the additional covariates. For the latter two, if the variables named do not exist in the X component of the matchit output object or in the environment, an argument to data can be supplied to summary() that contains these variables.
-
The output for summary() is now the same for all methods (except subclassification). Previously, a few types of matching had their own summary() methods with differing output.
-
The eCDF median (and QQ median) statistics have been replaced with the variance ratio, which is better studied and part of several sets of published recommendations. The eCDF and QQ median statistics provide little information above and beyond the corresponding mean statistics. The variance ratio uses the variances weighted by the matching weights.
-
The eCDF and QQ statistics have been adjusted. Both now use the weights that were computed as part of the matching. The eCDF and QQ statistics for binary variables are set to the difference in group proportions. The standard deviation of the control group has been removed from the output.
-
The default for standardize is now TRUE, so that standardized mean differences and eCDF statistics will be displayed by default.
-
A new column for the average absolute pair difference for each covariate is included in the output. The values indicate how far treated and control units within pairs are from each other. An additional argument to summary.matchit(), pair.dist, controls whether this value is computed. Computing it can take a long time for some matching methods, so it can be omitted to speed up computation.
-
Balance prior to matching can now be suppressed by setting un = FALSE.
-
Percent balance improvement can now be suppressed by setting improvement = FALSE. When un = FALSE, improvement is automatically set to FALSE.
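Several of the summary() options above can be combined in one call. The sketch below assumes the lalonde dataset bundled with MatchIt and an arbitrary matching specification; re74 is deliberately left out of the matching formula so that it has to come in through addlvariables.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Default 1:1 nearest neighbor matching on a logistic propensity score.
m <- matchit(treat ~ age + educ + race + married, data = lalonde)

# Skip pre-matching balance and pair distances, and assess balance on
# two extra terms: a squared term and a covariate not used in matching.
# data = lalonde tells summary() where to find re74.
s <- summary(m, un = FALSE, pair.dist = FALSE,
             addlvariables = ~ I(age^2) + re74, data = lalonde)
s
```

Because un = FALSE, the printed output should contain only the post-matching balance table and sample sizes.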
-
Plots now use weighted summaries when weights are present, removing the need for the num.draws argument.
-
Added a new plot type, "ecdf", which creates empirical CDF plots before and after matching.
-
The appearance of some plots has improved (e.g., text is appropriately centered, axes are more clearly labeled). For eQQ plots with binary variables or variables that take on only a few values, the plots look more like clusters than snakes.
-
The argument to type can be abbreviated (e.g., "j" for jitter).
-
Fixed a bug that caused all plots generated after using plot(., type = "hist") to be small.
-
When specifying an argument to which.xs to control which variables have their balance displayed graphically, the input should be the name of the original variable rather than the version that appears in the summary() output. In particular, if a factor variable was supplied to matchit(), it should be referred to by its name rather than the names of its split dummies. This makes it easier to view balance on factor variables without having to know or type the names of all their levels.
-
eQQ plots can now be used with all matching methods. Previously, attempting plot() after method = "exact" would fail.
- The summary plot has been completely redesigned. It is now a Love plot made using graphics::dotchart(). A few options are available for ordering the variables, presenting absolute or raw standardized mean differences, and placing threshold lines on the plots. For a more sophisticated interface, see cobalt::love.plot(), which natively supports matchit objects and uses ggplot2 as its engine.
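The plotting changes above can be exercised with a few calls. This sketch assumes the lalonde dataset bundled with MatchIt and an arbitrary matching specification; the threshold values are illustrative, not recommendations from this changelog.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

m_plot <- matchit(treat ~ age + educ + race + married, data = lalonde)

# Empirical CDF plots; the factor race is named directly rather than
# through its dummy variables.
plot(m_plot, type = "ecdf", which.xs = c("age", "race"))

# type can be abbreviated: "j" selects the jitter plot.
plot(m_plot, type = "j", interactive = FALSE)

# Love plot of absolute standardized mean differences with two
# threshold lines.
plot(summary(m_plot), abs = TRUE, threshold = c(0.1, 0.05))
```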