MatchBalance {Matching}    R Documentation

Tests for Univariate and Multivariate Balance

Description

This function provides a variety of balance statistics useful for determining if balance exists in any unmatched dataset and in matched datasets produced by the Match function. Matching is performed by the Match function, and MatchBalance is used to determine if Match was successful in achieving balance on the observed covariates.

Usage

MatchBalance(formul, data = NULL, match.out = NULL, ks = TRUE, mv = FALSE,
             nboots=500, nmc=nboots,  maxit = 1000,
             weights=rep(1,nrow(data)), digits=5, verbose=1,
             paired=TRUE, ...)

Arguments

formul This formula does not estimate any model. The formula is simply an efficient way to use the R modeling language to list the variables we wish to obtain univariate balance statistics for. The dependent variable in the formula is usually the treatment indicator. One should include many functions of the observed covariates. Generally, one should request balance statistics on more higher-order terms and interactions than were used to conduct the matching itself.
data A data frame which contains all of the variables in the formula. If a data frame is not provided, the variables are obtained via lexical scoping.
match.out The output object from the Match function. If this output is included, MatchBalance will provide balance statistics for both before and after matching. Otherwise balance statistics will only be reported for the raw unmatched data.
ks A logical flag for whether the univariate bootstrap Kolmogorov-Smirnov (KS) test should be calculated. If ks is TRUE, the univariate KS test is calculated for all non-dichotomous variables. The bootstrap KS test is consistent even for non-continuous variables. See ks.boot for more details.
mv A logical flag for whether multivariate balance tests (the Kolmogorov-Smirnov and Chi-Square tests) should be calculated. If this flag is TRUE, the formula provided to MatchBalance is used to estimate a logistic regression, and the multivariate tests are conducted on the predicted probabilities of treatment for both treated and control based on the formula. The predicted probability densities for both treated and control should be indistinguishable if balance has been achieved. The model defined by this formula is estimated separately for the matched and unmatched datasets.
maxit The maximum number of iterations for the glm logistic procedure.
weights A vector of observation specific weights.
nboots The number of bootstrap samples to be run. If zero, no bootstraps are done. Bootstrapping is highly recommended because the bootstrapped Kolmogorov-Smirnov test provides correct coverage even when the distributions being compared are not continuous. A value of at least 500 (preferably 1000) is recommended for publication quality p-values.
nmc The number of Monte Carlo simulations to be conducted for each multivariate Kolmogorov-Smirnov test calculated. This option is only used if the mv flag is TRUE. Monte Carlo simulations are highly recommended because the usual Kolmogorov-Smirnov test is not consistent when the densities being compared contain point masses. A value of at least 500 (preferably 1000) is recommended for publication quality p-values. Also see the nboots option.
digits The number of significant digits that should be displayed.
verbose The amount of printing to be done. If zero, there is no printing. If one, the results are summarized. If two, details of the computations are printed.
paired A flag for whether the paired t.test should be used after matching. Regardless of the value of this option, an unpaired t.test is done for the unmatched data because it is assumed that the unmatched data were not generated by a paired experiment.
... Further arguments passed to balanceMV.
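
As a small illustration of how the formul argument merely lists variables, the following base R sketch expands a formula into its terms without fitting any model. The data frame toy and its columns are hypothetical, and model.matrix is used here only for illustration; it is not a claim about how MatchBalance processes the formula internally.

```r
# Hypothetical toy data; the column names are illustrative only.
toy <- data.frame(
  treat = c(1, 0, 1, 0, 1, 0),
  age   = c(25, 31, 40, 22, 36, 28),
  educ  = c(12, 10, 16, 9, 12, 11)
)

# A balance formula with first-order terms, a squared term and an
# interaction. Nothing is estimated here; the formula only names columns.
bal_formula <- treat ~ age + educ + I(age^2) + age:educ

# model.matrix() expands the right-hand side into one column per term.
terms_matrix <- model.matrix(bal_formula, data = toy)
balance_vars <- setdiff(colnames(terms_matrix), "(Intercept)")
print(balance_vars)
```

Each name in balance_vars corresponds to one variable for which univariate balance statistics would be requested.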

Details

This function can be used to determine if matching was successful in achieving balance. Both pre- and post-matching balance statistics are provided. Differences of means between the treatment and control groups are provided, as well as a variety of summary statistics for the empirical-QQ (eQQ) plot between the two groups. The first set of eQQ results consists of the standardized mean, median and maximum differences. The second set of eQQ results consists of summaries of the raw differences.
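
The raw eQQ summaries described above can be sketched in base R as follows. This is a minimal illustration that assumes two groups of equal size, so that sorted values can be compared position by position; it is not the package's implementation (see qqstats for that), and the standardized version is not shown. The function name eqq_raw_summary is hypothetical.

```r
# Sketch: raw eQQ differences for two equal-sized groups (an assumption;
# unequal group sizes would require quantile interpolation).
eqq_raw_summary <- function(x_treated, x_control) {
  stopifnot(length(x_treated) == length(x_control))
  # Compare the sorted samples position by position.
  diffs <- abs(sort(x_treated) - sort(x_control))
  c(mean = mean(diffs), median = median(diffs), max = max(diffs))
}

treated <- c(3, 1, 7, 5)
control <- c(2, 1, 4, 9)
eqq <- eqq_raw_summary(treated, control)
print(eqq)
```

Smaller values indicate that the two empirical distributions lie closer together across their whole range, not only at the mean.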

Two univariate tests are also provided: the t-test and the bootstrap Kolmogorov-Smirnov (KS) test. These tests should not be treated as hypothesis tests in the usual fashion because we wish to maximize balance without limit. The bootstrap KS test is highly recommended (see the ks and nboots options) because the bootstrap KS is consistent even for non-continuous distributions. Before matching, the two-sample t-test is used; after matching, the paired t-test is used unless paired is set to FALSE.
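
The idea behind a bootstrap KS test can be sketched in base R as below. This is an illustrative pooled-resampling scheme written under stated assumptions, not the code used by ks.boot; the function names ks_stat and boot_ks are hypothetical.

```r
# Two-sample KS statistic: the largest gap between the two empirical
# CDFs, evaluated at every observed value.
ks_stat <- function(x, y) {
  pooled <- c(x, y)
  max(abs(ecdf(x)(pooled) - ecdf(y)(pooled)))
}

# Bootstrap p-value: resample the pooled data with replacement, split the
# draw into groups of the original sizes, and record how often the
# resampled statistic is at least as large as the observed one.
boot_ks <- function(x, y, nboots = 500) {
  observed <- ks_stat(x, y)
  pooled <- c(x, y)
  n_x <- length(x)
  n_all <- length(pooled)
  boot_stats <- replicate(nboots, {
    draw <- sample(pooled, n_all, replace = TRUE)
    ks_stat(draw[seq_len(n_x)], draw[(n_x + 1):n_all])
  })
  # A small tolerance guards against floating point ties.
  tol <- sqrt(.Machine$double.eps)
  list(statistic = observed, p.value = mean(boot_stats >= observed - tol))
}

set.seed(1)
res <- boot_ks(x = 1:50, y = 101:150, nboots = 200)
print(res)
```

Because the resampling is done on the observed values themselves, the procedure does not rely on the distributions being continuous, which is the property the text above emphasizes.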

Two multivariate tests are provided: the KS test and the Chi-Square null deviance test. The KS test is to be preferred over the Chi-Square test because the Chi-Square test is not testing the relevant hypothesis. The null hypothesis for the KS test is equal balance in the estimated probabilities between treated and control. The null hypothesis for the Chi-Square test, however, is that all of the parameters are insignificant, that is, a comparison of residual versus null deviance. If the covariates being considered are discrete, this KS test is asymptotically nonparametric as long as the logit model does not produce zero parameter estimates.
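
The logic of the two multivariate tests can be sketched in base R on simulated data. This is only an outline of the idea: it uses the ordinary ks.test p-value rather than the Monte Carlo version controlled by nmc, and it is not the balanceMV implementation. All variable names are hypothetical.

```r
set.seed(42)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
# Treatment depends on x1, so the groups are imbalanced by construction.
treat <- rbinom(n, 1, plogis(1.5 * x1))

# Logistic regression of treatment on the covariates.
fit <- glm(treat ~ x1 + x2, family = binomial)
p_hat <- fitted(fit)

# KS test: compare predicted probabilities of treated and control units.
ks_out <- ks.test(p_hat[treat == 1], p_hat[treat == 0])

# Chi-Square test: drop from null deviance to residual deviance.
dev_drop <- fit$null.deviance - fit$deviance
df_drop <- fit$df.null - fit$df.residual
chisq_p <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)

print(c(ks_p = ks_out$p.value, chisq_p = chisq_p))
```

The KS comparison looks at the whole distribution of predicted probabilities in each group, whereas the deviance comparison only asks whether the covariates jointly predict treatment.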

NA's are handled by the na.action option. However, it is highly recommended that NA's not simply be deleted; one should check that missingness is balanced.
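
One simple way to check whether missingness is balanced is to build an indicator for missing values and compare its mean across treatment groups. The sketch below uses base R and a hypothetical data frame; it is a suggestion for illustration, not a procedure carried out by MatchBalance itself.

```r
# Hypothetical data with missing values in one covariate.
toy <- data.frame(
  treat = c(1, 1, 1, 1, 0, 0, 0, 0),
  re74  = c(NA, 1200, NA, 800, 950, NA, 400, 0)
)

# Indicator that equals 1 when the covariate is missing.
toy$re74_missing <- as.numeric(is.na(toy$re74))

# Share of missing values within each treatment group.
miss_rate <- tapply(toy$re74_missing, toy$treat, mean)
print(miss_rate)
```

A large gap between the two rates suggests that missingness itself is related to treatment and should not be ignored.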

Value

mv A return object from a call to balanceMV.
uv A return object from a call to balanceUV. The univariate tests performed on the last variable in formul are returned. For the other variables call balanceUV directly. Note that the univariate test results for all of the variables in formul are printed if verbose > 0.

Author(s)

Jasjeet S. Sekhon, UC Berkeley, sekhon@berkeley.edu, http://sekhon.berkeley.edu/.

References

Sekhon, Jasjeet S. 2007. "Multivariate and Propensity Score Matching Software with Automated Balance Optimization." Working Paper. http://sekhon.berkeley.edu/papers/MatchingJSS.pdf

Sekhon, Jasjeet S. 2006. "Alternative Balance Metrics for Bias Reduction in Matching Methods for Causal Inference." Working Paper. http://sekhon.berkeley.edu/papers/SekhonBalanceMetrics.pdf

Diamond, Alexis and Jasjeet S. Sekhon. 2005. "Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies." Working Paper. http://sekhon.berkeley.edu/papers/GenMatch.pdf

Abadie, Alberto. 2002. "Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models." Journal of the American Statistical Association, 97:457 (March) 284-292.

Hall, Peter. 1992. The Bootstrap and Edgeworth Expansion. New York: Springer-Verlag.

Wilcox, Rand R. 1997. Introduction to Robust Estimation. San Diego, CA: Academic Press.

Conover, William J. 1971. Practical Nonparametric Statistics. New York: John Wiley & Sons. Pages 295-301 (one-sample "Kolmogorov" test), 309-314 (two-sample "Smirnov" test).

Shao, Jun and Dongsheng Tu. 1995. The Jackknife and Bootstrap. New York: Springer-Verlag.

See Also

Also see Match, GenMatch, balanceMV, balanceUV, qqstats, ks.boot, GerberGreenImai, lalonde

Examples

#
# Replication of Dehejia and Wahba psid3 model
#
# Dehejia, Rajeev and Sadek Wahba. 1999. "Causal Effects in Non-Experimental Studies: Re-Evaluating the
# Evaluation of Training Programs." Journal of the American Statistical Association 94 (448): 1053-1062.
#
library(Matching)
data(lalonde)

#
# Estimate the propensity model
#
glm1  <- glm(treat~age + I(age^2) + educ + I(educ^2) + black +
             hisp + married + nodegr + re74  + I(re74^2) + re75 + I(re75^2) +
             u74 + u75, family=binomial, data=lalonde)

#
# save data objects
#
X  <- fitted(glm1)
Y  <- lalonde$re78
Tr  <- lalonde$treat

#
# one-to-one matching with replacement (the "M=1" option).
# Estimating the treatment effect on the treated (the "estimand" option which defaults to 0).
#
rr  <- Match(Y=Y,Tr=Tr,X=X,M=1)

#Let's summarize the output
summary(rr)

#
# Let's check for balance
# 'nboots' and 'nmc' are set to small values in the interest of speed.
# Please increase to at least 500 each for publication quality p-values.  
mb  <- MatchBalance(treat~age + I(age^2) + educ + I(educ^2) + black +
                    hisp + married + nodegr + re74  + I(re74^2) + re75 + I(re75^2) +
                    u74 + u75, data=lalonde, match.out=rr, nboots=10, nmc=10)

[Package Matching version 4.2-6 Index]