Match {Matching} | R Documentation |
This function performs multivariate matching. It is
intended to be used in conjunction with the MatchBalance
function, which checks whether the results of this function have actually
achieved balance on a set of covariates. To do
propensity score matching, estimate the propensity model
before calling Match, and then pass the propensity
scores to Match. The GenMatch function can be used to
find balance automatically by means of a genetic search
algorithm which determines the optimal weight to give each covariate.
Match provides principled standard errors when matching is done
with covariates or a known propensity score. Ties are handled in a
deterministic and coherent fashion.
Match(Y, Tr, X, Z = X, V = rep(1, length(Y)), estimand = "ATT", M = 1,
      BiasAdj = FALSE, exact = NULL, caliper = NULL, Weight = 1,
      Weight.matrix = NULL, weights = rep(1, length(Y)), Var.calc = 0,
      sample = FALSE, tolerance = 1e-05, distance.tolerance = 1e-05,
      version = "fast")
Y |
A vector containing the outcome of interest. Missing values are not allowed. |
Tr |
A vector indicating the observations which are in the treatment regime and those which are not. This can either be a logical vector or a real vector where 0 denotes control and 1 denotes treatment. |
X |
A matrix containing the variables we wish to match on. This matrix may contain the actual observed covariates or the propensity score or a combination of both. |
Z |
A matrix containing the covariates for which we wish to make bias adjustments. |
V |
A matrix containing the covariates for which the variance
of the causal effect may vary. Also see the Var.calc option,
which takes precedence. |
estimand |
A character string for the estimand. The default estimand is "ATT", the sample average treatment effect for the treated. "ATE" is the sample average treatment effect (for all), and "ATC" is the sample average treatment effect for the controls. |
M |
A scalar for the number of matches which should be found (with replacement). The default is one-to-one matching. |
BiasAdj |
A logical scalar for whether regression adjustment
should be used. See the Z matrix. |
exact |
A logical scalar or vector for whether exact matching
should be done. Variables which are to be exactly matched on
should also be given a very large weight (e.g., 1000) via the
Weight.matrix option. If a logical scalar is
provided, that logical value is applied to all covariates of
X. If a logical vector is provided, a logical value should
be provided for each covariate in X. Using a logical vector
allows the user to specify exact matching for some but not other
variables. When exact matches are not found, observations are
dropped. distance.tolerance determines what is considered to be an
exact match. The exact option takes precedence over the
caliper option. |
caliper |
A scalar or vector denoting the caliper(s) which
should be used when matching. A caliper is the distance which is
acceptable for any match. Observations which are outside of the
caliper are dropped. If a scalar caliper is provided, this caliper is
used for all covariates in X. If a vector of calipers is
provided, a caliper value should be provided for each covariate in
X. The caliper is interpreted to be in standardized units. For example,
caliper=.25 means that a match is kept only if the matched units are
within .25 standard deviations of each other on every covariate in X;
all other matches are dropped.
The ecaliper object which is returned by Match shows
the enforced caliper on the scale of the X variables. |
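The relationship between a standardized caliper and the raw scale of the covariates can be sketched in base R. This is only an illustration on made-up data: it assumes the enforced caliper is the caliper multiplied by each column's standard deviation, and the names ecaliper_sketch and within_caliper are hypothetical, not part of the package.

```r
# Toy covariate matrix (hypothetical data, for illustration only).
set.seed(1)
X <- cbind(age  = rnorm(50, mean = 30, sd = 8),
           educ = rnorm(50, mean = 12, sd = 2))

caliper <- 0.25  # caliper in standard-deviation units

# Assumed conversion: caliper times each column's standard deviation.
ecaliper_sketch <- caliper * apply(X, 2, sd)

# A candidate pair (i, j) is within the caliper only if every covariate
# differs by no more than the raw-scale caliper for that covariate.
within_caliper <- function(i, j) {
  all(abs(X[i, ] - X[j, ]) <= ecaliper_sketch)
}

ecaliper_sketch
within_caliper(1, 2)
```

The actual values enforced by Match are the ones reported in the ecaliper object; the sketch only shows why that object has one entry per covariate.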
Weight |
A scalar for the type of
weighting scheme the matching algorithm should use when weighting
each of the covariates in X. The default value of
1 denotes that weights are equal to the inverse of the variances. 2
denotes the Mahalanobis distance metric, and 3 denotes
that the user will supply a weight matrix (Weight.matrix). Note that
if the user supplies a Weight.matrix, Weight will be automatically
set to be equal to 3. |
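To make the first two weighting schemes concrete, here is a rough base R sketch on made-up data. It assumes that the distance between two observations is a quadratic form in their covariate difference, with a diagonal inverse-variance matrix for Weight = 1 and the inverse covariance matrix for Weight = 2. The helper wdist and both matrix names are hypothetical and are not taken from the package.

```r
# Toy covariates on very different scales (hypothetical data).
set.seed(2)
X <- cbind(x1 = rnorm(100, sd = 1), x2 = rnorm(100, sd = 5))

# Weight = 1 (assumed form): diagonal matrix of inverse variances.
W_invvar <- diag(1 / apply(X, 2, var))

# Weight = 2 (assumed form): inverse of the sample covariance matrix.
W_mahal <- solve(cov(X))

# Quadratic-form distance between observations i and j under weights W.
wdist <- function(i, j, W) {
  d <- X[i, ] - X[j, ]
  as.numeric(sqrt(t(d) %*% W %*% d))
}

wdist(1, 2, W_invvar)
wdist(1, 2, W_mahal)
```

Under either scheme a covariate with a large variance is scaled down, so it does not dominate the distance merely because of its units.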
Weight.matrix |
This matrix denotes the weights the matching
algorithm uses when weighting each of the covariates in X; see
the Weight option. This square matrix should have as many
columns as the number of columns of the X matrix. This matrix
is usually provided by a call to the GenMatch function,
which finds the optimal weight each variable should be given so as to
achieve balance on the covariates. For most uses, this matrix has zeros in the off-diagonal cells. This matrix can be used to weight some variables more than others. For example, if X contains three variables and we want to
match as closely as possible on the first, the following would work well:
> Weight.matrix <- diag(3)
> Weight.matrix[1,1] <- 1000/var(X[,1])
> Weight.matrix[2,2] <- 1/var(X[,2])
> Weight.matrix[3,3] <- 1/var(X[,3])
This code changes the weights implied by the inverse of the variances by multiplying the weight of the first variable by 1000 so that it is highly weighted. To enforce exact matching, see the exact and caliper options. |
weights |
A vector of the same length as Y which
provides observation-specific weights. |
Var.calc |
A scalar for the variance estimate
that should be used. By default Var.calc=0, which means that
homoscedasticity is assumed. For values of Var.calc > 0,
robust variances are calculated using Var.calc matches. |
sample |
A logical flag for whether the population or sample variance is returned. |
tolerance |
This is a scalar which is used to determine numerical tolerances. This option is used by numerical routines such as those used to determine whether a matrix is singular. |
distance.tolerance |
This is a scalar which is used to determine if distances
between two observations are different from zero. Values less than
distance.tolerance are deemed to be equal to zero. This
option can be used to perform a type of optimal matching. |
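The effect of the tolerance can be illustrated with a few made-up distances. This is a minimal sketch of the rule as described above (distances smaller than distance.tolerance are deemed equal to zero), not the package's internal code; the vectors d, is_zero and d_eff are hypothetical names.

```r
distance.tolerance <- 1e-05

# Hypothetical distances from one treated unit to four candidate controls.
d <- c(0.3, 4e-06, 0, 0.5)

# Distances below the tolerance are deemed equal to zero.
is_zero <- d < distance.tolerance
d_eff   <- ifelse(is_zero, 0, d)

which(is_zero)  # candidates 2 and 3 both count as zero-distance matches
d_eff
```

In this toy case candidates 2 and 3 are indistinguishable after the tolerance is applied, so both would be treated as equally good matches.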
version |
The version of the code to be used. The "fast" C/C++ version of the code is used unless the "old" (stable) version is requested. |
This function is intended to be used in conjunction with the
MatchBalance function, which checks whether the results of this
function have actually achieved balance. The results of this function
can be summarized by a call to the summary.Match
function. To do propensity score matching, estimate the
propensity model before calling Match, and then place the
fitted values in the X matrix; see the provided example.
The GenMatch function can be used to find balance automatically
by means of a genetic search algorithm which determines
the optimal weight to give each covariate. The object returned by
GenMatch can be supplied to the Weight.matrix
option of Match to obtain estimates.
Three demos are included: GerberGreenImai, DehejiaWahba,
and AbadieImbens. These can be run by calling the
demo function, for example demo(DehejiaWahba).
est |
The estimated average causal effect. |
se |
The standard error. This standard error is principled if
X consists of either covariates or a known propensity score
because it takes into account the uncertainty of the matching
procedure. If an estimated propensity score is used, the
uncertainty involved in its estimation is not accounted for although the
uncertainty of the matching procedure itself still is. |
est.noadj |
The estimated average causal effect without any
BiasAdj. If BiasAdj is not requested, this is the
same as est. |
se.naive |
The naive standard error. This is the standard error
calculated on the matched data using the usual method of calculating
the difference of means (between treated and control) weighted by the
observation weights provided by weights. Note that the
standard error provided by se takes into account the uncertainty
of the matching procedure while se.naive does not. Neither
se nor se.naive takes into account the uncertainty of
estimating a propensity score. se.naive does
not take into account any BiasAdj. A summary of the naive
results can be requested by setting the full=TRUE flag when
using the summary.Match function on the object
returned by Match. |
se.cond |
The conditional standard error. The practitioner should not generally use this. |
mdata |
A list which contains the matched datasets produced by
Match. Three datasets are included in this list: Y,
Tr and X. |
index.treated |
A vector containing the observation numbers from
the original dataset for the treated observations in the
matched dataset. This index in conjunction with index.control
can be used to recover the matched dataset produced by
Match. For example, the X matrix used by Match
can be recovered by
rbind(X[index.treated,],X[index.control,]). The user should
generally just examine the output of mdata. |
index.control |
An index for the control observations in the
matched data. This index in conjunction with index.treated
can be used to recover the matched dataset produced by
Match. For example, the X matrix used by Match
can be recovered by
rbind(X[index.treated,],X[index.control,]). The user should
generally just examine the output of mdata. |
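The rbind idiom given for the two index vectors can be tried on a toy dataset without running Match at all. In this sketch the index values are invented by hand to stand in for what Match would return, and X.matched and Tr.matched are hypothetical names.

```r
# Toy data: 6 observations, the first 2 treated (hypothetical values).
X  <- cbind(x1 = c(1.0, 2.0, 1.1, 2.2, 5.0, 0.9),
            x2 = c(0, 1, 0, 1, 1, 0))
Tr <- c(1, 1, 0, 0, 0, 0)

# Hand-made stand-ins for the indices returned by Match: element i of
# index.treated is paired with element i of index.control.
index.treated <- c(1, 2)
index.control <- c(3, 4)

# Stack the treated rows on top of their matched control rows.
X.matched  <- rbind(X[index.treated, , drop = FALSE],
                    X[index.control, , drop = FALSE])
Tr.matched <- c(Tr[index.treated], Tr[index.control])

X.matched
Tr.matched
```

Because matching is done with replacement, the same control row number may appear more than once in index.control, in which case its row is simply repeated in the stacked matrix.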
weights |
The weight for the matched dataset. If all of the observations had a weight of 1 on input, they will have a weight of 1 on output if each observation was only matched once. |
orig.nobs |
The original number of observations in the dataset. |
orig.wnobs |
The original number of weighted observations in the dataset. |
orig.treated.nobs |
The original number of treated observations (unweighted). |
nobs |
The number of observations in the matched dataset. |
wnobs |
The number of weighted observations in the matched dataset. |
caliper |
The caliper which was used. |
ecaliper |
The size of the enforced caliper on the scale of the
X variables. This object has the same length as the number of
covariates in X. |
exact |
The value of the exact function argument. |
ndrops |
The number of matches dropped either because of the caliper or exact matching. |
R version by Jasjeet S. Sekhon, Harvard University,
jasjeet_sekhon@harvard.edu,
http://jsekhon.fas.harvard.edu/
Matlab version by Guido Imbens, University of California, Berkeley, http://elsa.berkeley.edu/~imbens/.
Abadie, Alberto and Guido Imbens. 2004. "Large Sample Properties of Matching Estimators for Average Treatment Effects." Working Paper. http://ksghome.harvard.edu/~.aabadie.academic.ksg/sme.pdf
Sekhon, Jasjeet S. 2004. "The Varying Role of Voter Information Across Democratic Societies." Working Paper. http://jsekhon.fas.harvard.edu/papers/SekhonInformation.pdf
Also see summary.Match, GenMatch, MatchBalance, balanceMV, balanceUV, ks.boot, GerberGreenImai, lalonde
#
# Replication of Dehejia and Wahba psid3 model
#
# Dehejia, Rajeev and Sadek Wahba. 1999. "Causal Effects in Non-Experimental
# Studies: Re-Evaluating the Evaluation of Training Programs." Journal of the
# American Statistical Association 94 (448): 1053-1062.
#
library(Matching)
data(lalonde)
#
# Estimate the propensity model
#
glm1 <- glm(treat ~ age + I(age^2) + educ + I(educ^2) + black +
            hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
            u74 + u75, family = binomial, data = lalonde)
#
# Save data objects: fitted propensity scores, outcome, and treatment indicator
#
X  <- fitted(glm1)
Y  <- lalonde$re78
Tr <- lalonde$treat
#
# One-to-one matching with replacement (the "M=1" option).
# Estimating the treatment effect on the treated (the "estimand" option,
# which defaults to "ATT").
#
rr <- Match(Y = Y, Tr = Tr, X = X, M = 1)
summary(rr)
#
# Let's check for balance.
# 'nboots' and 'nmc' are set to small values in the interest of speed.
# Please increase to at least 500 each for publication quality p-values.
#
mb <- MatchBalance(treat ~ age + I(age^2) + educ + I(educ^2) + black +
                   hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
                   u74 + u75, data = lalonde, match.out = rr, nboots = 10, nmc = 10)