Matchby {Matching} | R Documentation |
This function is a wrapper for the Match function which separates the
matching problem into subgroups defined by a factor. This is equivalent
to conducting exact matching on each level of a factor. Matches within
each level are found as determined by the usual matching options. This
function is much faster for large datasets than the Match function itself.
Matchby(Y, Tr, X, by, estimand = "ATT", M = 1, exact = NULL,
        caliper = NULL, Weight = 1, Weight.matrix = NULL,
        tolerance = 1e-05, distance.tolerance = 1e-05,
        print.level = 1, version = "fast", ...)
Y |
A vector containing the outcome of interest. Missing values are not allowed. |
Tr |
A vector indicating the observations which are in the treatment regime and those which are not. This can either be a logical vector or a real vector where 0 denotes control and 1 denotes treatment. |
X |
A matrix containing the variables we wish to match on. This matrix may contain the actual observed covariates or the propensity score or a combination of both. |
by |
A "factor" in the sense that as.factor(by) defines the
grouping, or a list of such factors in which case their
interaction is used for the grouping. |
estimand |
A character string for the estimand. The default estimand is "ATT", the sample average treatment effect for the treated. "ATE" is the sample average treatment effect (for all), and "ATC" is the sample average treatment effect for the controls. |
M |
A scalar for the number of matches which should be found (with replacement). The default is one-to-one matching. |
exact |
A logical scalar or vector for whether exact matching
should be done. If a logical scalar is provided, that logical value is
applied to all covariates of X. If a logical vector is provided, a
logical value should be provided for each covariate in X. Using a
logical vector allows the user to specify exact matching for some but
not other variables. When exact matches are not found, observations are
dropped. distance.tolerance determines what is considered to be an
exact match. The exact option takes precedence over the
caliper option. |
caliper |
A scalar or vector denoting the caliper(s) which
should be used when matching. A caliper is the maximum distance which
is acceptable for any match. Observations which are outside of the
caliper are dropped. If a scalar caliper is provided, this caliper is
used for all covariates in X. If a vector of calipers is
provided, a caliper value should be provided for each covariate in
X. The caliper is interpreted to be in standardized units. For
example, caliper=.25 means that any match which differs by more than
.25 standard deviations on any covariate in X is dropped. |
Weight |
A scalar for the type of
weighting scheme the matching algorithm should use when weighting
each of the covariates in X. The default value of
1 denotes that weights are equal to the inverse of the variances. 2
denotes the Mahalanobis distance metric, and 3 denotes
that the user will supply a weight matrix (Weight.matrix). Note that
if the user supplies a Weight.matrix, Weight will be automatically
set to be equal to 3. |
Weight.matrix |
This matrix denotes the weights the matching
algorithm uses when weighting each of the covariates in X; see
the Weight option. This square matrix should have as many
columns as the number of columns of the X matrix. This matrix
is usually provided by a call to the GenMatch function,
which finds the optimal weight each variable should be given so as to
achieve balance on the covariates. For most uses, this matrix has zeros
in the off-diagonal cells. This matrix can be used to weight some
variables more than others. For example, if X contains three variables
and we want to match as best as we can on the first, the following
would work well:
> Weight.matrix <- diag(3)
> Weight.matrix[1,1] <- 1000/var(X[,1])
> Weight.matrix[2,2] <- 1/var(X[,2])
> Weight.matrix[3,3] <- 1/var(X[,3])
This code changes the weights implied by the inverse of the variances
by multiplying the weight of the first variable by 1000 so that it is
highly weighted. To enforce exact matching, see the exact and caliper
options. |
tolerance |
This is a scalar which is used to determine numerical tolerances. This option is used by numerical routines such as those used to determine if a matrix is singular. |
distance.tolerance |
This is a scalar which is used to determine if distances
between two observations are different from zero. Values less than
distance.tolerance are deemed to be equal to zero. This
option can be used to perform a type of optimal matching. |
print.level |
The level of printing. Set to '0' to turn off printing. |
version |
The version of the code to be used. The "fast" C/C++ version of the code does not calculate Abadie-Imbens standard errors. The end-user should not change this option. |
... |
Additional arguments passed on to Match. |
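The caliper rule described above can be illustrated in base R without the package. This is a minimal sketch, not the package's code: the covariate values and candidate pairs are hypothetical, and taking the standard deviation over the full sample is an assumption, since this page only says that the caliper is in standardized units.

```r
# Hypothetical covariate values for six observations
x <- c(2.0, 2.3, 5.0, 4.1, 3.8, 1.0)
caliper <- 0.25

# Assumption: the scale is the standard deviation of x over the full sample
max_dist <- caliper * sd(x)

# Candidate pairs (treated index, control index), chosen only for illustration
pairs <- rbind(c(1, 2), c(4, 5), c(3, 6))
dist  <- abs(x[pairs[, 1]] - x[pairs[, 2]])

# A pair is kept only if its distance is within the caliper
within_caliper <- dist <= max_dist
print(data.frame(treated = pairs[, 1], control = pairs[, 2],
                 dist = dist, within_caliper = within_caliper))
```

Pairs flagged FALSE correspond to the matches that the caliper option would drop.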
Matchby is much faster for large datasets than Match. But Matchby only
implements a subset of the functionality of Match. For example, the
restrict option cannot be used, Abadie-Imbens standard errors are not
provided, and bias adjustment cannot be requested.
Matchby is a wrapper for the Match function which separates the
matching problem into subgroups defined by a factor. This is
equivalent to doing exact matching on each level of the factor, and the
way in which matches are found within each level is determined by the
usual matching options.
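The idea of splitting the matching problem by a factor can be sketched in base R. The helper match_within_groups and the toy data below are hypothetical, and the simple nearest-neighbour rule stands in for the usual matching options; it is not the algorithm Matchby or Match uses. A list passed as by is reduced to the interaction of its elements, following the description of the by argument.

```r
# Hypothetical helper: one-to-one nearest-neighbour matching with replacement
# on a single covariate, carried out separately within each level of `by`.
match_within_groups <- function(Tr, x, by) {
  # A list of factors is reduced to their interaction
  if (is.list(by)) by <- interaction(by, drop = TRUE)
  by <- as.factor(by)
  pairs <- lapply(levels(by), function(g) {
    treated  <- which(by == g & Tr == 1)
    controls <- which(by == g & Tr == 0)
    # A group lacking treated or control units yields no matches
    if (length(treated) == 0 || length(controls) == 0) return(NULL)
    nearest <- vapply(treated, function(i) {
      controls[which.min(abs(x[controls] - x[i]))]
    }, integer(1))
    data.frame(treated = treated, control = nearest, group = g)
  })
  do.call(rbind, pairs)
}

# Toy data (not from lalonde)
Tr <- c(1, 0, 0, 1, 0, 0)
x  <- c(0.30, 0.28, 0.90, 0.70, 0.10, 0.65)
by <- c("a", "a", "a", "b", "b", "b")

m <- match_within_groups(Tr, x, by)
print(m)
```

Every matched pair shares the same level of by, which is what exact matching on that factor would require.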
est |
The estimated average causal effect. |
se.standard |
The usual standard error. This is the standard error calculated on the matched data using the usual method of calculating the difference of means (between treated and control) weighted so that ties are taken into account. |
ret |
A matrix with three columns. The first column contains the outcomes for treated observations, the second the outcomes for control observations, and the third the weight given to each match. |
orig.nobs |
The original number of observations in the dataset. |
nobs |
The number of observations in the matched dataset. |
wnobs |
The number of weighted observations in the matched dataset. |
orig.treated.nobs |
The original number of treated observations. |
ndrops |
The number of matches which were dropped because there were not enough observations in a given group or because of caliper and exact matching. |
estimand |
The estimand which was estimated. |
version |
The version of Match which was used. |
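Given a matrix laid out like ret (treated outcomes, control outcomes, match weights), a weighted difference in means can be computed as sketched below. The matrix is a hypothetical toy, and whether this calculation reproduces est exactly is an assumption; this page does not state the formula.

```r
# Hypothetical matrix in the layout described for `ret`
ret <- cbind(treated = c(10, 12, 12, 8),
             control = c(7, 9, 11, 8),
             weight  = c(1, 0.5, 0.5, 1))

w <- ret[, "weight"]

# Weighted mean of the treated-minus-control differences
wdiff <- sum(w * (ret[, "treated"] - ret[, "control"])) / sum(w)

# The same quantity written as a difference of weighted means
wdiff2 <- weighted.mean(ret[, "treated"], w) - weighted.mean(ret[, "control"], w)

print(c(wdiff = wdiff, wdiff2 = wdiff2))
```

In this toy matrix the second and third rows share one treated outcome and carry weight 0.5 each, which is one way tied matches could be represented.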
Jasjeet S. Sekhon, UC Berkeley, sekhon@berkeley.edu, http://sekhon.berkeley.edu/.
Sekhon, Jasjeet S. 2006. "Matching: Algorithms and Software for Multivariate and Propensity Score Matching with Balance Optimization via Genetic Search." http://sekhon.berkeley.edu/matching/
Sekhon, Jasjeet S. 2006. "Alternative Balance Metrics for Bias Reduction in Matching Methods for Causal Inference." Working Paper. http://sekhon.berkeley.edu/papers/SekhonBalanceMetrics.pdf
Abadie, Alberto and Guido Imbens. 2006. "Large Sample Properties of Matching Estimators for Average Treatment Effects." Econometrica 74(1): 235-267. http://ksghome.harvard.edu/~.aabadie.academic.ksg/sme.pdf
Diamond, Alexis and Jasjeet S. Sekhon. 2005. "Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies." Working Paper. http://sekhon.berkeley.edu/papers/GenMatch.pdf
Imbens, Guido. 2004. Matching Software for Matlab and Stata. http://elsa.berkeley.edu/~imbens/estimators.shtml
Also see Match, summary.Matchby, GenMatch, MatchBalance, balanceMV, balanceUV, qqstats, ks.boot, GerberGreenImai, lalonde
#
# Match exactly by racial groups and then match using the propensity
# score within racial groups
#

# Load the package so the example also runs outside the help system
library(Matching)
data(lalonde)

#
# Estimate the Propensity Score
#
glm1 <- glm(treat~age + I(age^2) + educ + I(educ^2) + hisp + married +
            nodegr + re74 + I(re74^2) + re75 + I(re75^2) + u74 + u75,
            family=binomial, data=lalonde)

#
# save data objects
#
X  <- glm1$fitted
Y  <- lalonde$re78
Tr <- lalonde$treat

# one-to-one matching with replacement (the "M=1" option) after exactly
# matching on race using the 'by' option. Estimating the treatment
# effect on the treated (the "estimand" option defaults to ATT).
rr <- Matchby(Y=Y, Tr=Tr, X=X, by=lalonde$black, M=1)
summary(rr)
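A variation on the example above adds a caliper and reads individual components of the returned object. It is a sketch that uses only arguments and components documented on this page; the reduced propensity score model and the object names ps and rr2 are illustrative choices, and the code is skipped when the Matching package is not installed.

```r
# Runs only when the Matching package is installed
if (requireNamespace("Matching", quietly = TRUE)) {
  library(Matching)
  data(lalonde)

  # Propensity score from a deliberately small, illustrative model
  ps <- glm(treat ~ age + educ + married + nodegr + re74 + re75,
            family = binomial, data = lalonde)$fitted.values

  # Exact matching on race via 'by', with a 0.25 caliper on the score
  rr2 <- Matchby(Y = lalonde$re78, Tr = lalonde$treat, X = ps,
                 by = lalonde$black, M = 1, caliper = 0.25,
                 print.level = 0)

  # Components described in the list of returned values
  print(c(est = rr2$est, se = rr2$se.standard,
          nobs = rr2$nobs, ndrops = rr2$ndrops))
}
```

Comparing ndrops with and without the caliper shows how many matches the caliper removes.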