GenMatch {Matching} | R Documentation |
This function finds optimal balance using multivariate matching where
a genetic search algorithm determines the weight each covariate is
given. The weights are chosen so that Match achieves balance. Balance is
determined by a variety of univariate tests, mainly paired t-tests for
dichotomous variables and an adjusted univariate Kolmogorov-Smirnov
(KS) test for multinomial and continuous variables. The object
returned by this function can be supplied to the Weight.matrix
option of the Match function to obtain estimates.
GenMatch(Tr, X, BalanceMatrix=X, estimand="ATT", M=1, weights=rep(1,length(Tr)), pop.size = 50, max.generations=100, wait.generations=4, hard.generation.limit=FALSE, starting.values=rep(1,ncol(X)), data.type.integer=TRUE, MemoryMatrix=TRUE, exact=NULL, caliper=NULL, nboots=0, ks=TRUE, verbose=FALSE, tolerance = 1e-05, distance.tolerance=tolerance, min.weight=0, max.weight=1000, Domains=NULL, print.level=2, project.path=NULL, paired=TRUE, ...)
Tr |
A vector indicating the observations which are in the treatment regime and those which are not. This can either be a logical vector or a real vector where 0 denotes control and 1 denotes treatment. |
X |
A matrix containing the variables we wish to match on. This matrix may contain the actual observed covariates or the propensity score or a combination of both. |
BalanceMatrix |
A matrix containing the variables we wish
to achieve balance on. This is by default equal to X, but it can
in principle be a matrix which contains more or fewer variables than
X or variables which are transformed in various ways. See
the examples. |
estimand |
A character string for the estimand. The default estimand is "ATT", the sample average treatment effect for the treated. "ATE" is the sample average treatment effect (for all), and "ATC" is the sample average treatment effect for the controls. |
M |
A scalar for the number of matches which should be found (with replacement). The default is one-to-one matching. |
weights |
A vector the same length as Tr which
provides observation-specific weights. |
pop.size |
Population Size. This is the number of individuals
genoud uses to solve the optimization problem. See
genoud for more details. |
max.generations |
Maximum Generations. This is the maximum
number of generations that genoud will run when attempting to
optimize a function. This is a soft limit. The maximum
generation limit will be binding for genoud only if
hard.generation.limit has been set equal to TRUE. If it
has not been set equal to TRUE, wait.generations
controls when genoud stops. See genoud for more details. |
wait.generations |
If there is no improvement in the objective
function in this number of generations, genoud will think that it has
found the optimum. The other variables controlling termination are
max.generations and hard.generation.limit . |
hard.generation.limit |
This logical variable determines if the max.generations
variable is a binding constraint for genoud . If
hard.generation.limit is FALSE, then genoud may exceed
the max.generations count if the objective function has
improved within a given number of generations (determined by
wait.generations ). |
starting.values |
A vector whose length equals the number of variables in X. This
vector contains the starting weights each of the variables is
given. The starting.values vector is a way for the user
to insert one individual into the starting population.
genoud will randomly create the other individuals. These values
correspond to the diagonal of the Weight.matrix as described
in detail in the Match function. |
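As a rough illustration of how starting.values relates to the diagonal of the Weight.matrix, the sketch below builds a diagonal matrix from a hypothetical starting vector. The three-covariate setup and the values are assumptions chosen only for illustration.

```r
# Hypothetical starting weights for three covariates (illustrative values only)
starting.values <- c(1, 5, 2)

# Place the weights on the diagonal; all off-diagonal entries are zero.
# 'nrow' is given explicitly so a length-one vector is also handled as a weight.
W <- diag(starting.values, nrow = length(starting.values))
print(W)
```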
data.type.integer |
By default only integer weights are considered. If this option is
set to FALSE, the search will be done over floating point
weights. This is usually an unnecessary degree of precision. |
MemoryMatrix |
This variable controls if genoud sets up a memory matrix. Such a
matrix ensures that genoud will request the fitness evaluation
of a given set of parameters only once. The variable may be
TRUE or FALSE. If it is FALSE, genoud
will be aggressive in
conserving memory. The most significant negative implication of
this variable being set to FALSE is that genoud will no
longer maintain a memory
matrix of all evaluated individuals. Therefore, genoud may request
evaluations which it has already previously requested. When
the number of variables in X is large, the memory matrix
consumes a large amount of RAM. genoud's memory matrix will require significantly less
memory if the user sets hard.generation.limit equal
to TRUE. Doing this is a good way of conserving
memory while still making use of the memory matrix structure. |
exact |
A logical scalar or vector for whether exact matching
should be done. If a logical scalar is
provided, that logical value is applied to all covariates of
X . If a logical vector is provided, a logical value should
be provided for each covariate in X . Using a logical vector
allows the user to specify exact matching for some but not other
variables. When exact matches are not found, observations are
dropped. distance.tolerance determines what is considered to
be an exact match. The exact option takes precedence over the
caliper option. Obviously, if exact matching is done
using all of the covariates, one should not be using
GenMatch unless the distance.tolerance has been set
unusually high. |
caliper |
A scalar or vector denoting the caliper(s) which
should be used when matching. A caliper is the distance which is
acceptable for any match. Observations which are outside of the
caliper are dropped. If a scalar caliper is provided, this caliper is
used for all covariates in X . If a vector of calipers is
provided, a caliper value should be provided for each covariate in
X . The caliper is interpreted to be in standardized units. For example,
caliper=.25 means that all matches not equal to or within .25
standard deviations of each covariate in X are dropped.
The ecaliper object which is returned by GenMatch shows
the enforced caliper on the scale of the X variables. |
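To illustrate the standardized-units interpretation, the following sketch converts a caliper of .25 into the scale of each covariate by multiplying it by that covariate's standard deviation. The toy matrix X.toy is hypothetical, and the use of the plain sample standard deviation is an assumption; the ecaliper returned by GenMatch may be computed differently, for example when observation weights are supplied.

```r
# Hypothetical covariate matrix (illustrative values only)
X.toy <- cbind(age = c(20, 30, 40, 50), educ = c(8, 10, 12, 14))

# Caliper expressed in standard deviations
caliper <- 0.25

# Caliper width on the original scale of each column: caliper * sd(column)
caliper.on.X.scale <- caliper * apply(X.toy, 2, sd)
print(caliper.on.X.scale)
```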
nboots |
The number of bootstrap samples to be run for the
ks test. |
ks |
A logical flag for whether the univariate bootstrap
Kolmogorov-Smirnov (KS) test should be calculated. If the ks option
is set to TRUE, the univariate KS test is calculated for all
non-dichotomous variables. The bootstrap KS test is consistent even
for non-continuous variables. See ks.boot for more
details. |
verbose |
A logical flag for whether details should be printed for each fit evaluation done by the genetic algorithm. |
tolerance |
This is a scalar which is used to determine numerical tolerances. This option is used by numerical routines such as those used to determine if a matrix is singular. |
distance.tolerance |
This is a scalar which is used to determine if distances
between two observations are different from zero. Values less than
distance.tolerance are deemed to be equal to zero. This
option can be used to perform a type of optimal matching. |
min.weight |
This is the minimum weight any variable may be given. |
max.weight |
This is the maximum weight any variable may be given. |
Domains |
This is an ncol(X) by 2 matrix.
The first column is the lower bound, and the second column is the
upper bound for each variable over which genoud will
search for weights. If the user does not provide this matrix, the
bounds for each variable will be determined by the min.weight
and max.weight options. |
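A minimal sketch of such a matrix is shown below. The number of covariates and the bounds are hypothetical; here the weight of the third covariate is restricted to a narrower range than the others purely for illustration.

```r
# Hypothetical number of covariates in X (illustrative only)
n.vars <- 3

# Column 1 holds the lower bounds, column 2 the upper bounds,
# with one row per covariate in X
Domains <- cbind(rep(0, n.vars), c(1000, 1000, 10))

# Basic sanity checks on the shape and ordering of the bounds
stopifnot(nrow(Domains) == n.vars,
          ncol(Domains) == 2,
          all(Domains[, 1] <= Domains[, 2]))
print(Domains)
```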
print.level |
This option controls the level of printing. There
are four possible levels: 0 (minimal printing), 1 (normal), 2
(detailed), and 3 (debug). If level 2 is selected, GenMatch will
print details about the population at each generation, including the
best individual found so far. If debug
level printing is requested, details of the genoud
population are printed in the "genoud.pro" file which is located in
the temporary R directory returned by the tempdir
function. See the project.path option for more details.
Because GenMatch runs may take a long time, it is important for the
user to receive feedback. Hence, print level 2 has been set as the
default. |
project.path |
This is the path of the
genoud project file. By default no file is
produced unless print.level=3 . In that case,
genoud places its output in a file called
"genoud.pro" located in the temporary directory provided by
tempdir . If a file path is provided to the
project.path option, a file will be created regardless of the
print.level . The behavior of the project file, however, will
depend on the print.level chosen. If the print.level
variable is set to 1, then the project file is rewritten after each
generation. Therefore, only the currently fully completed generation
is included in the file. If the print.level variable is set to
2 or higher, then each new generation is simply appended to the
project file. No project file is generated for
print.level=0 . |
paired |
A logical flag for whether the paired t.test should be
used when determining balance. |
... |
Other options which are passed on to genoud . |
This function maximizes the smallest p-value that is observed in any of the univariate tests of balance. During optimization, the smallest observed p-value is printed.
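The criterion can be sketched in base R as the minimum p-value across univariate tests. This is a simplified illustration on simulated data, not GenMatch's internal code: it compares unmatched groups with unpaired Welch t-tests and the standard ks.test rather than the paired and bootstrap versions described above, and the helper name min_balance_p is hypothetical.

```r
# Simplified sketch of the balance criterion: the smallest p-value across
# univariate t-tests and KS tests comparing treated and control groups.
# 'min_balance_p' is a hypothetical helper, not part of the Matching package.
min_balance_p <- function(Tr, B) {
  p.values <- apply(B, 2, function(x) {
    treated <- x[Tr == 1]
    control <- x[Tr == 0]
    c(t.test(treated, control)$p.value,
      ks.test(treated, control)$p.value)
  })
  min(p.values)
}

# Simulated data: x1 is balanced, x2 is imbalanced by construction
set.seed(1)
Tr <- rep(c(1, 0), each = 50)
B <- cbind(x1 = rnorm(100), x2 = rnorm(100, mean = Tr))

# A search over weights would try to make this quantity as large as possible
print(min_balance_p(Tr, B))
```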
value |
The lowest p-value of the matched dataset. |
par |
A vector of the weights given to each variable in X . |
Weight.matrix |
A matrix whose diagonal corresponds to the weight
given to each variable in X . This object corresponds to the
Weight.matrix in the Match function. |
matches |
A matrix with three columns. The first column contains
the row numbers of the treated observations in the matched dataset.
This column corresponds to the index.treated object which is
returned by Match . The second column gives the row numbers of
the control observations. This column corresponds to the
index.control object which is returned by Match . And
the last column gives the weight that each matched pair is given.
This column corresponds to the weights object which is returned
by Match. |
ecaliper |
The size of the enforced caliper on the scale of the
X variables. This object has the same length as the number of
covariates in X . |
Jasjeet S. Sekhon, Harvard University, jasjeet_sekhon@harvard.edu, http://jsekhon.fas.harvard.edu/
Diamond, Alexis and Jasjeet S. Sekhon. 2005. "Genetic Matching for Estimating Causal Effects: A New Method of Achieving Balance in Observational Studies." Working Paper. http://jsekhon.fas.harvard.edu/papers/GenMatch.pdf
Also see Match, summary.Match, MatchBalance, genoud, balanceMV, balanceUV, ks.boot, GerberGreenImai, lalonde
library(Matching)
set.seed(38913)
data(lalonde)
attach(lalonde)

# The covariates we want to match on
X <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)

# The covariates we want to obtain balance on
BalanceMat <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
                    I(re74*re75))

# Call GenMatch() to find the optimal weight to give each covariate in 'X'
# so that we achieve balance on the covariates in 'BalanceMat'. This is only
# an example, so we want GenMatch to be quick: the population size has been
# set to only 16 via the 'pop.size' option.
genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
                   pop.size=16, max.generations=10, wait.generations=1)

# The outcome variable
Y <- re78/1000

# Now that GenMatch() has found the optimal weights, estimate
# our causal effect of interest using those weights
mout <- Match(Y=Y, Tr=treat, X=X, estimand="ATE", Weight.matrix=genout)
summary(mout)

# Determine if balance has actually been obtained on the variables of interest
mb <- MatchBalance(treat ~ age + educ + black + hisp + married + nodegr + u74 + u75 +
                   re75 + re74 + I(re74*re75),
                   match.out=mout, nboots=500, ks=TRUE, mv=FALSE)