makedist {optmatch} | R Documentation |
Helper function to produce first arguments to fullmatch()
, reducing memory requirements
for fullmatch()
and heading off certain user errors.
makedist(structure.fmla, data, fn = function(trtvar, dat, ...) { matrix(0, sum(trtvar), sum(!trtvar), dimnames = list(names(trtvar)[trtvar], names(trtvar)[!trtvar])) }, ...)
structure.fmla |
A formula defined w.r.t. data frame data ,
with a treatment variable on the LHS and either quote{1}, if no
stratification prior to matching, or variables
defining pre-matching stratification on the RHS. |
data |
A data frame in which structure.fmla is evaluated
and fn operates. |
fn |
A user-supplied function to compute distances. See details and examples. |
... |
Additional arguments to fn |
fn
should be a function with first two arguments
trtvar
, a treatment variable, and
dat
, a data frame. There may be additional arguments. If the
function uses variables in dat
, these should be referenced
using names from the input trtvar
, particularly if the sample is being
split into strata (ie structure.fmla
has a non-trivial RHS).
When this happens, fn
will be passed a trtvar
input
observations for only a subset of the rows of dat
, so it has to
use trtvar
to decide which rows of dat
to operate on; it
does this by lining up names of the (shorter) vector trtvar
with row names of dat
.
A list of matrices, one for each subclass defined by the interaction
of variables appearing on the right hand side of
structure.fmla
. Each of these is a number of treatments by
number of controls matrix of distances, with the distance between
treatments and controls calculated by the user-given function fn
.
The list also has some attributes that are not of direct interest to
the user, but are used by fullmatch()
.
Use of this function to prepare input to fullmatch()
prevents two common problems. First, the function encourages you to
stratify a large data set and match within strata, then calculates
distances only between potential matches within the same stratum.
Whether or not you stratify, the function has the advantage of keeping
track of the order of observations in a data frame from which distances
are generated. It passes this info to fullmatch()
, which makes
sure to return the vector a vector with the ordering of the generating
data frame. It's still possible to create matrices of distances without
this function, but then fullmatch()
has no way of knowing the
order of observations in whatever data frame you're working from, since
that info is lost in the transition to a distance matrix.
Ben Hansen
fullmatch
, pscore.dist
, mahal.dist
data(nuclearplants) ##-- A distance function used in P. Rosenbaum's (2002) book rankdiffs <- function(trtvar, dat, vars) { dmt <- matrix(0,sum(trtvar), sum(!trtvar)) for (vv in vars) { vvr <- rank(dat[names(trtvar),vv]) dmt <- dmt + abs(outer(vvr[trtvar], vvr[!trtvar],"-")) } round(dmt) } ##-- Gives a warning because this fn doesn't assign dimnames (rdd1 <- makedist(pr~1, nuclearplants[nuclearplants$pt==0,], rankdiffs, c("cap","date"))) fullmatch(rdd1) ##-- fullmatch() knows its value should be ordered as the nuclearplants data set is rdd1$m ##-- now fullmatch() doesn't know the proper order of units and has to guess fullmatch(rdd1$m) (rdd2 <- makedist(pr~pt, nuclearplants, rankdiffs, c("cap","date"))) fullmatch(rdd2) ##- Distance on a propensity score scalardiffs <- function(trtvar,data,scalarname) { sclr <- data[names(trtvar), scalarname] names(sclr) <- names(trtvar) abs(outer(sclr[trtvar],sclr[!trtvar], '-')) } nuclearplants$pscore <- glm(pr~.-(pr+cost), family=binomial(), data=nuclearplants)$linear.predictors ##-- Distance for propensity score matching w/o prior stratification psd1 <- makedist(pr~1, nuclearplants, scalardiffs, "pscore") fullmatch(psd1) ##-- Distance for propensity score matching within levels of "pt" psd2 <- makedist(pr~pt, nuclearplants, scalardiffs, "pscore") fullmatch(psd2)