fullmatch {optmatch} | R Documentation |
Given two groups, such as a treatment and a control group, and a treatment-by-control discrepancy matrix indicating desirability and permissibility of potential matches, create optimal full matches of members of the groups. Optionally, incorporate restrictions on matched sets' ratios of treatment to control units.
fullmatch(distance, min.controls = 0, max.controls = Inf, omit.fraction = NULL, tol = 0.001, subclass.indices = NULL)
distance |
A matrix of nonnegative discrepancies,
each indicating the permissibility and desirability of matching the unit
corresponding to its row (a 'treatment') to the unit
corresponding to its column (a 'control'); or, better, a list of such matrices,
as produced by pscore.dist, mahal.dist, or
makedist. |
min.controls |
The minimum ratio of controls to
treatments that is to be permitted within a matched set: should
be nonnegative and finite. If
min.controls is not a whole number, the reciprocal of a whole
number, or zero, then it is rounded down to the nearest
whole number or reciprocal of a whole number.
When matching within subclasses, min.controls may be a named
numeric vector
separately specifying the minimum permissible ratio of controls to
treatments for each subclass. The names of this vector should include
names of all matrices in the list distance.
|
max.controls |
The maximum ratio of controls to treatments that is
to be permitted within a matched set: should be positive and numeric.
If max.controls is not a whole number, the reciprocal of a
whole number, or Inf, then it is rounded up to the
nearest whole number or reciprocal of a whole number.
When matching within subclasses, max.controls may be a named
numeric vector separately specifying the maximum permissible ratio
of controls to treatments in each subclass. |
omit.fraction |
Optionally, specify what fraction of
controls or treated subjects are to be rejected. If
omit.fraction is a positive fraction less than one, then
fullmatch leaves up to that fraction of the control reservoir
unmatched. If omit.fraction is a negative number greater than
-1, then fullmatch leaves up to |omit.fraction| of the
treated group unmatched. Positive values are only accepted if
max.controls >= 1; negative values, only if min.controls
<= 1. If omit.fraction is not specified, then only those
treated and control subjects without permissible matches among the
control and treated subjects, respectively, are omitted.
When matching within subclasses (so that distance is a list of matrices, as produced by
makedist, mahal.dist, or pscore.dist), omit.fraction specifies the fraction of
controls to be rejected in each subproblem, a parameter that can be
made to differ by subclass by setting omit.fraction equal to a
named numeric vector of fractions. |
tol |
Because of internal rounding, fullmatch may
solve a slightly different matching problem than the one
specified, in which case the match generated by
fullmatch may not coincide with an optimal solution of
the specified problem. tol times the number of subjects
to be matched specifies the extent to
which fullmatch's output is permitted to differ from an
optimal solution to the original problem, as measured by the
sum of discrepancies for all treatments and controls placed
into the same matched sets.
|
subclass.indices |
An old argument included for back-compatibility; no longer needed. |
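The rounding rules for min.controls and max.controls, and the compatibility conditions on omit.fraction, can be sketched in base R. The helpers below (round_min_controls, round_max_controls, omit_target) are hypothetical names introduced only for illustration; they are not part of optmatch and only mirror the rules as stated above.

```r
# Hypothetical helpers (not part of optmatch) mirroring the documented rules.

# Round a min.controls value DOWN to a whole number or the reciprocal of one.
round_min_controls <- function(x) {
  stopifnot(is.numeric(x), is.finite(x), x >= 0)
  if (x == 0) return(0)
  if (x >= 1) floor(x) else 1 / ceiling(1 / x)
}

# Round a max.controls value UP to a whole number or the reciprocal of one.
round_max_controls <- function(x) {
  stopifnot(is.numeric(x), x > 0)
  if (is.infinite(x)) return(Inf)
  if (x >= 1) ceiling(x) else 1 / floor(1 / x)
}

# Report which group omit.fraction would trim, enforcing the documented
# conditions: positive values need max.controls >= 1, negative values
# need min.controls <= 1.
omit_target <- function(omit.fraction = NULL, min.controls = 0,
                        max.controls = Inf) {
  if (is.null(omit.fraction)) return("none")
  stopifnot(abs(omit.fraction) < 1)
  if (omit.fraction > 0) {
    stopifnot(max.controls >= 1)
    "controls"
  } else if (omit.fraction < 0) {
    stopifnot(min.controls <= 1)
    "treated"
  } else {
    "none"
  }
}

round_min_controls(2.7)   # rounded down to a whole number
round_min_controls(0.4)   # rounded down to a reciprocal, 1/3
round_max_controls(2.2)   # rounded up to a whole number
round_max_controls(0.4)   # rounded up to a reciprocal, 1/2
omit_target(0.25, max.controls = 3)
omit_target(-0.1, min.controls = 1/2)
```

A value below 1 is treated as "one control per several treatments", which is why the reciprocal is what gets rounded in that range.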
Finite entries in matrix slots of distance indicate permissible matches, with smaller discrepancies indicating more desirable matches. Matrix distance must have row and column names. Consider using makedist to generate the distances.
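As a minimal sketch of what such a matrix might look like, the base R code below builds a small discrepancy matrix by hand. The unit names and values are made up, and the use of Inf to mark an impermissible pairing is an assumption that follows from "finite entries indicate permissible matches". The last step is only a per-row lookup of the smallest discrepancy, not an optimal full match.

```r
# Hypothetical units: 2 treatments (rows) and 3 controls (columns).
dist_mat <- matrix(
  c(0.5, 2.0, Inf,
    Inf, 1.0, 0.2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("t1", "t2"), c("c1", "c2", "c3"))
)

# Finite entries mark permissible pairings; Inf is assumed to forbid one.
permissible <- is.finite(dist_mat)

# Checks corresponding to the stated requirements: nonnegative
# discrepancies, and both row and column names present.
stopifnot(all(dist_mat[permissible] >= 0))
stopifnot(!is.null(rownames(dist_mat)), !is.null(colnames(dist_mat)))

# Smallest-discrepancy control for each treatment (illustration only).
best_control <- colnames(dist_mat)[apply(dist_mat, 1, which.min)]
names(best_control) <- rownames(dist_mat)
best_control
```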
fullmatch tries to guess the order in which units would have been given in a data frame, and to order the factor that it returns accordingly. If the dimnames of distance, or of the matrices it lists, are not simply row numbers of the data frame you're working with, then you should compare the names of fullmatch's output to your row names in order to be sure things are in the proper order. You can relieve yourself of these worries by using makedist, pscore.dist, or mahal.dist to produce the distances, as each of these passes the ordering of units to fullmatch, which then uses it to order its outputs.
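The name comparison recommended here might look like the following base R sketch. The data frame dat and the factor fm are hypothetical stand-ins: fm is an ordinary named factor playing the role of fullmatch's output, not an actual optmatch object.

```r
# Hypothetical data frame whose row names are labels, not row numbers.
dat <- data.frame(z = c(1, 0, 0, 1, 0),
                  row.names = c("u1", "u2", "u3", "u4", "u5"))

# Stand-in for fullmatch output: a named factor in a different order.
# (A real result would have class c("optmatch", "factor").)
fm <- factor(c(u4 = "m.2", u1 = "m.1", u2 = "m.1", u5 = "m.2", u3 = "m.2"))

# Compare the names of the result with the data frame's row names.
same_order <- identical(names(fm), rownames(dat))

# If they differ but cover the same units, reindex by name before
# attaching the matched-set labels to the data frame.
if (!same_order && setequal(names(fm), rownames(dat))) {
  fm <- fm[rownames(dat)]
}
dat$matched_set <- fm
dat
```

Indexing by name rather than by position is what makes the reordering safe when the two orderings disagree.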
The value of tol can have a substantial effect on computation time; with smaller values, computation takes longer. Not every tolerance can be met, and how small a tolerance is too small varies with the machine and with the details of the problem. If fullmatch can't guarantee that the tolerance is as small as the given value of argument tol, then matching proceeds but a warning is issued.
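Since the permitted departure from optimality is tol times the number of subjects to be matched, the trade-off can be made concrete with a little arithmetic. The sketch below assumes 26 subjects, the count suggested by logical(26) in the plantdist example; the tighter tolerance of 1e-4 is an arbitrary illustrative value.

```r
n_subjects <- 26          # assumed problem size (see lead-in)

tol_default <- 0.001      # default value of tol
tol_tight   <- 1e-4       # arbitrary smaller tolerance, for comparison

# Permitted excess in the sum of matched discrepancies, relative to an
# optimal solution of the problem as specified.
permitted_excess_default <- tol_default * n_subjects
permitted_excess_tight   <- tol_tight * n_subjects

c(default = permitted_excess_default, tight = permitted_excess_tight)
```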
Primarily, a named vector of class c('optmatch', 'factor'). Elements of this vector correspond to members of the treatment and control groups in reference to which the matching problem was posed, and are named accordingly; the names are taken from the row and column names of distance. Each element of the vector is either NA, indicating unavailability of any suitable matches for that element, or the concatenation of: (i) a character abbreviation of the name of the subclass, if matching within subclasses, or the string 'm' if not; (ii) the string '.'; and (iii) a nonnegative integer or the string NA. In this last place, positive whole numbers indicate placement of the unit into a matched set and NA indicates that all or part of the matching problem given to fullmatch was found to be infeasible. The functions matched, unmatched, and matchfailed distinguish these scenarios.
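The label encoding can be illustrated with a base R sketch that sorts elements into the three scenarios by inspecting the text after the final '.'. This is not how matched, unmatched, and matchfailed are implemented; it only mirrors the encoding as described, and fm is a hypothetical stand-in for fullmatch output.

```r
# Stand-in for fullmatch output with hypothetical units a..e.
fm <- factor(c(a = "m.1", b = "m.1", c = NA, d = "m.NA", e = "m.2"))

labels <- as.character(fm)

# Text after the last ".": a set number, or the string "NA".
suffix <- sub("^.*\\.", "", labels)

# NA element: no suitable match was available for the unit.
is_unmatched   <- is.na(labels)
# Label ending in the string "NA": the (sub)problem was infeasible.
is_matchfailed <- !is.na(labels) & suffix == "NA"
# Label ending in a positive whole number: unit placed in a matched set.
is_matched     <- !is.na(labels) & grepl("^[1-9][0-9]*$", suffix)

data.frame(unit = names(fm), label = labels,
           matched = is_matched,
           unmatched = is_unmatched,
           matchfailed = is_matchfailed)
```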
Secondarily, fullmatch returns various data about the matching process and its result, stored as attributes of the named vector which is its primary output. In particular, the exceedances attribute gives upper bounds, not necessarily sharp, for the amount by which the sum of distances between matched units in the result of fullmatch exceeds the least possible sum of distances between matched units in a feasible solution to the matching problem given to fullmatch. Such a bound is also printed by print.optmatch.
Ben Hansen
Hansen, B.B. and Klopfer, S.O. (2006), ‘Optimal full matching and related designs via network flows’, Journal of Computational and Graphical Statistics, 15, 609–627.
Hansen, B.B. (2004), ‘Full Matching in an Observational Study of Coaching for the SAT’, Journal of the American Statistical Association, 99, 609–618.
Rosenbaum, P. (1991), ‘A Characterization of Optimal Designs for Observational Studies’, Journal of the Royal Statistical Society, Series B, 53, 597–610.
matched, pscore.dist, mahal.dist, makedist
data(plantdist)
round(plantdist)

plantsfm <- fullmatch(plantdist)   # A full match with unrestricted
                                   # treatment-control balance
PR <- logical(26)
PR[match(dimnames(plantdist)[[1]], names(plantsfm))] <- TRUE
table(plantsfm,                          # treatment-control balance,
      ifelse(PR, 'treated', 'control'))  # by matched set
stratumStructure(plantsfm)         # summary of sets' trt-ctl balance
sum(plantdist *                    # sum of matched distances
    outer(plantsfm[PR], plantsfm[!PR], "=="))

plantsfm1 <- fullmatch(plantdist,                       # A full match with
                       min.controls=2, max.controls=3)  # restrictions on matched sets'
                                                        # treatment-control balance
stratumStructure(plantsfm1)        # treatment-control balance is
                                   # improved by restrictions