fullmatch {optmatch}    R Documentation

Optimal full matching

Description

Given two groups, such as a treatment and a control group, and a treatment-by-control discrepancy matrix indicating desirability and permissibility of potential matches, create optimal full matches of members of the groups. Optionally, incorporate restrictions on matched sets' ratios of treatment to control units.

Usage

fullmatch(distance, min.controls = 0, max.controls = Inf, 
omit.fraction = NULL, tol = 0.001, subclass.indices = NULL)

Arguments

distance A matrix of nonnegative discrepancies, each indicating the permissibility and desirability of matching the unit corresponding to its row (a 'treatment') to the unit corresponding to its column (a 'control'); or, better, a list of such matrices, as produced by pscore.dist, mahal.dist, or makedist.
min.controls The minimum ratio of controls to treatments that is to be permitted within a matched set: should be nonnegative and finite. If min.controls is not a whole number, the reciprocal of a whole number, or zero, then it is rounded down to the nearest whole number or reciprocal of a whole number.
When matching within subclasses, min.controls may be a named numeric vector separately specifying the minimum permissible ratio of controls to treatments for each subclass. The names of this vector should include names of all matrices in the list distance.
max.controls The maximum ratio of controls to treatments that is to be permitted within a matched set: should be positive and numeric. If max.controls is not a whole number, the reciprocal of a whole number, or Inf, then it is rounded up to the nearest whole number or reciprocal of a whole number.
When matching within subclasses, max.controls may be a named numeric vector separately specifying the maximum permissible ratio of controls to treatments in each subclass.
omit.fraction Optionally, specify what fraction of controls or treated subjects are to be rejected. If omit.fraction is a positive fraction less than one, then fullmatch leaves up to that fraction of the control reservoir unmatched. If omit.fraction is a negative number greater than -1, then fullmatch leaves up to |omit.fraction| of the treated group unmatched. Positive values are only accepted if max.controls >= 1; negative values, only if min.controls <= 1. If omit.fraction is not specified, then only those treated and control subjects without permissible matches among the control and treated subjects, respectively, are omitted.
When matching within subclasses (so that distance is a list of matrices, as produced by makedist or mahal.dist or pscore.dist), omit.fraction specifies the fraction of controls to be rejected in each subproblem, a parameter that can be made to differ by subclass by setting omit.fraction equal to a named numeric vector of fractions.
tol Because of internal rounding, fullmatch may solve a slightly different matching problem than the one specified, so the match it generates may not coincide with an optimal solution of the specified problem. tol times the number of subjects to be matched specifies how far fullmatch's output is permitted to differ from an optimal solution to the original problem, as measured by the sum of discrepancies between treatments and controls placed into the same matched sets.
subclass.indices An old argument included for back-compatibility; no longer needed.
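The rounding rules stated for min.controls and max.controls, and the cap that omit.fraction places on the number of omitted units, can be illustrated in base R. The helper names below (snap_whole, round_min_controls, round_max_controls, omit_budget) are hypothetical and are not part of optmatch. This is a sketch of the documented rules, not fullmatch's internal code, and it assumes that "up to that fraction" means the fraction times the group size, rounded down.

```r
# Hypothetical helper: absorb floating-point noise so that, e.g., 1/(1/7)
# is treated as the whole number 7.
snap_whole <- function(k) if (abs(k - round(k)) < 1e-8) round(k) else k

# Hypothetical helper: round a minimum controls-per-treatment ratio DOWN to
# zero, a whole number, or the reciprocal of a whole number.
round_min_controls <- function(x) {
  stopifnot(is.numeric(x), length(x) == 1, x >= 0, is.finite(x))
  if (x == 0) return(0)
  if (x >= 1) return(floor(snap_whole(x)))
  1 / ceiling(snap_whole(1 / x))      # e.g. 0.4 -> 1/3
}

# Hypothetical helper: round a maximum ratio UP to Inf, a whole number,
# or the reciprocal of a whole number.
round_max_controls <- function(x) {
  stopifnot(is.numeric(x), length(x) == 1, x > 0)
  if (is.infinite(x)) return(Inf)
  if (x >= 1) return(ceiling(snap_whole(x)))
  1 / floor(snap_whole(1 / x))        # e.g. 0.4 -> 1/2
}

# Hypothetical helper: which group omit.fraction refers to, and the largest
# number of its units that may be left unmatched (assumed to round down).
omit_budget <- function(omit.fraction, n.treated, n.controls) {
  stopifnot(omit.fraction != 0, abs(omit.fraction) < 1)
  if (omit.fraction > 0) {
    list(group = "control", max.omitted = floor(omit.fraction * n.controls))
  } else {
    list(group = "treated", max.omitted = floor(abs(omit.fraction) * n.treated))
  }
}

round_min_controls(0.4)                         # 1/3
round_max_controls(2.5)                         # 3
omit_budget(0.2, n.treated = 10, n.controls = 19)$max.omitted   # 3
```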

Details

Finite entries in matrix slots of distance indicate permissible matches, with smaller discrepancies indicating more desirable matches. Matrix distance must have row and column names.
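As an illustration of these conventions, the sketch below builds a small treatment-by-control matrix in base R, using Inf to mark forbidden pairings. The unit names and discrepancy values are invented for the example; a matrix of this shape, with row and column names, is the kind of object the distance argument expects.

```r
# Rows are treated units, columns are controls; Inf marks a pairing that
# may not be placed in the same matched set.
d <- matrix(c(0.5, 1.2, Inf,
              2.0, Inf, 0.3),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("t1", "t2"), c("c1", "c2", "c3")))

permissible <- is.finite(d)                 # TRUE where a match is allowed
stopifnot(!is.null(rownames(d)), !is.null(colnames(d)),
          all(d[permissible] >= 0))         # names present, entries nonnegative
sum(permissible)                            # 4 allowed treatment-control pairs
# The matrix could then be passed to fullmatch(d).
```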

Consider using makedist to generate the distances. fullmatch tries to guess the order in which units would have been given in a data frame, and orders the factor that it returns accordingly. If the dimnames of distance, or of the matrices it lists, are not simply row numbers of the data frame you're working with, then you should compare the names of fullmatch's output to your row names in order to be sure things are in the proper order. You can relieve yourself of these worries by using makedist, pscore.dist, or mahal.dist to produce the distances, since these functions pass the ordering of units to fullmatch, which then uses it to order its outputs.
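When the names of fullmatch's output and the row names of a data frame may be in different orders, indexing the result by row name aligns them. In the sketch below, fm is a hand-made stand-in that merely resembles fullmatch's named output, and dat is a hypothetical data frame; neither comes from optmatch.

```r
# Hypothetical data frame whose row names identify units
dat <- data.frame(x = c(1.2, 3.4, 0.7, 2.2),
                  row.names = c("a", "b", "c", "d"))

# Hand-made stand-in for fullmatch's output: named, but in a different order
fm <- factor(c("m.1", "m.2", "m.1", "m.2"))
names(fm) <- c("c", "a", "d", "b")

# Index by row names so that element i of the result describes row i of dat
aligned <- fm[row.names(dat)]
stopifnot(identical(names(aligned), row.names(dat)))
dat$matched.set <- aligned
```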

The value of tol can have a substantial effect on computation time; with smaller values, computation takes longer. Not every tolerance can be met, and how small a tolerance is too small varies with the machine and with the details of the problem. If fullmatch can't guarantee that the tolerance is as small as the given value of argument tol, then matching proceeds but a warning is issued.
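Because tol is scaled by the number of subjects to be matched (as described under the tol argument), the permitted gap between fullmatch's result and an optimal solution can be worked out by hand. The sketch assumes the 26 units of the plantdist example and a few illustrative tolerances, starting from the default tol = 0.001.

```r
tol <- 0.001          # fullmatch's default
n.subjects <- 26      # units to be matched, as in the plantdist example
max.excess <- tol * n.subjects
max.excess            # allowed excess in the sum of matched discrepancies

# Smaller tolerances shrink the permitted gap but may lengthen computation
sapply(c(0.01, 0.001, 1e-4), function(t) t * n.subjects)
```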

Value

Primarily, a named vector of class c('optmatch', 'factor'). Elements of this vector correspond to members of the treatment and control groups in reference to which the matching problem was posed, and are named accordingly; the names are taken from the row and column names of distance. Each element of the vector is either NA, indicating that no suitable match was available for that unit, or the concatenation of: (i) a character abbreviation of the name of the subclass, if matching within subclasses, or the string 'm' if not; (ii) the string '.'; and (iii) a nonnegative integer or the string 'NA'. In this last place, positive whole numbers indicate placement of the unit into a matched set, and 'NA' indicates that all or part of the matching problem given to fullmatch was found to be infeasible. The functions matched, unmatched, and matchfailed distinguish these scenarios.
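The label format described above can be decoded with ordinary string handling. The function classify_labels below is a hypothetical base-R sketch for illustration only; within optmatch, the documented functions matched, unmatched, and matchfailed are the intended way to distinguish these cases.

```r
# Hypothetical helper: sort fullmatch-style labels into three outcomes.
#   NA             -> no permissible match ("unmatched")
#   "<prefix>.NA"  -> (sub)problem found infeasible ("failed")
#   "<prefix>.<k>" -> placed in matched set k ("matched")
classify_labels <- function(labels) {
  nm <- names(labels)                  # keep unit names before coercion
  labels <- as.character(labels)
  out <- ifelse(is.na(labels), "unmatched",
                ifelse(grepl("\\.NA$", labels), "failed", "matched"))
  setNames(out, nm)
}

labs <- factor(c("m.1", "m.1", NA, "m.NA"))
names(labs) <- c("u1", "u2", "u3", "u4")
classify_labels(labs)
```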
Secondarily, fullmatch returns various data about the matching process and its result, stored as attributes of the named vector which is its primary output. In particular, the exceedances attribute gives upper bounds, not necessarily sharp, for the amount by which the sum of distances between matched units in the result of fullmatch exceeds the least possible sum of distances between matched units in a feasible solution to the matching problem given to fullmatch. Such a bound is also printed by print.optmatch.
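The quantity that exceedances bounds is the sum of discrepancies between treated and control units sharing a matched set. The base-R sketch below computes that sum for a toy matrix and a hand-made set assignment, using the same outer(..., "==") device as the Examples section; the matrix and the assignment are invented for illustration and are not fullmatch output.

```r
d <- matrix(c(1, 4, 6,
              5, 2, 3),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("t1", "t2"), c("c1", "c2", "c3")))

# Hand-made matched-set labels: t1 with c1; t2 with c2 and c3
sets <- c(t1 = "m.1", t2 = "m.2", c1 = "m.1", c2 = "m.2", c3 = "m.2")

# same[i, j] is TRUE when treated unit i and control j share a matched set
same <- outer(sets[rownames(d)], sets[colnames(d)], "==")
matched.sum <- sum(d * same)
matched.sum   # 6, i.e. 1 + 2 + 3
```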

Author(s)

Ben Hansen

References

Hansen, B.B. and Klopfer, S.O. (2006), ‘Optimal full matching and related designs via network flows’, Journal of Computational and Graphical Statistics, 15, 609–627.

Hansen, B.B. (2004), ‘Full Matching in an Observational Study of Coaching for the SAT’, Journal of the American Statistical Association, 99, 609–618.

Rosenbaum, P. (1991), ‘A Characterization of Optimal Designs for Observational Studies’, Journal of the Royal Statistical Society, Series B, 53, 597–610.

See Also

matched, pscore.dist, mahal.dist, makedist

Examples

data(plantdist)
round(plantdist)
plantsfm <- fullmatch(plantdist)        # a full match with unrestricted
                                        # treatment-control balance
PR <- logical(length(plantsfm))         # TRUE marks treated units
PR[match(dimnames(plantdist)[[1]], names(plantsfm))] <- TRUE

table(plantsfm,                         # treatment-control balance,
      ifelse(PR, 'treated', 'control')) # by matched set

stratumStructure(plantsfm)              # summary of sets' trt-ctl balance

sum(plantdist *                         # sum of matched distances; index by
    outer(plantsfm[rownames(plantdist)],  # name so rows and columns line up
          plantsfm[colnames(plantdist)],
          "=="))

plantsfm1 <- fullmatch(plantdist,       # a full match with restrictions
                       min.controls = 2,  # on matched sets'
                       max.controls = 3)  # treatment-control balance

stratumStructure(plantsfm1)             # treatment-control balance is
                                        # improved by restrictions


[Package optmatch version 0.4-1 Index]