fullmatch {optmatch}    R Documentation

Optimal full matching

Description

Given two groups, such as a treatment and a control group, and a treatment-by-control discrepancy matrix indicating desirability and permissibility of potential matches, create optimal full matches of members of the groups. Optionally, incorporate restrictions on matched sets' ratios of treatment to control units.

Usage

fullmatch(distance, subclass.indices = NULL, 
min.controls = 0, max.controls = Inf, 
omit.fraction = NULL, tol = 0.001)

Arguments

distance A matrix of nonnegative discrepancies, each indicating the permissibility and desirability of matching the unit corresponding to its row (a 'treatment') to the unit corresponding to its column (a 'control'); or a list of such matrices. Finite discrepancies indicate permissible matches, with smaller discrepancies indicating more desirable matches. Matrix distance, or the matrix elements of distance, must have row and column names.
min.controls The minimum ratio of controls to treatments that is to be permitted within a matched set: should be nonnegative and finite. If min.controls is not a whole number, the reciprocal of a whole number, or zero, then it is rounded down to the nearest whole number or reciprocal of a whole number.
When distance is a list of matrices (or subclass.indices is given), min.controls may be a named numeric vector separately specifying the minimum permissible ratio of controls to treatments for each subclass. The names of this vector should include names of all matrices in the list distance.
max.controls The maximum ratio of controls to treatments that is to be permitted within a matched set: should be positive and numeric. If max.controls is not a whole number, the reciprocal of a whole number, or Inf, then it is rounded up to the nearest whole number or reciprocal of a whole number.
When distance is a list of matrices (or subclass.indices is given), max.controls may be a named numeric vector separately specifying the maximum permissible ratio of controls to treatments in each subclass.
omit.fraction Optionally, specify what fraction of controls or treated subjects are to be rejected. If omit.fraction is a positive fraction less than one, then fullmatch leaves up to that fraction of the control reservoir unmatched. If omit.fraction is a negative number greater than -1, then fullmatch leaves up to |omit.fraction| of the treated group unmatched. Positive values are only accepted if max.controls >= 1; negative values, only if min.controls <= 1. If omit.fraction is not specified, then only those treated and control subjects without permissible matches among the control and treated subjects, respectively, are omitted.
When distance is a list of matrices (or subclass.indices has been given), omit.fraction specifies the fraction of controls to be rejected in each subproblem, a parameter that can be made to differ by subclass by setting omit.fraction equal to a named numeric vector of fractions.
tol Because of internal rounding, fullmatch may solve a slightly different matching problem than the one specified, in which case the match it generates may not coincide with an optimal solution of the specified problem. tol times the number of subjects to be matched specifies how far fullmatch's output is permitted to differ from an optimal solution to the original problem, as measured by the sum of discrepancies between all treatments and controls placed into the same matched sets.
subclass.indices An optional factor or data frame indicating subclasses within which matched sets are to be wholly contained. If subclass.indices is provided, it must have names corresponding to the row and column names of distance. Unless distance is a matrix, subclass.indices will be ignored. Note: this argument is being phased out; consider splitting up your matching problem and supplying it to fullmatch as separate discrepancy matrices; see distance.
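The rounding rules stated above for min.controls and max.controls can be illustrated with a short base-R sketch. The helpers round_min_controls() and round_max_controls() are hypothetical, written only to mirror the documented rules; they are not optmatch functions, and fullmatch's internal rounding may differ in detail (the sketch assumes a small tolerance eps to absorb floating-point error at exact whole numbers and reciprocals).

```r
# Hypothetical helpers, not part of optmatch: mirror the documented rounding
# of min.controls (down) and max.controls (up) to the nearest whole number
# or reciprocal of a whole number. eps guards against floating-point error.
round_min_controls <- function(x, eps = 1e-9) {
  vapply(x, function(v) {
    if (v == 0) return(0)                  # zero is allowed as is
    if (v >= 1) return(floor(v + eps))     # round down to a whole number
    1 / ceiling(1 / v - eps)               # largest 1/k not exceeding v
  }, numeric(1))
}

round_max_controls <- function(x, eps = 1e-9) {
  vapply(x, function(v) {
    if (is.infinite(v)) return(Inf)        # Inf is allowed as is
    if (v >= 1) return(ceiling(v - eps))   # round up to a whole number
    1 / floor(1 / v + eps)                 # smallest 1/k not below v
  }, numeric(1))
}

round_min_controls(c(0, 0.4, 0.9, 2.5, 3))    # 0, 1/3, 1/2, 2, 3
round_max_controls(c(0.4, 0.9, 2.5, 3, Inf))  # 1/2, 1, 3, 3, Inf
```

Because vapply() keeps the names of its input, the same helpers apply element by element to a named per-subclass vector such as c(a = 0.4, b = 2.5).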
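As a rough illustration of omit.fraction, the hypothetical helper omit_allowance() below (not part of optmatch) computes how many treated or control units a given value would permit fullmatch to leave unmatched, and enforces the stated conditions on max.controls and min.controls. It assumes that the permitted count is rounded down to a whole number, which the description above does not specify.

```r
# Hypothetical helper, not part of optmatch: the number of treated and control
# units that a given omit.fraction permits fullmatch to leave unmatched.
# Assumption: permitted counts are rounded down to whole numbers.
omit_allowance <- function(omit.fraction, n.treated, n.controls,
                           min.controls = 0, max.controls = Inf) {
  stopifnot(omit.fraction > -1, omit.fraction < 1)
  if (omit.fraction > 0) {
    # Positive values: up to this share of the control reservoir may go
    # unmatched; only accepted when max.controls >= 1.
    stopifnot(max.controls >= 1)
    c(treated = 0, controls = floor(omit.fraction * n.controls))
  } else if (omit.fraction < 0) {
    # Negative values: up to |omit.fraction| of the treated group may go
    # unmatched; only accepted when min.controls <= 1.
    stopifnot(min.controls <= 1)
    c(treated = floor(-omit.fraction * n.treated), controls = 0)
  } else {
    c(treated = 0, controls = 0)
  }
}

omit_allowance(0.5, n.treated = 7, n.controls = 19)   # treated 0, controls 9
omit_allowance(-0.3, n.treated = 7, n.controls = 19)  # treated 2, controls 0
```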

Details

If distance is a list of matrices, each matrix is treated as a separate matching problem. In this case, the matrices must be named in order to give min.controls, max.controls, or omit.fraction separate values for each subproblem.
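Following the advice under subclass.indices, a single treatment-by-control matrix can be split into a named list of per-subclass matrices before it is passed to fullmatch. The helper split_distance() below is a hypothetical base-R sketch, not an optmatch function; it assumes that subclass is a factor whose names are the row and column names of the matrix.

```r
# Hypothetical helper, not part of optmatch: split one treatment-by-control
# distance matrix into a named list of per-subclass matrices.
# `subclass` is assumed to be a factor named by unit (row or column name).
split_distance <- function(dmat, subclass) {
  subclass <- droplevels(subclass)
  out <- lapply(levels(subclass), function(s) {
    units <- names(subclass)[which(subclass == s)]
    # keep only the treated rows and control columns belonging to subclass s
    dmat[rownames(dmat) %in% units, colnames(dmat) %in% units, drop = FALSE]
  })
  names(out) <- levels(subclass)
  out
}

# Small illustration: two treated and three control units in two subclasses.
d  <- matrix(1:6, nrow = 2,
             dimnames = list(c("t1", "t2"), c("c1", "c2", "c3")))
sc <- factor(c(t1 = "a", t2 = "b", c1 = "a", c2 = "b", c3 = "b"))
dlist <- split_distance(d, sc)
dlist$a   # 1 x 1 matrix: t1 versus c1
dlist$b   # 1 x 2 matrix: t2 versus c2 and c3
```

The list names are what named per-subclass arguments refer to; a call along the lines of fullmatch(dlist, min.controls = c(a = 1, b = 2)) would set a different minimum ratio in each subclass.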

fullmatch tries to guess the order in which units would have been given in a data frame, and to order the factor that it returns accordingly. If the dimnames of distance, or the matrices it lists, are not simply row numbers of the data frame you're working with, then you should compare the names of fullmatch's output to your row names in order to be sure things are in the proper order.
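One way to make that comparison is to index the result by the data frame's row names, which puts its elements into row order and exposes any row the match does not cover. The sketch below uses a plain named factor with made-up labels as a stand-in for fullmatch's output; align_to_rows() is a hypothetical helper, not part of optmatch.

```r
# Hypothetical helper, not part of optmatch: reorder a named factor (a
# stand-in for fullmatch's output) to follow a data frame's row order,
# warning about rows that the match result does not cover.
align_to_rows <- function(match_result, dat) {
  absent <- setdiff(rownames(dat), names(match_result))
  if (length(absent) > 0)
    warning("rows absent from the match result: ",
            paste(absent, collapse = ", "))
  match_result[rownames(dat)]
}

fm  <- factor(c(B = "m.1", A = "m.1", C = "m.2"))   # made-up labels
dat <- data.frame(x = 1:3, row.names = c("A", "B", "C"))
align_to_rows(fm, dat)   # elements now in the order A, B, C
```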

The value of tol can have a substantial effect on computation time; with smaller values, computation takes longer.

Not every tolerance can be met, and how small a tolerance is too small varies with the machine and with the details of the problem. If fullmatch can't guarantee that the tolerance is as small as the given value of argument tol, then matching proceeds but a warning is issued.
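To make the role of tol concrete: the documented bound on fullmatch's departure from optimality is tol times the number of subjects to be matched. The hypothetical helper tolerance_budget() below (not part of optmatch) computes that product, assuming that every row and column of distance counts as a subject to be matched, summed over matrices when distance is a list. For the 7 by 19 plantdist matrix in the Examples and the default tol = 0.001, this gives a bound of 0.026 on the excess total discrepancy.

```r
# Hypothetical helper, not part of optmatch: the documented bound
# tol * (number of subjects to be matched) on how far the total matched
# discrepancy may exceed the optimum. Assumes every row and column of
# `distance` (or of each matrix in a list) counts as a subject.
tolerance_budget <- function(distance, tol = 0.001) {
  if (is.matrix(distance)) distance <- list(distance)
  n.subjects <- sum(vapply(distance, function(m) sum(dim(m)), integer(1)))
  tol * n.subjects
}

tolerance_budget(matrix(0, nrow = 7, ncol = 19))   # 26 subjects: 0.026
```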

Value

Primarily, a named vector of class c('optmatch', 'factor'). Elements of this vector correspond to members of the treatment and control groups in reference to which the matching problem was posed, and are named accordingly; the names are taken from the row and column names of distance. Each element of the vector is the concatenation of: (i) a character abbreviation of subclass.indices, if that argument was given, or the string 'm' if it was not; (ii) the string '.'; and (iii) a nonnegative integer or the string 'NA'. In this last place, positive whole numbers indicate placement of the unit into a matched set, zero indicates a unit that was not matched, and NA indicates that all or part of the matching problem given to fullmatch was found to be infeasible.
In some cases, only proper subsets of the initial treatment and/or control groups will be represented in the value of fullmatch; whether this occurs depends on whether subclass.indices was given. If subclass.indices is NULL, then all members of the treatment and control groups, i.e. the rows and columns of distance, are represented in the value of fullmatch. Otherwise, the vector has an element for each unit represented in subclass.indices, i.e. for each element of factor subclass.indices or for each row of data frame subclass.indices.
Secondarily, fullmatch returns various data about the matching process and its result, stored as attributes of the named vector which is its primary output. In particular, the exceedances attribute gives upper bounds, not necessarily sharp, for the amount by which the sum of distances between matched units in the result of fullmatch exceeds the least possible sum of distances between matched units in a feasible solution to the matching problem given to fullmatch. Such a bound is also printed by print.optmatch.
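The label format described above can be decoded with ordinary string handling. The helper parse_match_labels() below is a hypothetical base-R sketch, not part of optmatch: it splits each label at its last '.' into a prefix and a set number and classifies the unit as matched, unmatched, or part of an infeasible problem. The example labels are made up.

```r
# Hypothetical helper, not part of optmatch: decode labels of the documented
# form "<prefix>.<n>", where n > 0 is a matched set, 0 means unmatched, and
# "NA" means the (sub)problem was found infeasible.
parse_match_labels <- function(labels) {
  unit.names <- names(labels)                  # keep unit names before coercion
  labels <- as.character(labels)
  prefix <- sub("\\.[^.]*$", "", labels)       # text before the last "."
  suffix <- sub("^.*\\.", "", labels)          # text after the last "."
  set <- suppressWarnings(as.integer(suffix))  # "NA" becomes NA_integer_
  status <- ifelse(is.na(set), "infeasible",
                   ifelse(set == 0, "unmatched", "matched"))
  data.frame(prefix = prefix, set = set, status = status,
             row.names = unit.names, stringsAsFactors = FALSE)
}

lab <- factor(c(A = "m.1", H = "m.1", B = "m.0", I = "m.NA"))  # made-up labels
parse_match_labels(lab)
```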

Note

fullmatch is based on an algorithm developed by Stephanie Olsen Klopfer. Her algorithm translates full matching problems into network flow problems; in the present implementation, the latter are handled by Bertsekas and Tseng's RELAX-IV codes.

Author(s)

Ben Hansen

References

Hansen, B.B. (2004), ‘Full Matching in an Observational Study of Coaching for the SAT’, Journal of the American Statistical Association, 99, 609–618. (Cf. especially the Appendix.)

Rosenbaum, P. (1991), ‘A Characterization of Optimal Designs for Observational Studies’, Journal of the Royal Statistical Society, Series B, 53, 597–610.

See Also

matched

Examples

plantdist <- matrix(nrow=7, ncol=19,byrow=TRUE,data=c(
28, 0, 3,22,14,30,17,28,26,28,20,22,23,26,21,18,34,40,28,
24, 3, 0,22,10,27,14,26,24,24,16,19,20,23,18,16,31,37,25,
10,18,14,18, 4,12, 6,11, 9,10,14,12, 6,14,22,10,16,22,28,
 7,28,24, 8,14, 2,10, 6,12, 0,24,22, 4,24,32,20,18,16,38,
17,20,16,32,18,26,20,18,12,24, 0, 2,20, 6, 8, 4,14,20,14,
20,31,28,35,20,29,22,20,14,26,12, 9,22, 5,15,12, 9,11,12,
14,32,29,30,18,24,17,16,10,22,12,10,17, 6,16,14, 4, 8,17),
dimnames=list(c("A","B","C","D","E","F","G"),
c("H","I","J","K","L","M","N","O","P","Q","R",
"S","T","U","V","W","X","Y","Z")))

plantsfm <- fullmatch(plantdist) # A full match with unrestricted
                                 # treatment-control balance
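pr <- logical(length(plantsfm))          # one entry per unit in plantsfm;
pr[match(rownames(plantdist),            # TRUE marks the treated units
         names(plantsfm))] <- TRUE       # (the rows of plantdist)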

table(plantsfm,                          # treatment-control balance,
      ifelse(pr, 'treated', 'control'))  # by matched set

tapply(names(plantsfm), plantsfm,        # largest treatment-control
       FUN = function(x, dmat) {         # distances, by matched set
         max(dmat[match(x, rownames(dmat)),
                  match(x, colnames(dmat))],
             na.rm = TRUE)
       }, dmat = plantdist)

plantsfm1 <- fullmatch(plantdist,        # A full match with
                       min.controls = 2, # restrictions on matched sets'
                       max.controls = 3) # treatment-control balance

table(plantsfm1,                         # treatment-control balance is
      ifelse(pr, 'treated', 'control'))  # improved by restrictions

tapply(names(plantsfm1), plantsfm1,      # but distances between
       FUN = function(x, dmat) {         # matched units increase
         max(dmat[match(x, rownames(dmat)),   # slightly
                  match(x, colnames(dmat))],
             na.rm = TRUE)
       }, dmat = plantdist)

[Package optmatch version 0.1-7]