block {blockTools}    R Documentation
Block units into experimental blocks, with one unit per treatment condition. Blocking begins by creating a measure of multivariate distance between all possible pairs of units. Maximum, minimum, or an allowable range of differences between units on one variable can be set.
block(data, vcov.data = NULL, groups = NULL, n.tr = 2, id.vars, block.vars = NULL, algorithm = "optGreedy", distance = "mahalanobis", row.sort = NULL, level.two = FALSE, valid.var = NULL, valid.range = NULL, seed, verbose = FALSE, ...)
data: a data frame or matrix, with units in rows and variables in columns.
vcov.data: an optional matrix of data used to estimate the variance-covariance matrix for calculating multivariate distance.
groups: an optional column name from data, specifying subgroups within which blocking occurs.
n.tr: the number of treatment conditions per block.
id.vars: a required string or vector of two strings specifying which column(s) of data contain identifying information.
block.vars: an optional string or vector of strings specifying which column(s) of data contain the blocking variables.
algorithm: a string specifying the blocking algorithm. The "optGreedy", "naiveGreedy", "randGreedy", and "sortGreedy" algorithms are currently available. See Details for more information.
distance: either (a) a string defining how the multivariate distance used for blocking is calculated (options include "mahalanobis", "mcd", and "mve"), or (b) a user-defined $k \times k$ matrix, where $k$ is the number of rows in data.
row.sort: an optional vector of integers from 1 to nrow(data), used to sort the rows of data when algorithm = "sortGreedy".
level.two: a logical defining the level of blocking; if TRUE, subunits are matched across different units (two-level blocking). See Details.
valid.var: an optional string naming a variable on which units in the same block must fall within the range defined by valid.range.
valid.range: an optional vector defining the range of valid.var within which units in the same block must fall.
seed: an optional integer value for the random seed set in cov.rob, used to calculate measures of the variance-covariance matrix robust to outliers.
verbose: a logical specifying whether group names and block numbers are printed as blocks are created.
...: additional arguments passed to cov.rob.
If vcov.data = NULL, then block calculates the variance-covariance matrix using the block.vars from data.
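The distance computation can be illustrated with a minimal base-R sketch. It assumes the default distance = "mahalanobis" and vcov.data = NULL, so the covariance matrix is estimated from the blocking variables themselves. pairwise_mahal() is a hypothetical helper, not blockTools' internal code, and whether blockTools works with squared or unsquared distances is not stated here (the ranking of pairs is the same either way).

```r
# Hypothetical helper: all-pairs Mahalanobis distances between the rows of X.
# With vcov.data = NULL, the covariance S is estimated from X itself.
pairwise_mahal <- function(X, S = cov(X)) {
  X <- as.matrix(X)
  S_inv <- solve(S)                     # invert once, reuse for every row
  n <- nrow(X)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) {
    # stats::mahalanobis() returns squared distances from X[i, ] to every row
    d2 <- mahalanobis(X, center = X[i, ], cov = S_inv, inverted = TRUE)
    D[i, ] <- sqrt(d2)
  }
  D
}

set.seed(1)
X <- cbind(b1 = rnorm(6), b2 = rnorm(6, sd = 10))
D <- pairwise_mahal(X)
round(D, 2)
```

Passing S = diag(ncol(X)) reduces the result to Euclidean distance, which is a quick sanity check; passing a covariance estimated from other data mirrors the role of vcov.data.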
If groups is not user-specified, block temporarily creates a variable in data called "groups", which takes the value 1 for every unit.
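A minimal sketch of this default, assuming blocking then proceeds separately within each level of the grouping column. add_default_groups() is a hypothetical helper, and the toy data frame is made up for illustration.

```r
# Hypothetical helper: mimic the default grouping described above, then
# split the data so that blocking can run within each group separately.
add_default_groups <- function(data, groups = NULL) {
  if (is.null(groups)) {
    data$groups <- 1          # every unit gets the value 1
    groups <- "groups"
  }
  split(data, data[[groups]])
}

d <- data.frame(id = 1:4, b1 = c(3, 1, 4, 1), g = c("a", "a", "b", "b"))
length(add_default_groups(d))       # 1: a single group containing all units
length(add_default_groups(d, "g"))  # 2: blocking would run within "a" and "b"
```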
Where possible, one unit is assigned to each condition in each block. If there are fewer available units than treatment conditions, available units are used.
If n.tr > 2, then the optGreedy algorithm finds the best possible pair match, then the best match to either member of the pair, then the best match to any member of the triple, and so on. Other algorithms proceed similarly.
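The block-growing step for n.tr > 2 can be sketched as follows. This illustrates the description above and is not blockTools' implementation: grow_block() is hypothetical, it builds a single block from a precomputed distance matrix, and it breaks ties by taking the first index rather than at random.

```r
# Hypothetical helper: grow one block of size n.tr from distance matrix D.
# Start from the closest pair, then repeatedly add the unit that is closest
# to ANY current member of the block.
grow_block <- function(D, n.tr) {
  D <- as.matrix(D)
  stopifnot(nrow(D) >= 2)
  diag(D) <- Inf                              # a unit cannot match itself
  members <- unname(which(D == min(D), arr.ind = TRUE)[1, ])
  while (length(members) < min(n.tr, nrow(D))) {
    others <- setdiff(seq_len(nrow(D)), members)
    # for each candidate, its distance to the nearest current member
    d_near <- apply(D[members, others, drop = FALSE], 2, min)
    members <- c(members, others[which.min(d_near)])
  }
  members
}

x <- c(0, 1, 1.5, 10, 20)        # one blocking variable, five units
D <- as.matrix(dist(x))
sort(grow_block(D, n.tr = 3))    # 1 2 3
```

Units 2 and 3 form the closest pair (0.5 apart); unit 1 joins next because it is 1 away from unit 2, closer than any other candidate is to either member.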
An example of id.vars is id.vars = c("id", "id2"). If two-level blocking is selected, id.vars should be ordered as (unit id, subunit id). See the details for level.two below for more information.
If block.vars = NULL, then all variables in data except the id.vars are taken as blocking variables. To block on particular variables instead, name them, e.g., block.vars = c("b1", "b2").
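The default choice of blocking variables amounts to a set difference on column names. The sketch below takes the sentence above literally (only the id.vars are excluded); default_block_vars() is hypothetical, and how blockTools treats a grouping column in this case is not covered here.

```r
# Hypothetical helper: resolve the blocking variables as described above.
default_block_vars <- function(data, id.vars, block.vars = NULL) {
  if (is.null(block.vars)) setdiff(names(data), id.vars) else block.vars
}

d <- data.frame(id = 1:3, b1 = c(2, 4, 6), b2 = c(1, 1, 0))
default_block_vars(d, id.vars = "id")                     # "b1" "b2"
default_block_vars(d, id.vars = "id", block.vars = "b2")  # "b2"
```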
"optGreedy"
calls an optimal-greedy algorithm, sequentially
finding the best match in the entire dataset; "naiveGreedy"
finds
the best match proceeding down the dataset from the first unit to the
last; "randGreedy"
randomly selects a unit, finds its best match,
and repeats; "sortGreedy"
resorts the dataset according to
row.sort
, then implements a naiveGreedy
algorithm.
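For two treatment conditions, the difference between the greedy variants comes down to the order in which units look for partners. The sketch below is a simplified illustration under that reading, not blockTools' code: greedy_pairs() and opt_greedy_pairs() are hypothetical, ties go to the first index rather than being broken at random, and a leftover unit forms a block of one.

```r
# Hypothetical helper for naiveGreedy / sortGreedy / randGreedy (n.tr = 2).
# 'visit' is the order in which units pick their nearest available partner:
#   naiveGreedy: seq_len(n);  sortGreedy: row.sort;  randGreedy: sample(n)
greedy_pairs <- function(D, visit = seq_len(nrow(D))) {
  D <- as.matrix(D)
  available <- rep(TRUE, nrow(D))
  blocks <- list()
  for (i in visit) {
    if (!available[i]) next
    available[i] <- FALSE
    cand <- which(available)
    if (length(cand) == 0) {            # no partner left: block of one
      blocks[[length(blocks) + 1]] <- i
      break
    }
    j <- cand[which.min(D[i, cand])]
    available[j] <- FALSE
    blocks[[length(blocks) + 1]] <- c(i, j)
  }
  blocks
}

# Hypothetical helper for optGreedy (n.tr = 2): always take the globally
# closest pair among the units that remain.
opt_greedy_pairs <- function(D) {
  D <- as.matrix(D)
  diag(D) <- Inf
  left <- seq_len(nrow(D))
  blocks <- list()
  while (length(left) >= 2) {
    sub <- D[left, left, drop = FALSE]
    k <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    blocks[[length(blocks) + 1]] <- left[k]
    left <- left[-k]
  }
  if (length(left) == 1) blocks[[length(blocks) + 1]] <- left
  blocks
}

D <- as.matrix(dist(c(0, 3, 4, 10)))
greedy_pairs(D)        # pairs {1, 2} and {3, 4}
opt_greedy_pairs(D)    # pairs {2, 3} and {1, 4}
```

On this toy example the two rules disagree: the naive pass lets unit 1 claim unit 2 first, while the optimal-greedy pass pairs the closest units (2 and 3) before anything else.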
The optGreedy algorithm breaks ties by randomly selecting one of the minimum-distance pairs. The naiveGreedy, sortGreedy, and randGreedy algorithms break ties by randomly selecting one of the minimum-distance matches to the particular unit in question.
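Random tie-breaking for optGreedy can be sketched as sampling uniformly among all minimum-distance pairs. pick_min_pair() is hypothetical and only illustrates the rule described above.

```r
# Hypothetical helper: choose one of the minimum-distance pairs at random.
pick_min_pair <- function(D) {
  D <- as.matrix(D)
  D[upper.tri(D, diag = TRUE)] <- Inf   # keep each unordered pair once
  ties <- which(D == min(D), arr.ind = TRUE)
  # sample.int() avoids sample()'s special case for a length-one input
  unname(ties[sample.int(nrow(ties), 1), ])
}

# Units at 0, 1 and 2: pairs {1, 2} and {2, 3} are both 1 apart.
D <- as.matrix(dist(c(0, 1, 2)))
set.seed(3)
table(replicate(100, paste(pick_min_pair(D), collapse = "-")))
```

The per-unit rule used by the other algorithms is the same idea restricted to one row of the distance matrix: sample among the available units whose distance to the unit in question equals the row minimum.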
The distance = "mcd" and distance = "mve" options call cov.rob to calculate measures of multivariate spread robust to outliers. The distance = "mcd" option calculates the Minimum Covariance Determinant estimate; the distance = "mve" option calculates the Minimum Volume Ellipsoid estimate.
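The effect of a robust estimate can be seen by comparing cov() with MASS::cov.rob() on data containing an outlier. This sketch assumes the MASS package (distributed with standard R installations) is available; it shows only the covariance step, not how blockTools uses the result internally, and the simulated data are made up.

```r
# Compare a classical covariance estimate with the robust MCD and MVE
# estimates that distance = "mcd" / "mve" are described as using.
set.seed(2)
X <- cbind(b1 = rnorm(50), b2 = rnorm(50))
X[1, ] <- c(25, -25)                            # one gross outlier

S_classic <- cov(X)
S_mcd <- MASS::cov.rob(X, method = "mcd")$cov
S_mve <- MASS::cov.rob(X, method = "mve")$cov

# The outlier inflates the classical variance of b1 far more than the
# robust ones.
c(classic = S_classic[1, 1], mcd = S_mcd[1, 1], mve = S_mve[1, 1])

# Either robust matrix can replace cov() in a Mahalanobis distance:
mahalanobis(X[2, , drop = FALSE], center = X[3, ], cov = S_mcd)
```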
A user-specified distance matrix must have diagonals equal to 0, indicating zero distance between a unit and itself. Only the lower triangle of the matrix is used.
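A user-supplied matrix can be checked against these two requirements before it is passed as distance. The sketch below is hypothetical (check_user_distance() is not a blockTools function); it fills the upper triangle from the lower one, so any values above the diagonal are ignored, in line with the statement that only the lower triangle is used.

```r
# Hypothetical helper: validate a user-supplied k x k distance matrix and
# make it symmetric using only its lower triangle.
check_user_distance <- function(D, data) {
  D <- as.matrix(D)
  k <- nrow(data)
  stopifnot(nrow(D) == k, ncol(D) == k, all(diag(D) == 0))
  D[upper.tri(D)] <- t(D)[upper.tri(D)]   # copy lower triangle upward
  D
}

units <- data.frame(id = 1:3)
D <- matrix(c(0, 99, 99,
              1,  0, 99,
              2,  3,  0), nrow = 3, byrow = TRUE)  # 99s are never read
check_user_distance(D, units)
```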
If level.two = TRUE, then the best subunit block-matches in different units are found. For example, provinces could be matched based on the most similar cities within them. All subunits in the data should have unique names. Thus, if subunits are numbered 1 to (number of subunits in the unit) within each unit, they should be renumbered, e.g., 1 to (total number of subunits in all units).
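Two pieces of this paragraph can be sketched in base R: renumbering subunits so their IDs are unique across units, and restricting matches to subunits in different units. Both are hypothetical illustrations, not blockTools internals; the column names id and id2 follow the Examples, the toy values are made up, and the renumbering assumes one row per subunit.

```r
# Toy data: subunits numbered 1..n within each unit, so id2 repeats.
d <- data.frame(id = c(1, 1, 2, 2), id2 = c(1, 2, 1, 2), b1 = c(0, 5, 5.2, 9))
anyDuplicated(d$id2) > 0          # TRUE: not unique across units

# Renumber 1 .. total number of subunits (one row per subunit assumed).
d$id2 <- seq_len(nrow(d))

# Hypothetical helper: closest pair of subunits that belong to different units.
best_cross_unit_pair <- function(D, unit) {
  D <- as.matrix(D)
  D[outer(unit, unit, "==")] <- Inf   # forbid same-unit matches (and self)
  sort(unname(which(D == min(D), arr.ind = TRUE)[1, ]))
}

best_cross_unit_pair(dist(d$b1), d$id)   # 2 3: subunit 2 (unit 1) with 3 (unit 2)
```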
An example of a variable restriction is valid.var = "b2", valid.range = c(10, 50), which requires that units in the same block be at least 10 units apart, but no more than 50 units apart, on variable "b2".
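One way to picture a valid.var restriction is as a mask on the distance matrix: pairs whose difference on valid.var falls outside valid.range are made ineligible. The sketch assumes both endpoints are inclusive, which the text above does not specify; apply_valid_range() is hypothetical.

```r
# Hypothetical helper: mark pairs as ineligible (Inf) unless their absolute
# difference on the restriction variable v lies within valid.range.
apply_valid_range <- function(D, v, valid.range) {
  D <- as.matrix(D)
  gap <- abs(outer(v, v, "-"))
  # assumption: endpoints are inclusive
  D[gap < valid.range[1] | gap > valid.range[2]] <- Inf
  D
}

b2 <- c(0, 5, 20, 100)                       # toy values of "b2"
D <- matrix(1, 4, 4)                         # placeholder distances
apply_valid_range(D, b2, valid.range = c(10, 50))
```

Only the pairs (1, 3) and (2, 3), which differ by 20 and 15, remain eligible; every other pair is too close or too far apart on "b2".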
A list with elements:
blocks: a list of data frames, each containing a group's blocked units. If there are two treatment conditions, then the last column of each data frame displays the multivariate distance between the two units. If there are more than two treatment conditions, then the last column of each data frame displays the largest of the multivariate distances between all possible pairs in the block.
level.two: a logical indicating whether level.two = TRUE.
call: the original call to block.
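The quantity reported in the last column can be sketched as the maximum over all pairs in a block; for a two-unit block this is simply the pair's distance. block_max_distance() is hypothetical, assumes a full distance matrix D and the row indices of the block's members, and returns NA for a block of one as an illustrative choice.

```r
# Hypothetical helper: largest pairwise distance among a block's members.
block_max_distance <- function(D, members) {
  if (length(members) < 2) return(NA_real_)   # a block of one has no pairs
  sub <- as.matrix(D)[members, members]
  max(sub[lower.tri(sub)])
}

D <- as.matrix(dist(c(0, 1, 1.5, 10)))
block_max_distance(D, c(2, 3))       # 0.5: the distance between the two units
block_max_distance(D, c(1, 2, 3))    # 1.5: the largest of 1, 1.5 and 0.5
```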
Ryan T. Moore
King, Gary, Emmanuela Gakidou, Nirmala Ravishankar, Ryan T. Moore, Jason Lakin, Manett Vargas, Martha María Téllez-Rojo, Juan Eugenio Hernández Ávila, Mauricio Hernández Ávila, and Héctor Hernández Llamas. 2007. "A 'Politically Robust' Experimental Design for Public Policy Evaluation, with Application to the Mexican Universal Health Insurance Program". Journal of Policy Analysis and Management 26(3): 479-509.
library(blockTools)
data(x100)
out <- block(x100, groups = "g", n.tr = 2, id.vars = c("id"),
             block.vars = c("b1", "b2"), algorithm = "optGreedy",
             distance = "mahalanobis", level.two = FALSE,
             valid.var = "b1", valid.range = c(0, 500), verbose = TRUE)
## out$blocks contains 3 data frames

## To illustrate two-level blocking, with multiple level two units per
## level one unit (each even-numbered row takes the previous row's unit id):
for (i in 1:nrow(x100)) {
  if (i %% 2 == 0) x100$id[i] <- x100$id[i - 1]
}
out <- block(x100, groups = "g", n.tr = 2, id.vars = c("id", "id2"),
             block.vars = c("b1", "b2"), algorithm = "optGreedy",
             distance = "mahalanobis", level.two = TRUE,
             valid.var = "b1", valid.range = c(0, 500), verbose = TRUE)