xBalance {RItools} | R Documentation |
Given covariates, a treatment variable, and a stratifying factor, calculates standardized differences (biases) along each covariate, with and without the stratification. Also, tests for conditional independence of the treatment variable and the covariates within strata.
xBalance(fmla, strata=NULL, data, report=c("std.diffs","z.scores"), stratum.weights=harmonic, na.rm=FALSE, covariate.scaling=NULL, normalize.weights=TRUE)
fmla |
A formula containing an indicator of treatment assignment on the left hand side and covariates at right. |
strata |
NULL, a factor with length equal to the number of rows
in data, or a data frame of such factors. |
data |
A data frame in which fmla
is to be evaluated. |
report |
Character vector listing measures to report for each
stratification; a subset of c("adj.means","adj.mean.diffs",
"chisquare.test","std.diffs","z.scores","p.values"). P-values reported are 2-sided for the null hypothesis of no effect. |
na.rm |
Whether to remove rows with NAs on any variables mentioned on
the RHS of fmla (i.e. listwise deletion). Defaults to FALSE, in which case rows
aren't deleted; instead, for each variable with NAs, a missing-data
indicator variable is added to the variables on which balance is calculated. |
stratum.weights |
Weights to be applied when aggregating across
strata specified by strata, defaulting to weights
proportional to the harmonic mean of treatment and control group
sizes within strata. This can be either a function used to
calculate the weights or the weights themselves; if strata is
a data frame, then it can be such a function, a list of such
functions, or a data frame of stratum weighting schemes
corresponding to the different stratifying factors of strata. See details. |
covariate.scaling |
A scale factor to apply to covariates in
calculating std.diffs. If
NULL, xBalance pools standard deviations of each
variable in the treatment and control group (defining these groups
according to whether the LHS of fmla is greater than
0). Also, see details. |
normalize.weights |
If TRUE, then stratum weights are
normalized so as to sum to 1. Defaults to TRUE. |
In the unstratified case, the standardized difference of
covariate means is the mean in the treatment group minus
the mean in the control group, divided by the sd
in the same variable estimated by pooling treatment and control
group sds on the same variable. In the stratified case, the
denominator of the standardized difference remains the same but
the numerator is a weighted average of within-stratum differences
in means on the covariate. By default, each stratum is weighted in proportion
to the harmonic mean 1/[(1/a + 1/b)/2]=2*a*b/(a+b) of the number of
treated units (a) and control units (b) in the stratum; this weighting is
optimal under certain modeling assumptions (discussed in Kalton 1968,
Hansen and Bowers 2008). This weighting can be modified using the
stratum.weights argument; see below.
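The calculation described above can be sketched in base R. This is an illustration on made-up data, not the package's internal code: the helper name pooled_sd, the toy vectors, and the particular pooled-sd formula (a degrees-of-freedom-weighted average of the two group variances) are assumptions.

```r
# Toy data (hypothetical): z = binary treatment, x = covariate, s = stratum
z <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0)
x <- c(5, 3, 4, 6, 7, 5, 6, 4, 9, 8)
s <- factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c"))

# Pooled sd of x across treatment and control groups (assumed formula)
pooled_sd <- function(x, z) {
  n1 <- sum(z == 1)
  n0 <- sum(z == 0)
  sqrt(((n1 - 1) * var(x[z == 1]) + (n0 - 1) * var(x[z == 0])) / (n1 + n0 - 2))
}

# Unstratified standardized difference
unstrat <- (mean(x[z == 1]) - mean(x[z == 0])) / pooled_sd(x, z)

# Within-stratum differences in means
diffs <- sapply(split(data.frame(x, z), s),
                function(d) mean(d$x[d$z == 1]) - mean(d$x[d$z == 0]))

# Harmonic-mean weights 2ab/(a+b), normalized to sum to 1
a <- tapply(z == 1, s, sum)   # treated units per stratum
b <- tapply(z == 0, s, sum)   # control units per stratum
w <- 2 * a * b / (a + b)
w <- w / sum(w)

# Stratified standardized difference: same denominator, weighted numerator
strat <- sum(w * diffs) / pooled_sd(x, z)
```

Both unstrat and strat share the denominator pooled_sd(x, z); only the numerator changes with stratification.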
When the treatment variable, the variable specified by the left-hand
side of fmla, is not binary, xBalance calculates the
covariates' regressions on the treatment variable, in the stratified
case pooling these regressions across strata using weights that default
to the stratum-wise sum of squared deviations of the treatment variable
from its stratum mean. (Applied to binary treatment variables, this
recipe gives the same result as the one given above.) In the denominator
of the standardized difference, we get a "pooled sd" from separating
units into two groups, one in which the treatment variable is 0 or less
and another in which it is positive. If report includes "adj.means",
covariate means for the former of these groups are reported, along with
the sums of these means and the covariates' regressions on either the treatment
variable, in the unstratified ("pre") case, or the treatment variable
and the strata, in the stratified ("post") case.
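The claim that the regression recipe reproduces the binary-treatment calculation can be checked numerically. For a binary treatment with a treated and b control units in a stratum, the sum of squared deviations of the treatment from its stratum mean is ab/(a+b), which is proportional to the harmonic-mean weight 2ab/(a+b). The following base R sketch uses made-up data and illustrative variable names; it is not the package's internal code.

```r
# Toy data (hypothetical): z = binary treatment, x = covariate, s = stratum
z <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0)
x <- c(5, 3, 4, 6, 7, 5, 6, 4, 9, 8)
s <- factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c"))

# Unstratified: the slope of x on z equals the difference in group means
slope     <- unname(coef(lm(x ~ z))["z"])
mean_diff <- mean(x[z == 1]) - mean(x[z == 0])

# Stratified: within-stratum slopes, pooled with weights equal to the
# within-stratum sum of squared deviations of z from its stratum mean
by_stratum <- split(data.frame(x, z), s)
slopes <- sapply(by_stratum,
                 function(d) unname(coef(lm(x ~ z, data = d))["z"]))
ss_z   <- sapply(by_stratum, function(d) sum((d$z - mean(d$z))^2))
pooled_slope <- sum(ss_z * slopes) / sum(ss_z)

# Harmonic-mean weighting of within-stratum differences in means
a <- tapply(z == 1, s, sum)
b <- tapply(z == 0, s, sum)
h <- 2 * a * b / (a + b)
diffs <- sapply(by_stratum,
                function(d) mean(d$x[d$z == 1]) - mean(d$x[d$z == 0]))
harmonic_diff <- sum(h * diffs) / sum(h)

# Both comparisons agree up to floating-point tolerance
all.equal(slope, mean_diff)
all.equal(pooled_slope, harmonic_diff)
```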
stratum.weights can be either a function or a
numeric vector of weights. If it is a numeric vector, it should be
nonnegative and it should have stratum names as its names. (I.e.,
its names should be equal to the levels of the factor specified by
strata.) If it is a function, it should accept one
argument, a data frame containing the variables in data and
additionally Tx.grp and stratum.code, and return a
vector of nonnegative weights with stratum codes as names; for an
example, do getFromNamespace("harmonic", "RItools").
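As an illustration of the function interface just described, here is a hypothetical weighting function that weights each stratum by its total number of units. It relies only on the stratum.code column named in the text; the function name size_weights and the toy data frame are made up for this sketch, and the exact layout of the data frame that xBalance passes in is an assumption.

```r
# Hypothetical weighting function: weight strata by number of units.
# Takes a data frame with a stratum.code column and returns a nonnegative
# numeric vector whose names are the stratum codes.
size_weights <- function(data) {
  counts <- table(data$stratum.code)
  setNames(as.numeric(counts), names(counts))
}

# Toy stand-in for the data frame a weighting function receives
# (assumed layout: one row per unit, with Tx.grp and stratum.code columns)
toy <- data.frame(
  Tx.grp       = c(1, 0, 0, 1, 1, 0, 0),
  stratum.code = factor(c("a", "a", "a", "b", "b", "b", "b"))
)

size_weights(toy)
```

Under the interface described above, such a function would be supplied as stratum.weights=size_weights in a call to xBalance.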
If covariate.scaling is not NULL, no scaling is
applied. This behavior is
likely to change in future versions. (If you want no scaling, set
covariate.scaling=1, as this is likely to retain this meaning
in the future.)
A data frame with as many rows as there were covariates and
levels of covariates in fmla, and columns including some or
all of XX.difference, XX.z, XX.p, XX.Tx.eq.0, XX.Tx.eq.1,
where XX ranges over the stratifying variables given in strata. If
"chisquare.test" is in report, then the
data frame also has attributes XX.chisquare and XX.df. Its class is
c("xbal", "data.frame"). There are plot and
print methods for class "xbal"; the print method is
demonstrated in the examples.
Evidence pertaining to the hypothesis that the treatment variable is not associated with differences in covariate values is assessed by comparing the differences (or regression coefficients), without standardization, to their distributions under hypothetical shuffles of the treatment variable, that is, a permutation or randomization distribution. For the unstratified comparison, this reference distribution consists of differences (more generally, regression coefficients) when the treatment variable is permuted without regard to strata. For the stratified comparison, the reference distribution is determined by randomly permuting the treatment variable within strata, then re-calculating the treatment-control differences (regressions of each covariate on the permuted treatment variable). Significance assessments are based on the large-sample Normal approximation to these reference distributions.
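The stratified reference distribution can be approximated by simulation, which may help in picturing what the Normal approximation stands in for. The following base R sketch, on made-up data, permutes the treatment within strata and recomputes a harmonic-weighted difference in means. xBalance itself relies on the large-sample Normal approximation rather than simulation, so this is only an illustration of the idea; the helper names adj_diff and permute_within are hypothetical.

```r
set.seed(1)  # make the simulated permutations reproducible

# Toy data (hypothetical): z = binary treatment, x = covariate, s = stratum
z <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0)
x <- c(5, 3, 4, 6, 7, 5, 6, 4, 9, 8)
s <- factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c"))

# Harmonic-weighted average of within-stratum differences in means
adj_diff <- function(x, z, s) {
  a <- tapply(z == 1, s, sum)
  b <- tapply(z == 0, s, sum)
  d <- tapply(x * z, s, sum) / a - tapply(x * (1 - z), s, sum) / b
  h <- 2 * a * b / (a + b)
  sum(h * d) / sum(h)
}

# Shuffle treatment labels separately within each stratum
permute_within <- function(z, s) {
  ave(z, s, FUN = function(v) v[sample.int(length(v))])
}

observed <- adj_diff(x, z, s)
perms <- replicate(2000, adj_diff(x, permute_within(z, s), s))

# Two-sided simulated p-value (small tolerance guards against rounding),
# and a Normal approximation using the simulated mean and sd
p_perm   <- mean(abs(perms) >= abs(observed) - 1e-12)
p_normal <- 2 * pnorm(-abs(observed - mean(perms)) / sd(perms))
```

With so few units the two p-values need not agree closely; the Normal approximation is a large-sample device.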
Ben Hansen and Jake Bowers
Hansen, B.B. and Bowers, J. (2008), "Covariate Balance in Simple, Stratified and Clustered Comparative Studies," Statistical Science 23.
Kalton, G. (1968), "Standardization: A technique to control for extraneous variables," Applied Statistics 17, 118–136.
library(RItools)
data(nuclearplants)

# Unstratified balance assessment with the default report
xBalance(pr ~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         data=nuclearplants)

# Unstratified, with a fuller set of reported measures
xBalance(pr ~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         data=nuclearplants,
         report=c("adj.means","adj.mean.diffs","std.diffs",
                  "z.scores", "chisquare.test"))

# Stratified on pt
xBalance(pr ~ . - cost - pt,
         strata=factor(nuclearplants$pt),
         data=nuclearplants,
         report=c("adj.means","adj.mean.diffs","std.diffs",
                  "z.scores", "chisquare.test"))

# Unstratified and stratified comparisons side by side
xBalance(pr ~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         strata=list(unstrat=NULL, pt=~pt),
         data=nuclearplants,
         report=c("adj.means", "chisquare.test"))

xBalance(pr ~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         strata=data.frame(unstrat=factor('none'),
                           pt=factor(nuclearplants$pt)),
         data=nuclearplants,
         report=c("adj.means", "chisquare.test"))