ps {twang} | R Documentation |
ps calculates propensity scores and diagnoses them using a variety of methods,
centered on boosted logistic regression as implemented in gbm.
ps(formula = formula(data), data, sampw = rep(1, nrow(data)), title=NULL, stop.method = stop.methods[1:2], plots="all", pdf.plots=FALSE, n.trees = 10000, interaction.depth = 3, shrinkage = 0.01, perm.test.iters=0, print.level = 2, iterlim = 1000, verbose = TRUE)
formula |
a formula for the propensity score model with the treatment indicator on the left side of the formula and the potential confounding variables on the right side. |
title |
a short text title used in plots and saved files |
data |
the dataset, including the treatment assignment as well as the covariates |
sampw |
optional sampling weights |
stop.method |
a stop.methods object, or a list of such
objects, containing the metrics and rules for evaluating
the quality of the propensity scores |
plots |
a character vector indicating which plots to create. The options
are "all" (the default), "optimize", "ps boxplot", "weight histogram",
"t pvalues", "ks pvalues", and "es". Any other value (such as "none")
produces no plots. See the help for diag.plot for details
on the plotted figures |
pdf.plots |
if TRUE then all plots are dumped to a pdf file with
the name specified in title |
n.trees |
number of gbm iterations passed on to gbm |
interaction.depth |
interaction.depth passed on to
gbm |
shrinkage |
shrinkage passed on to gbm |
perm.test.iters |
a non-negative integer giving the number of iterations
of the permutation test for the KS statistic. If perm.test.iters=0
then the function returns an analytic approximation to the p-value. Setting
perm.test.iters=200 will yield precision to within 3% if the true
p-value is 0.05. Use perm.test.iters=500 to be within 2% |
print.level |
the amount of detail to print to the screen |
iterlim |
maximum number of iterations for the direct optimization |
verbose |
if TRUE, lots of information will be printed to monitor the progress of the fitting |
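The precision figures quoted for perm.test.iters can be checked with a short calculation. The sketch below assumes they come from the usual binomial standard error of a Monte Carlo p-value with an approximate 95% interval; that reading is an assumption, not something this page states, and mc.halfwidth is a hypothetical helper that is not part of twang.

```r
# Hypothetical helper (not a twang function): half-width of an approximate
# 95% interval for a p-value estimated from n.iter permutations, assuming
# the true p-value is p and binomial sampling error.
mc.halfwidth <- function(p, n.iter) {
  1.96 * sqrt(p * (1 - p) / n.iter)
}

# Half-widths at a true p-value of 0.05 for 200 and 500 permutations
halfwidths <- mc.halfwidth(0.05, c(200, 500))
print(round(halfwidths, 3))
```

Under that assumption the half-widths come out at roughly 0.03 and 0.02, in line with the 3% and 2% quoted for 200 and 500 iterations.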
formula should be something like "treatment ~ X1 + X2 + X3". The
treatment variable should be a 0/1 indicator. There is no need to specify
interaction terms in the formula; interaction.depth controls the level
of interactions to allow in the propensity score model.

If pdf.plots=TRUE then ps saves the plots as a single pdf file named
"[title].pdf" in the working directory. See diag.plot for details of the plots.
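As a minimal sketch of preparing inputs in the form described above, the following base R code recodes a treatment variable as a 0/1 indicator and builds a main-effects formula. The data frame and its column names (group, X1, X2) are hypothetical placeholders, and nothing here calls twang itself.

```r
# Hypothetical data; the column names are placeholders for illustration
dat <- data.frame(
  group = factor(c("treated", "control", "control", "treated")),
  X1    = c(1.2, 0.4, 2.2, 1.9),
  X2    = c(0, 1, 1, 0)
)

# Recode the treatment variable as a 0/1 indicator
dat$treatment <- as.integer(dat$group == "treated")
stopifnot(all(dat$treatment %in% c(0, 1)))

# Main effects only: no interaction terms are written out in the formula
ps.formula <- reformulate(c("X1", "X2"), response = "treatment")
print(ps.formula)
```

The resulting formula and data frame could then be supplied as the formula and data arguments.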
Returns an object of class ps, a list containing
gbm.obj |
The returned gbm object |
ps |
a data frame containing the estimated propensity scores. Each
column is associated with one of the methods selected in
stop.methods |
w |
a data frame containing the propensity score weights. Each
column is associated with one of the methods selected in
stop.methods. If sampling weights were given then these are
incorporated into these weights |
plot.info |
a list containing the raw data used to generate the plots |
desc |
a list containing balance tables for each method selected in
stop.methods. Includes a component for the unweighted
analysis named "unw". Each desc component includes
a list with the following components |
datestamp |
Records the date of the analysis |
parameters |
Saves the ps call |
alerts |
Text containing any warnings accumulated during the estimation |
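To make the w component more concrete, here is a small sketch of how weights of this kind can be built from propensity scores. It follows the construction used in the example code for this page (controls weighted by p/(1-p), treated cases given weight 1). Multiplying by the sampling weights is an assumption about how they are "incorporated", and all input values are made up for illustration.

```r
# Hypothetical inputs for illustration only (not taken from lalonde)
treat <- c(1, 1, 0, 0, 0)              # 0/1 treatment indicator
p     <- c(0.8, 0.6, 0.5, 0.2, 0.1)    # estimated propensity scores
sampw <- c(1, 2, 1, 1, 3)              # sampling weights

# Controls are weighted by the odds of treatment; treated cases get weight 1
w <- ifelse(treat == 1, 1, p / (1 - p))

# Assumption: sampling weights enter multiplicatively
w.final <- w * sampw
print(w.final)
```

With weights like these, the weighted control group is meant to resemble the treatment group on the covariates, which is what the balance tables in desc assess.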
Greg Ridgeway gregr@rand.org, Dan McCaffrey danielm@rand.org, Andrew Morral morral@rand.org
Dan McCaffrey, G. Ridgeway, Andrew Morral (2004). “Propensity Score Estimation with Boosted Regression for Evaluating Adolescent Substance Abuse Treatment,” Psychological Methods 9(4):403-425.
data(lalonde)
print(nrow(lalonde))

ps.lalonde <- ps(treat ~ age + educ + black + hispan + nodegree +
                   married + re74 + re75,
                 data = lalonde,
                 title = "Lalonde example",
                 stop.method = stop.methods[c("ks.stat.mean", "ks.stat.max")],
                 # generate plots?
                 plots = "all",
                 pdf.plots = FALSE,
                 # gbm options
                 n.trees = 2000,
                 interaction.depth = 3,
                 shrinkage = 0.005,
                 perm.test.iters = 0,
                 verbose = TRUE)

# get the balance tables
bal.table(ps.lalonde)

# diagnose the weights using a ps object
a <- dx.wts(ps.lalonde, data = lalonde, treat.var = "treat")
print(a)
bal.table(a)

# diagnose the weights as propensity score weights
# will be the same as before, except for MC variation in the KS p-values
# when perm.test.iters is greater than 0
w <- with(ps.lalonde, ps / (1 - ps))
w[lalonde$treat == 1, ] <- 1
dx.wts(w, data = lalonde, treat.var = "treat", perm.test.iters = 0)

# diagnose the weights as propensity scores
p <- ps.lalonde$ps
dx.wts(p, data = lalonde, treat.var = "treat", x.as.weights = FALSE)

# look at propensity scores
names(ps.lalonde$ps)
hist(ps.lalonde$ps$ks.stat.max)
boxplot(split(ps.lalonde$ps$ks.stat.max, ps.lalonde$treat),
        ylab = "estimated propensity scores",
        names = c("control", "treatment"))

# check out the balance
names(ps.lalonde$desc)
# unweighted
ps.lalonde$desc$unw
# optimized for ks.stat.max
ps.lalonde$desc$ks.stat.max

# check out the gbm object, indicates which variables are most influential in
# estimating the propensity score
summary(ps.lalonde$gbm.obj,
        n.trees = ps.lalonde$desc$ks.stat.max$n.trees)

# bal.stat() can use an arbitrary set of weights
bal.stat(data = lalonde,
         w.all = w[, 1],
         vars = names(lalonde),
         treat.var = "treat",
         get.means = TRUE,
         get.ks = TRUE,
         na.action = "level")

# sensitivity analysis
sensitivity(ps.lalonde, lalonde, "re78")