HDPdensity {DPpackage}    R Documentation
This function generates a posterior density sample for a Dirichlet process (DP) mixture of normals model for related random probability measures. Support was provided by NIH/NCI grant R01CA75981.
HDPdensity(formula,study,prior,mcmc,state,status,data=sys.frame(sys.parent()),na.action=na.fail, work.dir=NULL)
formula
a two-sided linear formula object describing the model fit, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. The design matrix is used to model the distribution of the responses in the hierarchical DP (HDP) mixture of normals model.
study
a (1 by n) vector of study indicators: the i-th element is the index j of the study to which response i belongs.
prior
a list giving the prior information. The list includes the following parameters:
- pe1 and pe0, the prior weights for the point masses at ε=1 and at ε=0, respectively;
- ae and be, the parameters of the Beta prior on ε;
- eps, the value of ε (it must be specified if pe1 is missing);
- a0 and b0, the hyperparameters for the prior distribution of the precision parameter of the Dirichlet process prior;
- alpha, the value of the precision parameter (it must be specified if a0 is missing);
- a and A, the hyperparameters of the normal prior distribution for the mean of the normal baseline distribution;
- m, the mean of the normal baseline distribution (it must be specified if a is missing);
- cc and C, the hyperparameters of the Wishart prior distribution for the inverse of the scale matrix of the normal baseline distribution;
- B, the covariance matrix of the normal baseline distribution (it must be specified if cc is missing);
- q and R, the hyperparameters of the Wishart prior distribution for the inverse of the scale matrix of the normal kernel;
- S, the covariance matrix of the normal kernel (it must be specified if q is missing).
mcmc
a list giving the MCMC parameters. The list must include the following integers: nburn giving the number of burn-in scans, nskip giving the thinning interval, nsave giving the total number of scans to be saved, and ndisplay giving the number of saved scans to be displayed on screen.
state
a list giving the current value of the parameters. This list is used if the current analysis is the continuation of a previous analysis (not available yet).
status
a logical variable indicating whether this run is new (TRUE) or the continuation of a previous analysis (FALSE). In the latter case the current value of the parameters must be specified in the object state (not available yet).
data
a data frame containing the variables in the model.
na.action
a function that indicates what should happen when the data contain NAs. The default action (na.fail) causes HDPdensity to print an error message and terminate if there are any incomplete observations.
work.dir
the working directory in which the MCMC output files are saved.
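Because each hyperprior in the prior list can be replaced by a fixed value, several combinations of parameters are valid. The R sketch below encodes the "must be specified if ... is missing" rules stated above in a hypothetical helper, check_hdp_prior(), which is not part of DPpackage. It also checks that pe0 + pe1 does not exceed 1, since the Beta component of the prior on ε receives weight 1 - pe0 - pe1. The fixed-value prior list uses placeholder numbers, and whether HDPdensity accepts exactly this combination has not been verified here.

```r
# Hypothetical helper (not part of DPpackage): checks the documented rule
# that a fixed value is required whenever its hyperparameter is omitted.
check_hdp_prior <- function(prior) {
  # name = hyperparameter; value = fixed parameter required if it is missing
  rules <- c(pe1 = "eps", a0 = "alpha", a = "m", cc = "B", q = "S")
  for (hyper in names(rules)) {
    fixed <- rules[[hyper]]
    # [[ ]] matches list names exactly, so "a" does not match "alpha"
    if (is.null(prior[[hyper]]) && is.null(prior[[fixed]]))
      stop(sprintf("either '%s' or '%s' must be given", hyper, fixed))
  }
  # the Beta component of p(eps) has weight 1 - pe0 - pe1, which must be >= 0
  pe0 <- if (is.null(prior[["pe0"]])) 0 else prior[["pe0"]]
  pe1 <- if (is.null(prior[["pe1"]])) 0 else prior[["pe1"]]
  if (pe0 + pe1 > 1) stop("pe0 + pe1 must not exceed 1")
  invisible(TRUE)
}

# Prior for p = 2 responses with fixed eps, alpha, m, B and S
# (placeholder values for illustration only).
p <- 2
prior_fixed <- list(
  eps   = 0.5,            # given because pe1 is omitted
  alpha = 1,              # given because a0 is omitted
  m     = rep(0, p),      # given because a is omitted
  B     = diag(p),        # given because cc is omitted
  S     = 0.25 * diag(p)  # given because q is omitted
)
check_hdp_prior(prior_fixed)
```

The prior list used in the Examples (with pe1, pe0, ae, be, a0, b0, a, A, q, R, cc and C) also passes this check, since every hyperprior is supplied.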
The function sets up and carries out posterior Markov chain Monte Carlo (MCMC) simulation for a hierarchical DP mixture model.
The model is a DP mixture of normals for related random probability measures H_j. Each random measure is assumed to arise as a mixture H_j = ε F_0 + (1-ε) F_j of one common distribution F_0 and a distribution F_j that is specific to the j-th submodel.
See Mueller, Quintana and Rosner (2004) for details of the model. In summary, the implemented model is as follows. Without loss of generality we assume that each submodel corresponds to a different study in a set of related studies. Let theta_{ji} denote the i-th observation in the j-th study (we use theta, assuming that the model would typically be used for a random effects distribution). We assume that theta_{ji}, i=1,...,n_j, are samples from a random probability measure for the j-th study, which in turn is a mixture of a measure F_0 that is common to all studies and an idiosyncratic measure F_j that is specific to the j-th study:

$$\theta_{ji} \sim \epsilon F_0 + (1-\epsilon) F_j$$
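To make the mixture concrete, the R sketch below draws theta_{ji} from ε F_0 + (1-ε) F_j by first deciding, with probability ε, whether each draw comes from the common measure or from the study-specific one. As a simplifying assumption, F_0 and F_j are replaced by plain univariate normals (in the model they are DP mixtures of normals), and the helper name rmix_common_idio is hypothetical.

```r
# Hypothetical helper: draws n values from eps * F0 + (1 - eps) * Fj.
# rF0 and rFj are functions k -> k random draws from F0 and Fj.
rmix_common_idio <- function(n, eps, rF0, rFj) {
  from_common <- runif(n) < eps          # TRUE with probability eps
  theta <- numeric(n)
  theta[from_common]  <- rF0(sum(from_common))
  theta[!from_common] <- rFj(sum(!from_common))
  list(theta = theta, from_common = from_common)
}

set.seed(42)
draws <- rmix_common_idio(
  n = 10000, eps = 0.3,
  rF0 = function(k) rnorm(k, mean = 0, sd = 1),   # stand-in for common F0
  rFj = function(k) rnorm(k, mean = 3, sd = 0.5)  # stand-in for study-specific Fj
)
mean(draws$from_common)  # close to eps = 0.3
mean(draws$theta)        # close to 0.3 * 0 + 0.7 * 3 = 2.1
```

Small ε pulls the studies apart (most draws come from their own F_j), while ε close to 1 makes all studies share F_0.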
The random probability measures F_j in turn are given a Dirichlet process mixture of normals prior. We assume

$$F_j(\theta) = \int N(\mu, S)\, dG_j(\mu), \quad j = 0, 1, \dots, J,$$

with $G_j \sim DP(G^\star(\eta), \alpha)$. Here η denotes the hyperparameters that index the base measure of the DP prior. We use a normal base measure and a conjugate hyperprior

$$G^\star(\mu) = N(m, B), \quad \text{with } m \sim N(a, A) \text{ and } B^{-1} \sim \text{Wishart}(cc, (cc\,C)^{-1}).$$

The Wishart prior is parametrized such that $E(B^{-1}) = C^{-1}$. Let $\delta_x$ denote a point mass at x. We complete the model with the hyperpriors

$$S^{-1} \sim \text{Wishart}(q, (q\,R)^{-1}), \qquad p(\epsilon) = \pi_0 \delta_0 + \pi_1 \delta_1 + (1 - \pi_0 - \pi_1)\, \text{Be}(a_\epsilon, b_\epsilon).$$

In the prior argument, π_0, π_1, a_ε and b_ε are given by pe0, pe1, ae and be, respectively.
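Simulating the hierarchy forward from the prior is one way to see what a given choice of hyperparameters implies. The R sketch below is a hypothetical univariate version (all function names are illustrative). It approximates each G_j by a truncated stick-breaking representation with K atoms, a standard construction of the DP swapped in here for illustration and not what HDPdensity's MCMC does. It draws m, B and S once from their hyperpriors with stats::rWishart, whose mean is df times the scale matrix, so that E(B^{-1}) = C^{-1} as stated above, and it draws ε from the three-part prior p(ε).

```r
# Hypothetical univariate prior simulator for the hierarchical model.
# Assumptions: scalar theta; each G_j approximated by K stick-breaking atoms;
# m, B and S drawn once from their hyperpriors. Not the package's MCMC.

# p(eps) = pi0 * delta_0 + pi1 * delta_1 + (1 - pi0 - pi1) * Be(ae, be)
r_eps <- function(pi0, pi1, ae, be) {
  u <- runif(1)
  if (u < pi0) return(0)
  if (u < pi0 + pi1) return(1)
  rbeta(1, ae, be)
}

# Truncated stick-breaking draw of G ~ DP(N(m, B), alpha)
r_dp_normal <- function(K, alpha, m, B) {
  v <- rbeta(K, 1, alpha)
  v[K] <- 1                               # last stick takes the remaining mass
  w <- v * cumprod(c(1, 1 - v[-K]))       # w_k = v_k * prod_{l<k} (1 - v_l)
  list(mu = rnorm(K, m, sqrt(B)), w = w)  # atoms from the base measure N(m, B)
}

# Draw n values from F(theta) = int N(mu, S) dG(mu)
r_from_G <- function(n, G, S) {
  k <- sample.int(length(G$w), n, replace = TRUE, prob = G$w)
  rnorm(n, G$mu[k], sqrt(S))
}

simulate_hdp <- function(n_j, a, A, cc, C, q, R, alpha,
                         pi0, pi1, ae, be, K = 50) {
  m <- rnorm(1, a, sqrt(A))                                # m ~ N(a, A)
  B <- 1 / rWishart(1, cc, matrix(1 / (cc * C)))[1, 1, 1]  # B^{-1} ~ Wishart(cc, (cc C)^{-1})
  S <- 1 / rWishart(1, q, matrix(1 / (q * R)))[1, 1, 1]    # S^{-1} ~ Wishart(q, (q R)^{-1})
  eps <- r_eps(pi0, pi1, ae, be)
  J <- length(n_j)
  G <- lapply(0:J, function(j) r_dp_normal(K, alpha, m, B))  # G[[1]] is G_0
  theta <- lapply(seq_len(J), function(j) {
    common <- runif(n_j[j]) < eps          # draws allocated to F_0
    out <- numeric(n_j[j])
    out[common]  <- r_from_G(sum(common), G[[1]], S)
    out[!common] <- r_from_G(sum(!common), G[[j + 1]], S)
    out
  })
  list(eps = eps, m = m, B = B, S = S, theta = theta)
}

set.seed(7)
sim <- simulate_hdp(n_j = c(40, 60), a = 0, A = 1, cc = 15, C = 1,
                    q = 15, R = 0.25, alpha = 1,
                    pi0 = 0.1, pi1 = 0.1, ae = 1, be = 1)
lengths(sim$theta)  # 40 60
```

The truncation level K only controls the approximation of each G_j; larger values of alpha spread the mass over more atoms and therefore call for a larger K.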
Regression on observation-specific covariates x_{ji} can be achieved by including x_{ji} with the outcome theta_{ji}, and proceeding as if (x_{ji},theta_{ji}) were generated as theta_{ji} in the model described above. See Mueller et al. (2004, section 3.3) for details.
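Once the outcome is augmented with covariates, the implied regression follows from conditioning the joint normal kernel on x. The R sketch below is illustrative: it assumes a bivariate (x, theta) and one posterior draw summarized by component weights w, component means mu (a K x 2 matrix) and the common kernel covariance S. For each component, theta given x is normal with mean mu_theta + S_theta,x / S_xx * (x - mu_x) and variance S_theta,theta - S_theta,x^2 / S_xx, and the components are reweighted by their density at x. The helper cond_density is hypothetical and is not claimed to be how predict.HDPdensity computes its output.

```r
# Hypothetical helper: density of theta given x implied by the joint mixture
#   sum_k w_k N((x, theta); mu_k, S),  common 2 x 2 kernel covariance S.
# mu is a K x 2 matrix: column 1 holds x-means, column 2 holds theta-means.
cond_density <- function(theta, x, w, mu, S) {
  sxx <- S[1, 1]; sxt <- S[1, 2]; stt <- S[2, 2]
  # reweight components by how well each explains the observed x
  wx <- w * dnorm(x, mu[, 1], sqrt(sxx))
  wx <- wx / sum(wx)
  # per-component regression line and residual standard deviation
  cond_mean <- mu[, 2] + sxt / sxx * (x - mu[, 1])
  cond_sd <- sqrt(stt - sxt^2 / sxx)
  sapply(theta, function(t) sum(wx * dnorm(t, cond_mean, cond_sd)))
}

S  <- matrix(c(1, 0.5, 0.5, 1), 2)
mu <- rbind(c(-1, 0), c(2, 3))
grid <- seq(-4, 8, by = 0.01)
dens <- cond_density(grid, x = 0.5, w = c(0.4, 0.6), mu = mu, S = S)
sum(dens) * 0.01  # approximately 1
```

Because the component weights depend on x, the conditional mean is a nonlinear function of x even though each component contributes a straight regression line.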
The function returns no value. MCMC simulations are saved in files in the designated working directory. Use predict.HDPdensity to post-process the MCMC output into posterior predictive simulations, which can then be summarized and plotted.
Peter Mueller <pmueller@mdanderson.org>
Mueller, P., Quintana, F. and Rosner, G. (2004). A Method for Combining Inference over Related Nonparametric Bayesian Models. Journal of the Royal Statistical Society, Series B, 66: 735-749.
## Not run: 
# Data
data(calgb)

# Prior information
Z <- calgb[,1:10]
mhat <- apply(Z,2,mean)
v <- diag(var(Z))
prior <- list(a0=1, b0=1, pe1=0.1, pe0=0.1, ae=1, be=1,
              a=mhat, A=diag(v), q=15, R=0.25*diag(v),
              cc=15, C=diag(v))

# Initial state
state <- NULL

# MCMC parameters
mcmc <- list(nburn=1000, nsave=2000, nskip=0, ndisplay=100,
             npredupdate=100)

# Fitting the model
fit1 <- HDPdensity(formula=cbind(Z1,Z2,Z3,T1,T2,B0,B1)~CTX+GM+AMOF,
                   study=~study, prior=prior, mcmc=mcmc,
                   state=state, data=calgb, status=TRUE)

# Load data for future patients (for prediction)
data(calgb.pred)
X <- calgb.pred

# Post-process MCMC output for predictive inference;
# save posterior predictive simulations in z00 ... z30
z10 <- predict(fit1,data.pred=X,j=1,r=0)  # posterior prediction for study 1
z20 <- predict(fit1,data.pred=X,j=2,r=0)  # .. study 2
z30 <- predict(fit1,data.pred=X,j=3,r=0)  # .. population at large (= study 3)
z11 <- predict(fit1,data.pred=X,j=1,r=1)  # idiosyncratic measure, study 1
z21 <- predict(fit1,data.pred=X,j=2,r=1)  # .. study 2
z00 <- predict(fit1,data.pred=X,j=0,r=0)  # common measure

# Covariates (and dummy responses) of future patients
colnames(z00) <- c("PATIENT",colnames(X))

# Plot estimated density for future patients in study 1, 2 and
# in population at large
idx <- which(z10[,1]==1)   ## PATIENT 1
options(digits=2)
par(mfrow=c(2,1))

# Plot prediction for study 1, 2 and population
plot(density(z10[idx,8]), ylim=c(0,1.5), xlim=c(-0.5,2.5),
     xlab="SLOPE OF RECOVERY", bty="l", main="FUTURE PAT 1")
lines(density(z20[idx,8]), type="l", col=2)
lines(density(z30[idx,8]), type="l", col=3)
legend(-0.5, 1.5, col=1:3, legend=c("STUDY 1","STUDY 2","POPULATION"),
       lty=c(1,1,1), bty="n")

# Common and idiosyncratic measures
plot(density(z00[idx,8]), type="l", col=4, lty=1,
     ylim=c(0,1.5), xlim=c(-0.5,2.5),
     xlab="SLOPE OF RECOVERY", bty="l", main="COMMON & IDIOSYNC PARTS")
lines(density(z11[idx,8]), type="l", col=1, lty=2)
lines(density(z21[idx,8]), type="l", col=2, lty=2)
legend(1.5, 1.5, col=c(1,2,4), lty=c(2,2,1),
       legend=c("STUDY 1 (idiosyn.)", "STUDY 2 (idiosyn.)", "COMMON"), bty="n")

# Plot estimated density for future patients in study 1, 2 and
# in population at large
idx <- which(z10[,1]==2)   ## PATIENT 2
options(digits=2)
par(mfrow=c(2,1))
plot(density(z10[idx,8]), ylim=c(0,1.5), xlim=c(-0.5,2.5),
     xlab="SLOPE OF RECOVERY", bty="l", main="FUTURE PAT 2")
lines(density(z20[idx,8]), type="l", col=2)
lines(density(z30[idx,8]), type="l", col=3)
legend(-0.5, 1.5, col=1:3, legend=c("STUDY 1","STUDY 2","POPULATION"),
       lty=c(1,1,1), bty="n")
plot(density(z00[idx,8]), type="l", col=4, lty=1,
     ylim=c(0,1.5), xlim=c(-0.5,2.5),
     xlab="SLOPE OF RECOVERY", bty="l", main="COMMON & IDIOSYNC PARTS")
lines(density(z11[idx,8]), type="l", col=1, lty=2)
lines(density(z21[idx,8]), type="l", col=2, lty=2)
legend(1.5, 1.5, col=c(1,2,4), lty=c(2,2,1),
       legend=c("STUDY 1 (idiosyn.)", "STUDY 2 (idiosyn.)", "COMMON"), bty="n")

# Plot nadir count by covariate, for population
z2 <- z30[,3]; ctx <- z30[,9]; gm <- z30[,10]; amf <- z30[,11]
# Fix covariates gm (GM-CSF) and amf (amifostine)
idx <- which( (gm==-1.78) & (amf== -0.36) )
boxplot(split(z2[idx], ctx[idx]),
        xlab="CYCLOPHOSPHAMIDE", bty="n", ylab="NADIR COUNT")
## End(Not run)