HDPdensity {DPpackage}    R Documentation

Bayesian analysis for a Hierarchical Mixture of Dirichlet Process Mixture of Normals

Description

This function generates a posterior density sample for a DP mixture of normals model for related random probability measures. Support provided by the NIH/NCI R01CA75981 grant.

Usage

HDPdensity(formula,study,prior,mcmc,state,status,data=sys.frame(sys.parent()),na.action=na.fail,
           work.dir=NULL)

Arguments

formula a two-sided linear formula object describing the model fit, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. The design matrix is used to model the distribution of the responses in the HDP mixture of normals model.
study a (1 by n) vector of study indicators. The i-th index is the study j that response i belongs to.
prior a list giving the prior information. The list includes the following parameters: pe1 and pe0 giving the prior weights for the point mass at ε=1 and at ε=0, respectively, ae and be giving the prior parameters for a Beta prior on ε, eps giving the value of ε (it must be specified if pe1 is missing), a0 and b0 giving the hyperparameters for the prior distribution of the precision parameter of the Dirichlet process prior, alpha giving the value of the precision parameter (it must be specified if a0 is missing), a and A giving the hyperparameters of the normal prior distribution for the mean of the normal baseline distribution, m giving the mean of the normal baseline distribution (it must be specified if a is missing), cc and C giving the hyperparameters of the Wishart prior distribution for the inverse of the scale matrix of the normal baseline distribution, B giving the covariance matrix of the normal baseline distribution (it must be specified if cc is missing), q and R giving the hyperparameters of the Wishart prior distribution for the inverse of the scale matrix of the normal kernel, and S giving the covariance matrix of the normal kernel (it must be specified if q is missing).
mcmc a list giving the MCMC parameters. The list must include the following integers: nburn giving the number of burn-in scans, nskip giving the thinning interval, nsave giving the total number of scans to be saved, ndisplay giving the number of saved scans to be displayed on screen.
state a list giving the current value of the parameters. This list is used if the current analysis is the continuation of a previous analysis (not available yet).
status a logical variable indicating whether this run is new (TRUE) or the continuation of a previous analysis (FALSE). In the latter case the current value of the parameters must be specified in the object state (not available yet).
data data frame.
na.action a function that indicates what should happen when the data contain NAs. The default action (na.fail) causes HDPdensity to print an error message and terminate if there are any incomplete observations.
work.dir working directory.
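
The prior argument pairs each hyperprior with a fixed value that must be given when the hyperprior is omitted (pe1 or eps, a0 or alpha, a or m, cc or B, q or S). The sketch below checks those pairs before a long run is started. The helper name check_hdp_prior is hypothetical and is not part of DPpackage; it only encodes the rules stated in the description of prior.

```r
# Hypothetical helper (NOT part of DPpackage): reports which
# "hyperprior or fixed value" pairs are missing from a prior list.
check_hdp_prior <- function(prior) {
  # Each rule: the hyperprior element that may be omitted, and the fixed
  # value that must then be supplied instead.
  rules <- list(
    c(hyper = "pe1", fixed = "eps"),
    c(hyper = "a0",  fixed = "alpha"),
    c(hyper = "a",   fixed = "m"),
    c(hyper = "cc",  fixed = "B"),
    c(hyper = "q",   fixed = "S")
  )
  problems <- character(0)
  for (r in rules) {
    # [[ ]] matches list names exactly, so "a" does not match "a0" or "alpha"
    if (is.null(prior[[r[["hyper"]]]]) && is.null(prior[[r[["fixed"]]]])) {
      problems <- c(problems,
                    sprintf("supply either '%s' or '%s'",
                            r[["hyper"]], r[["fixed"]]))
    }
  }
  problems
}

# Illustrative values only
p <- 3
prior_full <- list(pe1 = 0.1, pe0 = 0.1, ae = 1, be = 1,
                   a0 = 1, b0 = 1,
                   a = rep(0, p), A = diag(p),
                   cc = 15, C = diag(p),
                   q = 15, R = 0.25 * diag(p))
check_hdp_prior(prior_full)      # no problems reported

prior_bad <- prior_full
prior_bad$q <- NULL              # drop q without supplying S
check_hdp_prior(prior_bad)
```

A check of this kind only covers the presence of elements; it does not verify dimensions or positive definiteness of the matrices.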

Details

The function sets up and carries out posterior Markov chain Monte Carlo (MCMC) simulation for a hierarchical DP mixture model.

The model is a DP mixture of normals for related random probability measures H_j. Each random measure is assumed to arise as a mixture H_j = ε F_0 + (1-ε) F_j of one common distribution F_0 and a distribution F_j that is specific to the j-th submodel.

See Mueller, Quintana and Rosner (2004) for details of the model. In summary, the implemented model is as follows. Without loss of generality we assume that each submodel corresponds to a different study in a set of related studies. Let \theta_{ji} denote the i-th observation in the j-th study (we use \theta, assuming that the model would typically be used for a random effects distribution). We assume that \theta_{ji}, i=1,...,n_j, are samples from a random probability measure for the j-th study, which in turn is a mixture of a measure F_0 that is common to all studies, and an idiosyncratic measure F_j that is specific to the j-th study.

\theta_{ji} \sim ε F_0 + (1-ε) F_j
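
As a minimal sketch of the two-component structure above, the following code draws values from ε F_0 + (1-ε) F_j for a fixed ε. The samplers r_F0 and r_Fj are placeholders with arbitrarily chosen normal shapes, assumed only for illustration; in the model both measures are themselves random.

```r
set.seed(1)

# Placeholder samplers standing in for F_0 and F_j (assumed shapes,
# chosen only so the two sources are easy to tell apart)
r_F0 <- function(n) rnorm(n, mean = 0, sd = 1)
r_Fj <- function(n) rnorm(n, mean = 3, sd = 0.5)

# Draw n values from eps * F_0 + (1 - eps) * F_j
r_Hj <- function(n, eps) {
  from_common <- runif(n) < eps          # TRUE: value comes from F_0
  theta <- numeric(n)
  theta[from_common]  <- r_F0(sum(from_common))
  theta[!from_common] <- r_Fj(sum(!from_common))
  list(theta = theta, from_common = from_common)
}

draw <- r_Hj(5000, eps = 0.3)
mean(draw$from_common)                   # fraction drawn from the common part
```

With ε close to 1 the studies share almost all of their distribution; with ε close to 0 each study is modelled essentially on its own.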

The random probability measures F_j in turn are given a Dirichlet process mixture of normal prior. We assume

F_j(\theta) = \int N(μ, S) dG_j(μ), \quad j=0,1,...,J

with G_j \sim DP(G^\star(\eta), α). Here \eta denotes the hyperparameters that index the base measure of the DP prior. We use a normal base measure and a conjugate hyperprior

G^\star(μ) = N(m, B), \mbox{ with } m \sim N(a, A), \mbox{ and } B^{-1} \sim Wishart(cc, (cc C)^{-1})
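
To make the DP mixture concrete, the sketch below draws one random measure G_j from a DP with normal base measure N(m, B) using a truncated stick-breaking construction, then evaluates the resulting mixture density F_j. This is a one-dimensional illustration with m, B, S and α held fixed; the truncation level K and the function names are assumptions made here, and this is not the sampler used by HDPdensity.

```r
set.seed(2)

# Truncated stick-breaking draw from a DP with total mass alpha and
# base measure N(m, B) (univariate, B is a variance).
r_dp_stick <- function(alpha, m, B, K = 50) {
  v <- rbeta(K, 1, alpha)
  v[K] <- 1                               # close the stick: weights sum to 1
  w <- v * cumprod(c(1, 1 - v[-K]))       # w_k = v_k * prod_{l<k} (1 - v_l)
  mu <- rnorm(K, mean = m, sd = sqrt(B))  # atoms drawn from the base measure
  list(w = w, mu = mu)
}

# Density of F_j(theta) = int N(mu, S) dG_j(mu) for a discrete G_j
d_Fj <- function(theta, G, S) {
  vapply(theta,
         function(t) sum(G$w * dnorm(t, mean = G$mu, sd = sqrt(S))),
         numeric(1))
}

G    <- r_dp_stick(alpha = 1, m = 0, B = 4)
grid <- seq(-15, 15, length.out = 2001)
dens <- d_Fj(grid, G, S = 0.25)
# plot(grid, dens, type = "l") shows one random mixture density
```

Small values of α concentrate the weight on a few atoms, so F_j has few modes; larger values spread the weight over many atoms.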

The Wishart prior is parametrized such that E(B^{-1}) = C^{-1}. Let \delta_x denote a point mass at x. We complete the model with the hyperpriors

S^{-1} \sim W(q, (qR)^{-1}), \quad p(ε) = π_0 \delta_0 + π_1 \delta_1 + (1-π_0-π_1) Be(a_ε, b_ε)
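
The two hyperpriors can be checked numerically with base R. The first part is a Monte Carlo check that a Wishart with cc degrees of freedom and scale (cc C)^{-1} has mean C^{-1}; it relies on stats::rWishart, whose mean is df * Sigma. The second part draws ε from the point-mass and Beta mixture, mapping π_0, π_1, a_ε, b_ε to the prior elements pe0, pe1, ae, be. The matrix C and all numeric values are arbitrary illustrations, and r_eps is a hypothetical helper.

```r
set.seed(3)

## Part 1: Wishart parametrization, E(B^{-1}) = C^{-1}
p  <- 2
cc <- 15
C  <- matrix(c(2, 0.5, 0.5, 1), p, p)
# rWishart(n, df, Sigma) returns a p x p x n array with mean df * Sigma
draws     <- rWishart(20000, df = cc, Sigma = solve(cc * C))
mean_Binv <- apply(draws, c(1, 2), mean)
max(abs(mean_Binv - solve(C)))           # small Monte Carlo error

## Part 2: draws from p(eps) = pe0 * delta_0 + pe1 * delta_1
##                             + (1 - pe0 - pe1) * Beta(ae, be)
r_eps <- function(n, pe0, pe1, ae, be) {
  u   <- runif(n)
  eps <- rbeta(n, ae, be)                # continuous part by default
  eps[u < pe0] <- 0                      # point mass at 0
  eps[u >= pe0 & u < pe0 + pe1] <- 1     # point mass at 1
  eps
}

eps <- r_eps(10000, pe0 = 0.1, pe1 = 0.1, ae = 1, be = 1)
c(mean(eps == 0), mean(eps == 1))        # both close to 0.1
```

The point masses give positive prior probability to the two extreme cases: ε = 1, where all studies share one distribution, and ε = 0, where the studies are modelled independently.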

Regression on observation-specific covariates x_{ji} can be achieved by including x_{ji} with the outcome \theta_{ji}, and proceeding as if (x_{ji}, \theta_{ji}) were generated as \theta_{ji} in the model described above. See Mueller et al. (2004, section 3.3) for details.
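
The following sketch illustrates why a joint model for (x, \theta) yields a regression. For a finite mixture of bivariate normals with a common covariance S, the conditional distribution of \theta given x is again a normal mixture, with weights updated by how well each component explains x. This is the standard conditioning result for normal mixtures, shown for one covariate and with made-up weights and means; it is an assumption-laden illustration and not the predictive algorithm of Mueller et al. (2004, section 3.3).

```r
# Conditional of theta given x under a mixture of bivariate normals
# sum_k w_k N((mu_x[k], mu_t[k]), S); S is ordered as (x, theta).
cond_theta_given_x <- function(x, w, mu_x, mu_t, S) {
  # reweight components by the marginal density of x under each of them
  wx <- w * dnorm(x, mean = mu_x, sd = sqrt(S[1, 1]))
  wx <- wx / sum(wx)
  # per-component conditional mean and (shared) conditional variance
  cmean <- mu_t + S[2, 1] / S[1, 1] * (x - mu_x)
  cvar  <- S[2, 2] - S[2, 1]^2 / S[1, 1]
  list(weights = wx, means = cmean, var = cvar,
       mean = sum(wx * cmean))
}

# Illustrative two-component mixture
w    <- c(0.6, 0.4)
mu_x <- c(-1, 2)
mu_t <- c(0, 3)
S    <- matrix(c(1, 0.5, 0.5, 1), 2, 2)

fit <- cond_theta_given_x(x = 2, w, mu_x, mu_t, S)
fit$weights   # the second component dominates because x = 2 sits at its mean
fit$mean      # conditional expectation of theta at x = 2
```

Because the weights depend on x, the implied regression function is nonlinear even though each component is linear in x.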

Value

The function returns no value. MCMC simulations are saved in files in the designated working directory.
Use predict.HDPdensity to plot summaries.

Author(s)

Peter Mueller <pmueller@mdanderson.org>

References

Mueller, P., Quintana, F. and Rosner, G. (2004). A Method for Combining Inference over Related Nonparametric Bayesian Models. Journal of the Royal Statistical Society, Series B, 66: 735-749.

See Also

predict.HDPdensity

Examples

## Not run: 

    # Data
      data(calgb)

    # Prior information
      Z <- calgb[,1:10]
      mhat <- apply(Z,2,mean)
      v <- diag(var(Z))
     
      prior<-list(a0=1,
                  b0=1,
                  pe1=0.1,
                  pe0=0.1,
                  ae=1,
                  be=1,
                  a=mhat,
                  A=diag(v), 
                  q=15,
                  R=0.25*diag(v),
                  cc=15,
                  C=diag(v))

    # Initial state
      state <- NULL

    # MCMC parameters

      mcmc <- list(nburn=1000,
                   nsave=2000,
                   nskip=0,
                   ndisplay=100,
                   npredupdate=100)

    # Fitting the model
      fit1 <- HDPdensity(formula=cbind(Z1,Z2,Z3,T1,T2,B0,B1)~CTX+GM+AMOF,
                         study=~study,
                         prior=prior,
                         mcmc=mcmc,
                         state=state,
                         data=calgb,  
                         status=TRUE)

    # Load data for future patients (for prediction)
      data(calgb.pred)
      X <- calgb.pred 

    # post-process MCMC output for predictive inference
    # save posterior predictive simulations in z00 ... z30

      z10 <- predict(fit1,data.pred=X,j=1,r=0) # post prediction for study 1
      z20 <- predict(fit1,data.pred=X,j=2,r=0) # .. study 2
      z30 <- predict(fit1,data.pred=X,j=3,r=0) # .. population at large (= study 3)

      z11 <- predict(fit1,data.pred=X,j=1,r=1) # idiosyncratic measures study 1
      z21 <- predict(fit1,data.pred=X,j=2,r=1) # .. study 2
      z00 <- predict(fit1,data.pred=X,j=0,r=0) # common measure

    # covariates (and dummy responses) of future patients
      colnames(z00) <- c("PATIENT",colnames(X))

    # plot estimated density for future patients in study 1, 2 and
    # in population at large
      idx <- which(z10[,1]==1)   ## PATIENT 1
      options(digits=2)
      par(mfrow=c(2,1))          

    # plot prediction for study 1,2,population
      plot  (density(z10[idx,8]),
             ylim=c(0,1.5),xlim=c(-0.5,2.5),
             xlab="SLOPE OF RECOVERY",bty="l",main="FUTURE PAT 1")
      lines (density(z20[idx,8]),type="l",col=2)
      lines (density(z30[idx,8]),type="l",col=3)
      legend(-0.5,1.5,col=1:3,legend=c("STUDY 1","STUDY 2","POPULATION"),
             lty=c(1,1,1),bty="n")

    # common and idiosyncratic measures
      plot (density(z00[idx,8]),type="l",col=4,lty=1,
            ylim=c(0,1.5),xlim=c(-0.5,2.5),
            xlab="SLOPE OF RECOVERY",bty="l",main="COMMON & IDIOSYNC PARTS")
      lines (density(z11[idx,8]),type="l",col=1,lty=2)
      lines (density(z21[idx,8]),type="l",col=2,lty=2)
      legend(1.5,1.5,col=c(1,2,4),lty=c(2,2,1),
             legend=c("STUDY 1 (idiosyn.)",
                      "STUDY 2 (idiosyn.)",
                      "COMMON"),bty="n")

    # plot estimated density for future patients in study 1, 2 and
    # in population at large
      idx <- which(z10[,1]==2)   ## PATIENT 2
      options(digits=2)
      par(mfrow=c(2,1))

      plot  (density(z10[idx,8]),
             ylim=c(0,1.5),xlim=c(-0.5,2.5),
             xlab="SLOPE OF RECOVERY",bty="l",main="FUTURE PAT 2")
      lines (density(z20[idx,8]),type="l",col=2)
      lines (density(z30[idx,8]),type="l",col=3)
      legend(-0.5,1.5,col=1:3,legend=c("STUDY 1","STUDY 2","POPULATION"),
             lty=c(1,1,1),bty="n")

      plot (density(z00[idx,8]),type="l",col=4,lty=1,
            ylim=c(0,1.5),xlim=c(-0.5,2.5),
            xlab="SLOPE OF RECOVERY",bty="l",main="COMMON & IDIOSYNC PARTS")
      lines (density(z11[idx,8]),type="l",col=1,lty=2)
      lines (density(z21[idx,8]),type="l",col=2,lty=2)
      legend(1.5,1.5,col=c(1,2,4),lty=c(2,2,1),
             legend=c("STUDY 1 (idiosyn.)",
                      "STUDY 2 (idiosyn.)",
                      "COMMON"),bty="n")

    # plot nadir count by covariate, for population 
      z2 <- z30[,3]; ctx <- z30[,9]; gm <- z30[,10]; amf <- z30[,11]
    # fix covariates gm (GM-CSF) and amf (amifostine)
      idx <- which( (gm==-1.78) & (amf== -0.36) )
    # restrict the boxplot to the rows selected by idx
      boxplot(split(z2[idx],ctx[idx]),
              xlab="CYCLOPHOSPHAMIDE",bty="n",ylab="NADIR COUNT")

## End(Not run)

[Package DPpackage version 1.0-6]