bcp {bcp}R Documentation

A Package for Performing a Bayesian Analysis of Change Point Problems

Description

bcp() implements the Barry and Hartigan (1993) product partition model for the standard change point problem using Markov Chain Monte Carlo. Versions < 2.0 are quadratic in speed, and perform the default 550 iterations in approximately 0.75 seconds for a sequence of length 100. Versions => 2.0 are linear in speed and partition a sequence of length 10,000 in approximately 45 seconds (compared with 45 minutes for versions < 2.0). These times were computed on a PC with Windows XP, a Pentium D Processor (2.99 GHz) and 3.50GB of RAM.

Usage

 bcp(x, w0 = 0.2, p0 = 0.2, burnin = 50, mcmc = 500,
     return.mcmc = FALSE, nwssleigh = NULL)

Arguments

x a vector of numerical data.
w0 an optional numeric value specifying the prior variance. If no value is specified, the default value of 0.2 is used, as recommended by Barry and Hartigan.
p0 an optional numeric value specifying the prior probability of a change point at each location in the sequence. If no value is specified, the default value of 0.2 is used, as recommended by Barry and Hartigan.
burnin the number of burnin iterations.
mcmc the number of iterations after burnin.
return.mcmc if return.mcmc=TRUE the posterior means and the partitions after each iteration are returned.
nwssleigh the NWS sleigh (see details).

Details

This algorithm is used when there exists an unknown partition of a sequence into contiguous blocks such that the mean is constant within blocks.

When NetWorkSpaces (package nws) is used for parallel Markov chains, there is network overhead as well as the overhead associated with burning in each chain. A special note is needed about the values returned (with respect to blocks and, if return.mcmc=TRUE, about mcmc.means and mcmc.rhos): the burnins are collected at the beginning of the vector (or matrix, as the case may be) of results, followed by blocks of the mcmc results. So, for example, positions 1:burnin contain the burnins from the first worker. And if there are k workers, then positions k*burnin+1 through k*burnin+mcmc/k will contain the results of the first chain (ideally mcmc should be divisible by k). As a result, convergence diagnostics (for example, the heidel.diag() function of the coda package) should be applied to parallel bcp objects with care.

If you are unfamiliar with NetWorkSpaces, it requires both the nws package (available from CRAN) and a NetWorkSpaces server (freely available but somewhat more work to get started). We will keep a page updated with links and information to help you get started: http://www.stat.yale.edu/~jay/nws/.

Value

bcp() returns a list containing the following components:

data a vector copy of the data.
return.mcmc if return.mcmc=TRUE, the posterior means and the partitions after each iteration are returned.
mcmc.means if return.mcmc=TRUE, mcmc.means contains the posterior means at for each iteration conditional on the state of the partition.
mcmc.rhos if return.mcmc=TRUE, mcmc.rhos contains the partitions after each iteration. A value of 1 indicates the end of a block.
blocks a vector of the number of blocks after each iteration.
posterior.mean a vector of the posterior means over the iterations, excuding burnin.
posterior.var a vector of the posterior variances over the iterations, excuding burnin.
posterior.prob a vector of the posterior probability of changes over the iterations, excuding burnin.
burnin the number of burnin iterations.
mcmc the number of iterations after burnin.
w0 an optional numeric value specifying the prior variance. If no value is specified, the default value of 0.2 is used, as recommended by Barry and Hartigan.
p0 an optional numeric value specifying the prior probability of a change point at each location in the sequence. If no value is specified, the default value of 0.2 is used, as recommended by Barry and Hartigan.


The generic accessor functions data, blocks, and posterior.mean, posterior.prob, burnin, mcmc, w0, and p0 extract various useful components of the list returned by bcp(). The functions summary.bcp(), print.bcp(), and plot.bcp() are used to obtain summaries of the results.

Note

The vector of data must contain no missing or NA values. The user may set .Random.seed to control the MCMC if desired.

Author(s)

Chandra Erdman and John W. Emerson

References

Bai J., Perron P. (2003), Computation and Analysis of Multiple Structural Change Models, Journal of Applied Econometrics, 18, 1-22. url: http://qed.econ.queensu.ca/jae/2003-v18.1/bai-perron/.

Daniel Barry and J. A. Hartigan (1993), A Bayesian Analysis for Change Point Problems, Journal of The American Statistical Association, 88, 309-19.

Olshen, A. B., Venkatraman, E. S., Lucito, R., Wigler, M. (2004), Circular binary segmentation for the analysis of array-based DNA copy number data, Biostatistics, 5, 557-572. url: http://www.bioconductor.org/repository/release1.5/package/html/DNAcopy.html.

Snijders et al. (2001), Assembly of microarrays for genome-wide measurement of DNA copy number, Nature Genetics, 29, 263-264.

Achim Zeileis, Friedrich Leisch, Bruce Hansen, Kurt Hornik, Christian Kleiber (2006), The strucchange Package, version 1.3-1, CRAN: The Comprehensive R Network.

See Also

plot.bcp, summary.bcp, and print.bcp for summaries of the results.

Examples


  ##### A random sample from a few normal distributions #####
  testdata <- c(rnorm(50), rnorm(50, 5, 1), rnorm(50))
  bcp.0 <- bcp(testdata)
  plot.bcp(bcp.0)

  ## Not run: 
    ##### An example using NetWorkSpaces for parallel MCMC; note that
    ##### you must have package nws as well as an NWS server available,
    ##### specified in the nwsHost argument (see Details, above).
    if (require(nws)) {
      s <- sleigh(workerCount=4, nwsHost='hostname.XXX.YYY.ZZZ')
      bcp.parallel <- bcp(testdata, nwssleigh=s)
      stopSleigh(s)
    }
  
## End(Not run)
  
  ##### Coriell chromosome 11 #####
  data(coriell)
  chrom11 <- as.vector(na.omit(coriell$Coriell.05296[coriell$Chromosome==11]))
  bcp.11 <- bcp(chrom11)
  plot.bcp(bcp.11)
  
  # to see bcp and Circular Binary Segmentation results run:
  if(require("DNAcopy")) {
  n <- length(chrom11)
  cbs <- segment(CNA(chrom11, rep(1, n), 1:n), verbose = 0)
  cbs.ests <- rep(unlist(cbs$output[6]), unlist(cbs$output[5]))
  op <- par(mfrow=c(2,1),col.lab="black",col.main="black")
  op2 <- par(mar=c(0,4,4,2),xaxt="n", cex.axis=0.75)
  plot(1:n, bcp.11$data, col="grey", pch=20, xlab="Location", ylab="Posterior Mean", main="Posterior Means and Probabilities of a Change")
  lines(cbs.ests, col="red")
  lines(bcp.11$posterior.mean, lwd=2)
  par(op2)
  op3 <- par(mar=c(5,4,0,2), xaxt="s", cex.axis=0.75)
  plot(1:n, bcp.11$posterior.prob, type="l", ylim=c(0,1), xlab="Location", ylab="Posterior Probability", main="")
  for(i in 1:(dim(cbs$output)[1]-1)) abline(v=cbs$output$loc.end[i], col="red")
  par(op3)
  par(op)
  } else {
        cat("DNAcopy is not loaded")
  }
  
  ##### RealInt #####
  data("RealInt")
  bcp.ri <- bcp(as.vector(RealInt))
  plot.bcp(bcp.ri)
  
  # to see bcp and Bai and Perron results run:
  if(require("strucchange")) {
  bp <- breakpoints(RealInt ~ 1, h = 2)$breakpoints
  rho <- rep(0, length(RealInt))
  rho[bp] <- 1
  b.num<-1 + c(0,cumsum(rho[1:(length(rho)-1)]))
  bp.mean <- unlist(lapply(split(RealInt,b.num),mean))
  bp.ri <- rep(0,length(RealInt))
  for(i in 1:length(bp.ri)) bp.ri[i] <- bp.mean[b.num[i]]
  xax <- seq(1961, 1987, length=103)
  op<-par(mfrow=c(2,1),col.lab="black",col.main="black")
  op2 <- par(mar=c(0,4,4,2),xaxt="n", cex.axis=0.75)
  plot(1:length(bcp.ri$data), bcp.ri$data, col="grey", pch=20, xlab="", ylab="Posterior Mean", main="U.S. Ex-Post Interest Rate")
  lines(bcp.ri$posterior.mean, lwd=2)
  lines(bp.ri, col="blue")
  par(op2)
  op3 <- par(mar=c(5,4,0,2), xaxt="s", cex.axis=0.75)
  plot(xax, bcp.ri$posterior.prob, yaxt="n", type="l", ylim=c(0,1), xlab="Year", ylab="Posterior Probability", main="")
  for(i in 1:length(bp.ri)) abline(v=xax[bp[i]], col="blue")
  axis(2, yaxp=c(0, 0.9, 3))
  par(op3)
  par(op)
  } else {
    cat("strucchange is not loaded")
  }

[Package bcp version 2.1.1 Index]