tgp {tgp}     R Documentation

Generic interface to treed Gaussian process models

Description

A generic interface to treed Gaussian process models, as used by the model-fitting functions of class "tgp": bgpllm, btlm, blm, bgp, btgp, and btgpllm; results can be visualized with plot.tgp and tgp.trees. This more involved interface is provided for finer control over the model parameterization.

Usage

tgp(X, Z, XX = NULL, BTE = c(2000, 7000, 2), R = 1, m0r1 = FALSE,
        linburn = FALSE, params = NULL, pred.n = TRUE,
        ds2x = FALSE, ego = FALSE, traces = FALSE, verb = 1)

Arguments

X data.frame, matrix, or vector of inputs X
Z Vector of output responses Z of length equal to the leading dimension (rows) of X
XX Optional data.frame, matrix, or vector of predictive input locations with the same number of columns as X
BTE 3-vector of Monte Carlo (MCMC) parameters: (B)urn in, (T)otal, and (E)very. Predictive samples are saved every E MCMC rounds, starting at round B and stopping at round T (a usage sketch follows this list)
R Number of repeats or restarts of BTE MCMC rounds, default R=1 is no restarts
m0r1 If TRUE the responses Z will be scaled to have a mean of zero and a range of 1; default is FALSE
linburn If TRUE initializes MCMC with B (additional) rounds of Bayesian linear CART (bcart); default is FALSE
params Generic parameters list which can be provided for a more flexible model. See tgp.default.params for more details about the parameter list
pred.n TRUE (default) value results in prediction at the inputs X; FALSE skips prediction at X resulting in a faster implementation
ds2x TRUE results in ALC (Active Learning–Cohn) computation of expected reduction in uncertainty calculations at the XX locations, which can be used for adaptive sampling; FALSE (default) skips this computation, resulting in a faster implementation
ego TRUE results in EGO (Expected Global Optimization) computation of the expected information about the location of the minimum at the XX locations, which can be used for adaptive sampling; FALSE (default) skips this computation, resulting in a faster implementation
traces TRUE results in a saving of samples from the posterior distribution for most of the parameters in the model. The default is FALSE for speed/storage reasons. See note below
verb Level of verbosity of R-console print statements: from 0 (none); 1 (default) which shows the “progress meter”; 2 includes an echo of initialization parameters; up to 3 and 4 (max) with more info about successful tree operations.
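
As a usage sketch only (the toy data and argument values below are illustrative, not recommendations), the following call runs the MCMC with 1000 burn-in rounds, 6000 total rounds, thinning to every 10th sample, two restarts, scaled responses, and ALC statistics and parameter traces collected:

X <- seq(0, 1, length=30)                    # 1-d toy design
Z <- sin(2*pi*X) + rnorm(length(X), sd=0.1)  # noisy toy responses
XX <- seq(0, 1, length=50)                   # predictive grid
out <- tgp(X=X, Z=Z, XX=XX,
           BTE=c(1000, 6000, 10),  # burn-in B, total T, thin E
           R=2,                    # two restarts of the B:T rounds
           m0r1=TRUE,              # scale Z to mean 0, range 1
           ds2x=TRUE,              # collect ALC statistics at XX
           traces=TRUE,            # save parameter traces
           verb=0)                 # silent progress reporting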

Value

tgp returns an object of class "tgp". The function plot.tgp can be used to help visualize results.
An object of class "tgp" is a list containing at least the following components. The parts and trees components are tree-related outputs unique to the treed (T-class) functions, i.e., those with a positive first (alpha) parameter in params$tree <- c(alpha, beta, minpart). Tree viewing is supported by tgp.trees.

state unsigned short[3] random number seed passed to the C code
X Input argument: data.frame of inputs X
n Number of rows in X, i.e., dim(X)[1]
d Number of cols in X, i.e., dim(X)[2]
Z Vector of output responses Z
XX Input argument: data.frame of predictive locations XX
nn Number of rows in XX, i.e., dim(XX)[1]
BTE Input argument: Monte-carlo parameters
R Input argument: restarts
linburn Input argument: initialize MCMC with linear CART
params list of model parameters generated by tgp.default.params
dparams Double-representation of model input parameters used by C-code
Zp.mean Vector of mean predictive estimates at X locations
Zp.q1 Vector of 5% predictive quantiles at X locations
Zp.q2 Vector of 95% predictive quantiles at X locations
Zp.q Vector of quantile norms Zp.q2 - Zp.q1
ZZ.q1 Vector of 5% predictive quantiles at XX locations
ZZ.q2 Vector of 95% predictive quantiles at XX locations
ZZ.q Vector of quantile norms ZZ.q2 - ZZ.q1, used by the Active Learning–MacKay (ALM) adaptive sampling algorithm
Ds2x If argument ds2x=TRUE, this vector contains ALC statistics for XX locations
ego If argument ego=TRUE, this vector contains EGO statistics for XX locations (a sketch of ranking these adaptive-sampling statistics follows this list)
response Name of response Z if supplied by data.frame in argument, or “z” if none provided
parts Internal representation of the regions depicted by partitions of the maximum a posteriori (MAP) tree
trees list of trees (in maptree representation) that were MAP as a function of each tree height sampled between MCMC rounds B and T
traces list containing traces of most of the model parameters and posterior predictive distributions at input locations XX. See note below
verb Input argument: verbosity level
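
For example, assuming out is the value returned by a call with ds2x=TRUE and ego=TRUE, the adaptive-sampling statistics can be ranked as in the sketch below. The "larger is better" reading of each criterion is the usual one for ALM/ALC/EGO; treat this as a sketch rather than package-prescribed usage:

alm.i <- which.max(out$ZZ.q)   # ALM: widest predictive quantile range
alc.i <- which.max(out$Ds2x)   # ALC: largest expected reduction in uncertainty
ego.i <- which.max(out$ego)    # EGO: most promising for the minimum
out$XX[alm.i, ]                # input location favored by ALM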

Note

Inputs X, XX, and Z containing NaN, NA, or Inf are discarded with non-fatal warnings

Upon execution, MCMC reports are made every 1,000 rounds to indicate progress

Stationary (non-treed) processes on larger inputs (e.g., X, Z of size greater than 500) might be slow to execute, especially on older machines. Once the C code starts executing, it can be interrupted in the usual way: either via Ctrl-C (Unix-alikes) or by pressing the Stop button in the R GUI. When this happens, interrupt messages will indicate which required cleanup measures completed before control was returned to R

Regarding traces=TRUE: samples from the posterior will be collected for all parameters in the model, except those of the hierarchical priors, e.g., b0, etc. Traces for some parameters are stored in memory, others in files. GP parameters are collected with reference to the locations in XX, resulting in nn=dim(XX)[1] traces of d, g, s2, tau2, etc. Therefore, it is recommended that nn be chosen to be a small, representative set of input locations. Besides GP parameters, traces are saved for the tree partitions, areas under the LLM, log posterior (as a function of tree height), and samples ZZ from the posterior predictive distribution at XX
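
As a sketch, assuming out was produced by a run with traces=TRUE, the contents of the traces component can be inspected with standard R tools (the exact layout depends on the model fitted, so inspect it before relying on particular element names):

names(out$traces)                # which trace components were collected
str(out$traces, max.level=1)     # their sizes and types, one level deep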

Author(s)

Robert B. Gramacy rbgramacy@ams.ucsc.edu

References

Gramacy, R. B., Lee, H. K. H. (2006). Bayesian treed Gaussian process models. Available as UCSC Technical Report ams2006-01.

Gramacy, R. B., Lee, H. K. H. (2006). Adaptive design of supercomputer experiments. Available as UCSC Technical Report ams2006-02.

Gramacy, R. B., Lee, H. K. H., & Macready, W. (2004). Parameter space exploration with Gaussian process trees. ICML (pp. 353–360). Omnipress & ACM Digital Library.

Chipman, H., George, E., & McCulloch, R. (1998). Bayesian CART model search (with discussion). Journal of the American Statistical Association, 93, 935–960.

Chipman, H., George, E., & McCulloch, R. (2002). Bayesian treed models. Machine Learning, 48, 303–324.

http://www.ams.ucsc.edu/~rbgramacy/tgp.html

See Also

tgp.default.params, bgpllm, btlm, blm, bgp, btgp, btgpllm, plot.tgp, tgp.trees

Examples

##
## Many of the examples below illustrate the above 
## function(s) on random data.  Thus it can be fun
## (and informative) to run them several times.
##

# 
# simple linear response
#

# input and predictive data
X <- seq(0,1,length=50)
XX <- seq(0,1,length=99)
Z <- 1 + 2*X + rnorm(length(X),sd=0.25)

# out <- blm(X=X, Z=Z, XX=XX)   # try Linear Model with tgp
p <- tgp.default.params(2)
p$tree <- c(0,0,10)             # no tree
p$gamma <- c(-1,0.2,0.7)        # force llm
out <- tgp(X=X,Z=Z,XX=XX,params=p) 
plot(out)                       # plot the surface

#
# 1-d Example
# 

# construct some 1-d nonstationary data
X <- seq(0,20,length=100)
XX <- seq(0,20,length=99)
Z <- (sin(pi*X/5) + 0.2*cos(4*pi*X/5)) * (X <= 9.6)
lin <- X > 9.6
Z[lin] <- -1 + X[lin]/10
Z <- Z + rnorm(length(Z), sd=0.1)

# out <- btlm(X=X, Z=Z, XX=XX) # try Linear CART with tgp
p <- tgp.default.params(2)
p$gamma <- c(-1,0.2,0.7)        # force llm
out <- tgp(X=X,Z=Z,XX=XX,params=p)
plot(out)                       # plot the surface
tgp.trees(out)                  # plot the MAP trees

# out <- btgp(X=X, Z=Z, XX=XX)  # use a treed GP with tgp
p <- tgp.default.params(2)
p$gamma <- c(0,0.2,0.7)         # force no llm
out <- tgp(X=X,Z=Z,XX=XX,params=p)
plot(out)                       # plot the surface
tgp.trees(out)                  # plot the MAP trees

#
# 2-d example
# (using the isotropic correlation function)
#

# construct some 2-d nonstationary data
exp2d.data <- exp2d.rand()
X <- exp2d.data$X; Z <- exp2d.data$Z
XX <- exp2d.data$XX

# try a GP with tgp
# out <- bgp(X=X, Z=Z, XX=XX, corr="exp")       
p <- tgp.default.params(3)
p$tree <- c(0,0,10)             # no tree
p$gamma <- c(0,0.2,0.7)         # no llm
p$corr <- "exp" 
out <- tgp(X=X,Z=Z,XX=XX,params=p)
plot(out)                       # plot the surface

# try a treed GP LLM with tgp
# out <- btgpllm(X=X,Z=Z,XX=XX,corr="exp") 
p <- tgp.default.params(3)
p$corr <- "exp" 
out <- tgp(X=X,Z=Z,XX=XX,params=p)
plot(out)                       # plot the surface
tgp.trees(out)                  # plot the MAP trees

#
# Motorcycle Accident Data
#

# get the data
require(MASS)

# try a custom treed GP LLM with tgp, without m0r1
p <- tgp.default.params(2)
p$bprior <- "b0" # beta linear prior for common mean
p$nug.p <- c(1.0,0.1,10.0,0.1) # mixture nugget prior
out <- tgp(X=mcycle[,1], Z=mcycle[,2], params=p,
           BTE=c(2000,22000,2)) # run mcmc longer
plot(out)                       # plot the surface
tgp.trees(out)                  # plot the MAP trees

# for other examples try the demos or the vignette
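# for instance, the available demos and vignettes can be listed with
# the standard R utilities (a sketch):
demo(package="tgp")       # list the demos shipped with tgp
vignette(package="tgp")   # list the package vignettes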

[Package tgp version 1.1-11]