prim.box {prim}    R Documentation

PRIM for multivariate data

Description

Patient Rule Induction Method (PRIM) for multivariate data. The result is an estimate of the highest density region (HDR).

Usage

prim.box(x, y, box.init=NULL, peel.alpha=0.05, paste.alpha=0.01,
     mass.min=0.05, threshold, pasting=TRUE, verbose=FALSE,
     threshold.type=0)

prim.hdr(prim, threshold, threshold.type)
prim.combine(prim1, prim2)

Arguments

x matrix of data values
y vector of response values
box.init initial covering box
peel.alpha peeling quantile tuning parameter
paste.alpha pasting quantile tuning parameter
mass.min minimum mass tuning parameter
threshold threshold tuning parameter(s)
threshold.type 1 = positive HDR, -1 = negative HDR, 0 = both HDR
pasting flag for pasting
verbose flag for printing output during execution
prim,prim1,prim2 objects of type prim

Details

The data are (X_1, Y_1), ..., (X_n, Y_n) where X_i is d-dimensional and Y_i is a scalar response. PRIM finds modal (and/or anti-modal) regions in the conditional expectation m(x) = E(Y | x). These regions are also called highest density regions (HDRs).

In general, Y_i can be real-valued; see vignette("prim"). Here, we focus on the special case of binary Y_i. Let Y_i = 1 when X_i ~ F+ and Y_i = -1 when X_i ~ F-, where F+ and F- are different distribution functions. In this set-up, PRIM finds the regions where F+ and F- are most different.
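For the binary case, the conditional expectation can be written out explicitly. The notation below is an assumption made for illustration and does not come from the package: f_+ and f_- denote densities of F+ and F- (assumed to exist), and \pi_+ and \pi_- denote the proportions of positive and negative observations.

```latex
\begin{aligned}
m(x) &= E(Y \mid X = x) = P(Y = 1 \mid x) - P(Y = -1 \mid x) \\
     &= \frac{\pi_+ f_+(x) - \pi_- f_-(x)}{\pi_+ f_+(x) + \pi_- f_-(x)} \in [-1, 1]
\end{aligned}
```

Under these assumptions m(x) is close to 1 where the positive sample dominates and close to -1 where the negative sample dominates, which is why thresholds for binary responses lie between -1 and 1.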

The tuning parameters peel.alpha and paste.alpha control the 'patience' of PRIM: smaller values mean more patience, larger values less. The peeling steps remove data from a box until either the box mean is smaller than threshold or the box mass is less than mass.min. Pasting is optional, and is used to correct any possible over-peeling. The default values for peel.alpha, paste.alpha and mass.min are taken from Friedman & Fisher (1999).
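The peeling idea can be illustrated with a short base R sketch. This is a simplified illustration and not the code used by prim.box: the helper peel_sketch is hypothetical, it searches for a positive HDR only, it stops on the mass.min rule alone, and it omits the threshold rule and pasting.

```r
## Hypothetical helper, not part of the prim package: simplified
## top-down peeling for a positive HDR, stopping on mass.min only.
peel_sketch <- function(x, y, peel.alpha = 0.05, mass.min = 0.05) {
  n <- nrow(x)
  inside <- rep(TRUE, n)              # points currently in the box
  box <- apply(x, 2, range)           # row 1 = minima, row 2 = maxima
  repeat {
    best <- NULL
    for (j in seq_len(ncol(x))) {
      xj <- x[inside, j]
      ## candidate peels: drop the lowest or highest peel.alpha fraction
      lo <- quantile(xj, peel.alpha)
      hi <- quantile(xj, 1 - peel.alpha)
      cands <- list(list(keep = inside & x[, j] >= lo, row = 1, lim = lo),
                    list(keep = inside & x[, j] <= hi, row = 2, lim = hi))
      for (cand in cands) {
        if (sum(cand$keep) == sum(inside)) next   # this peel removes nothing
        m <- mean(y[cand$keep])
        ## keep the candidate with the largest box mean
        if (is.null(best) || m > best$mean)
          best <- c(cand, list(mean = m, dim = j))
      }
    }
    ## stop if no peel is possible or the mass would fall below mass.min
    if (is.null(best) || sum(best$keep) / n < mass.min) break
    inside <- best$keep
    box[best$row, best$dim] <- best$lim
  }
  list(box = box, mass = mean(inside), y.mean = mean(y[inside]))
}

## Synthetic data: positive responses concentrated in a central square
set.seed(1)
x <- matrix(runif(1000), ncol = 2)
y <- ifelse(abs(x[, 1] - 0.5) < 0.2 & abs(x[, 2] - 0.5) < 0.2, 1, -1)
res <- peel_sketch(x, y)
```

Because each step removes only a peel.alpha fraction of the remaining points, a smaller peel.alpha means more, smaller steps, which is the 'patience' referred to above.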

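The type of HDR is controlled by threshold and threshold.type: threshold.type=1 gives the positive HDR, threshold.type=-1 the negative HDR, and threshold.type=0 both. In the last case threshold takes two values, with the positive HDR threshold first, as in the examples below.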

There are two ways of using PRIM. One is prim.box with pre-specified threshold(s). This is appropriate when the threshold(s) are known to produce good estimates.

On the other hand, if the user doesn't provide threshold values, then prim.box computes box sequences which cover the data range. These can then be pruned into HDRs. prim.hdr allows the user to try many different threshold values efficiently, without having to recompute the entire PRIM box sequence. prim.combine can be used to join the separate positive and negative HDRs computed from prim.hdr. See the examples below.
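Pruning against thresholds can be pictured as a filter on stored box means. The sketch below rests on an assumption rather than the package's code: that a box belongs to the positive HDR when its mean is at least the first threshold, and to the negative HDR when its mean is at most the second. The helper classify_boxes and the box means are made up for illustration.

```r
## Made-up box means for illustration; not output from prim.box.
box.means <- c(0.9, 0.4, -0.1, -0.6)

## Hypothetical helper, not part of the prim package.
classify_boxes <- function(box.means, threshold) {
  ind <- rep(0, length(box.means))      # 0 = in neither HDR
  ind[box.means >= threshold[1]] <- 1   # assumed rule for the positive HDR
  ind[box.means <= threshold[2]] <- -1  # assumed rule for the negative HDR
  ind
}

## Trying different thresholds only re-filters the stored means
ind.a <- classify_boxes(box.means, c(0.5, -0.35))
ind.b <- classify_boxes(box.means, c(0.3, -0.05))
```

The point of the sketch is that no box needs to be recomputed when the thresholds change, which is the reason given above for using prim.hdr instead of repeated calls to prim.box.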

Value

prim.box produces a PRIM estimate of HDRs, an object of type prim, which is a list with 8 fields:

x list of data matrices
y list of response variable vectors
y.mean list of vectors of box mean for y
box list of matrices of box limits (first row = minima, second row = maxima)
mass vector of box masses (proportion of points inside a box)
num.class total number of PRIM boxes
num.hdr.class total number of PRIM boxes which form the HDR
ind HDR indicator: 1 = positive HDR, -1 = negative HDR


The above lists have num.class fields, one for each box.
prim.hdr takes a prim object and computes HDRs with different threshold values. Returns another prim object. This is much faster for experimenting with different threshold values than calling prim.box each time.
prim.combine combines two prim objects into a single prim object. Useful for combining positive and negative HDRs. Usually used in conjunction with prim.hdr. See examples below.
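The box layout described in this section (first row = minima, second row = maxima) and the definition of mass as the proportion of points inside a box can be checked with a few lines of base R. The helper in_box and the toy points are hypothetical and are not part of the prim package.

```r
## Hypothetical helper, not part of the prim package.
## box is a 2 x d matrix: first row = minima, second row = maxima.
in_box <- function(x, box) {
  x <- as.matrix(x)
  ## sweep compares every column of x with the matching box limit
  above.min <- sweep(x, 2, box[1, ], ">=")
  below.max <- sweep(x, 2, box[2, ], "<=")
  rowSums(above.min & below.max) == ncol(x)
}

pts <- rbind(c(0.5, 0.5), c(1.5, 0.5), c(0.2, 0.9), c(-0.1, 0.3))
box <- rbind(c(0, 0), c(1, 1))   # unit square: minima, then maxima
inside <- in_box(pts, box)
mass <- mean(inside)             # proportion of points inside the box
```

The same check could be applied to one element of the box list of a prim object, assuming it follows the layout documented above.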

References

Friedman, J.H. & Fisher, N.I. (1999) Bump hunting in high-dimensional data, Statistics and Computing, 9, 123–143.

Examples

library(prim)
library(ks)   ## assumed source of invvech and rmvnorm.mixt

n <- 1000
set.seed(88192)

mus.p <- rbind(c(0,0), c(2,0), c(1, 2), c(2.5, 2))
Sigmas.p <- 0.125*rbind(diag(2), diag(c(0.5, 0.5)),
   diag(c(0.125, 0.25)), diag(c(0.125, 0.25))) 
props.p <- c(0.5, 0.25, 0.125, 0.125)

mus.m <- rbind(c(0,0), c(2,0), c(2.5, 2))
Sigmas.m <- 0.125*rbind(invvech(c(1,-0.6,1)),
   diag(c(0.5, 0.5)),diag(c(0.125, 0.25))) 
props.m <- c(0.625, 0.25, 0.125)

x.p <- rmvnorm.mixt(n, mus.p, Sigmas.p, props.p)
x.m <- rmvnorm.mixt(n, mus.m, Sigmas.m, props.m)
x <- rbind(x.p, x.m)
y <- c(rep(1, nrow(x.p)), rep(-1, nrow(x.m)))
  ## 1 = positive sample, -1 = negative sample

y.thr <- c(1, -0.35)

## using only one command

x.prim1 <- prim.box(x=x, y=y, threshold=y.thr, threshold.type=0)

## alternative - requires more commands but allows more control
## in intermediate stages

x.prim.hdr.plus <- prim.box(x=x, y=y, threshold.type=1,
   threshold=1)

x.prim.minus <- prim.box(x=x, y=y, threshold.type=-1)
summary(x.prim.minus)
   ## threshold too high, try lower one

x.prim.hdr.minus <- prim.hdr(x.prim.minus, threshold=-0.35,
   threshold.type=-1)
x.prim2 <- prim.combine(x.prim.hdr.plus, x.prim.hdr.minus)
       
plot(x.prim2)

## colour the boxes by HDR type: positive = orange, negative = blue
col <- ifelse(x.prim2$ind == 1, "orange", "blue")
plot(x.prim2, col=col)

summary(x.prim1)
summary(x.prim2) ## should be exactly the same as command above

[Package prim version 1.0.1 Index]