prim.box {prim} | R Documentation |
PRIM for multivariate data. Result is an estimate of the highest density region (HDR).
prim.box(x, y, box.init=NULL, peel.alpha=0.05, paste.alpha=0.01, mass.min=0.05, threshold, pasting=TRUE, verbose=FALSE, threshold.type=0)
prim.hdr(prim, threshold, threshold.type)
prim.combine(prim1, prim2)
x | matrix of data values |
y | vector of response values |
box.init | initial covering box |
peel.alpha | peeling quantile tuning parameter |
paste.alpha | pasting quantile tuning parameter |
mass.min | minimum mass tuning parameter |
threshold | threshold tuning parameter(s) |
threshold.type | 1 = positive HDR, -1 = negative HDR, 0 = both HDR |
pasting | flag for pasting |
verbose | flag for printing output during execution |
prim, prim1, prim2 | objects of type prim |
The data are (X_1, Y_1), ..., (X_n, Y_n) where X_i is d-dimensional and Y_i is a scalar response. PRIM finds modal (and/or anti-modal) regions in the conditional expectation m(x) = E(Y | x). These regions are also called the highest density regions (HDR).
In general, Y_i can be real-valued. See vignette("prim").
Here, we focus on the special case of binary Y_i. Let Y_i = 1 when X_i ~ F+, and Y_i = -1 when X_i ~ F-, where F+ and F- are different distribution functions. In this set-up, PRIM finds the regions where F+ and F- are most different.
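To see why, assume (this is not stated above) that a point is drawn from F+ with probability \pi_+ and from F- with probability \pi_- = 1 - \pi_+, and that F+ and F- have densities f_+ and f_-. Under these assumptions the regression function is a normalised difference of the two weighted densities:

```latex
m(x) = E(Y \mid x) = P(Y = 1 \mid x) - P(Y = -1 \mid x)
     = \frac{\pi_+ f_+(x) - \pi_- f_-(x)}{\pi_+ f_+(x) + \pi_- f_-(x)}
```

so m(x) is close to 1 where f_+ dominates and close to -1 where f_- dominates.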
The tuning parameters peel.alpha and paste.alpha control the 'patience' of PRIM: smaller values mean more patience, larger values less. The peeling steps remove data from a box until either the box mean is smaller than threshold or the box mass is less than mass.min. Pasting is optional, and is used to correct any possible over-peeling. The default values for peel.alpha, paste.alpha and mass.min are taken from Friedman & Fisher (1999).
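The peeling idea can be illustrated with a rough base-R sketch. This is not the package's implementation: peel_once() is a hypothetical helper, the data are simulated, pasting is omitted, and the loop uses a simplified stopping rule (stop when the box mean no longer improves or the mass would drop below mass.min) instead of the threshold rule described above.

```r
## Hypothetical helper: try peeling a fraction peel.alpha off each face of the
## current box and return the candidate with the highest mean response.
peel_once <- function(x, y, peel.alpha = 0.05) {
  best <- NULL
  for (j in seq_len(ncol(x))) {
    lo <- quantile(x[, j], peel.alpha, names = FALSE)
    hi <- quantile(x[, j], 1 - peel.alpha, names = FALSE)
    candidates <- list(x[, j] >= lo,   # peel the lower face of coordinate j
                       x[, j] <= hi)   # peel the upper face of coordinate j
    for (keep in candidates) {
      m <- mean(y[keep])
      if (is.null(best) || m > best$mean) best <- list(keep = keep, mean = m)
    }
  }
  best
}

## Simulated data (an assumption for illustration only)
set.seed(1)
x <- matrix(runif(400), ncol = 2)
y <- ifelse(x[, 1] > 0.5 & x[, 2] > 0.5, 1, -1)  # bump in the upper-right corner

mass.min <- 0.05
inside <- rep(TRUE, nrow(x))  # TRUE for points still inside the box
repeat {
  step <- peel_once(x[inside, , drop = FALSE], y[inside])
  new.mass <- sum(step$keep) / nrow(x)
  ## stop when the box would become too small or the mean stops improving
  if (new.mass < mass.min || step$mean <= mean(y[inside])) break
  inside[inside] <- step$keep
}

box.mean <- mean(y[inside])  # mean response inside the final box
box.mass <- mean(inside)     # proportion of points inside the final box
```

A smaller peel.alpha removes fewer points per step, so the box shrinks more slowly and takes more steps, which is the sense in which smaller values are more patient.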
Specifying the type of HDR is controlled by threshold and threshold.type:
– If threshold.type=1, then we search for positive HDR {m(x) >= threshold}.
– If threshold.type=-1, then we search for negative HDR {m(x) <= threshold}.
– If threshold.type=0, then we search for both the positive and negative HDR. In this case make sure that threshold is (positive HDR threshold, negative HDR threshold).
There are two ways of using PRIM. One is to call prim.box with pre-specified threshold(s). This is appropriate when the threshold(s) are known to produce good estimates. On the other hand, if the user doesn't provide threshold values, then prim.box computes box sequences which cover the data range. These can then be pruned into HDRs. prim.hdr allows the user to try many different threshold values efficiently, without having to recompute the entire PRIM box sequence. prim.combine can be used to join the separate positive and negative HDRs computed from prim.hdr. See the examples below.
– prim.box produces a PRIM estimate of HDRs, an object of type prim, which is a list with 8 fields:
x | list of data matrices |
y | list of response variable vectors |
y.mean | list of vectors of box mean for y |
box | list of matrices of box limits (first row = minima, second row = maxima) |
mass | vector of box masses (proportion of points inside a box) |
num.class | total number of PRIM boxes |
num.hdr.class | total number of PRIM boxes which form the HDR |
ind | HDR indicator: 1 = positive HDR, -1 = negative HDR |
The above lists have num.class fields, one for each box.
– prim.hdr takes a prim object and computes HDRs with different threshold values. Returns another prim object. This is much faster for experimenting with different threshold values than calling prim.box each time.
– prim.combine combines two prim objects into a single prim object. Useful for combining positive and negative HDRs. Usually used in conjunction with prim.hdr. See examples below.
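The box field stores each box as a two-row matrix of limits (first row = minima, second row = maxima). As a minimal base-R sketch of how such a matrix can be used, the hypothetical helper in_box() below tests which points fall inside one box. The box and points are made up for illustration, and treating both faces as closed is an assumption rather than documented behaviour of the package.

```r
## Hypothetical helper: TRUE for each row of x lying inside the box.
## box is a 2 x d matrix: row 1 = minima, row 2 = maxima.
in_box <- function(x, box) {
  x <- as.matrix(x)
  lower_ok <- sweep(x, 2, box[1, ], ">=")  # coordinate-wise x >= minima
  upper_ok <- sweep(x, 2, box[2, ], "<=")  # coordinate-wise x <= maxima
  rowSums(lower_ok & upper_ok) == ncol(x)
}

## Made-up box [0, 1] x [0, 2] and three test points
box <- rbind(c(0, 0), c(1, 2))
pts <- rbind(c(0.5, 1), c(1.5, 1), c(0.5, 2))

in_box(pts, box)        # TRUE FALSE TRUE
mean(in_box(pts, box))  # proportion of points inside, i.e. the box mass
```

Applied to a fitted prim object, the same helper could be called with one element of its box list in place of the made-up box.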
Friedman, J.H. & Fisher, N.I. (1999) Bump hunting in high-dimensional data, Statistics and Computing, 9, 123–143.
library(prim)
library(ks)  ## assumed source of rmvnorm.mixt() and invvech()

n <- 1000
set.seed(88192)

## positive sample: four-component normal mixture
mus.p <- rbind(c(0,0), c(2,0), c(1, 2), c(2.5, 2))
Sigmas.p <- 0.125*rbind(diag(2), diag(c(0.5, 0.5)),
                        diag(c(0.125, 0.25)), diag(c(0.125, 0.25)))
props.p <- c(0.5, 0.25, 0.125, 0.125)

## negative sample: three-component normal mixture
mus.m <- rbind(c(0,0), c(2,0), c(2.5, 2))
Sigmas.m <- 0.125*rbind(invvech(c(1,-0.6,1)), diag(c(0.5, 0.5)),
                        diag(c(0.125, 0.25)))
props.m <- c(0.625, 0.25, 0.125)

x.p <- rmvnorm.mixt(n, mus.p, Sigmas.p, props.p)
x.m <- rmvnorm.mixt(n, mus.m, Sigmas.m, props.m)
x <- rbind(x.p, x.m)
y <- c(rep(1, nrow(x.p)), rep(-1, nrow(x.m)))
## 1 = positive sample, -1 = negative sample

y.thr <- c(1, -0.35)

## using only one command
x.prim1 <- prim.box(x=x, y=y, threshold=y.thr, threshold.type=0)

## alternative - requires more commands but allows more control
## in intermediate stages
x.prim.hdr.plus <- prim.box(x=x, y=y, threshold.type=1, threshold=1)
x.prim.minus <- prim.box(x=x, y=y, threshold.type=-1)
summary(x.prim.minus)
## threshold too high, try lower one

x.prim.hdr.minus <- prim.hdr(x.prim.minus, threshold=-0.35, threshold.type=-1)
x.prim2 <- prim.combine(x.prim.hdr.plus, x.prim.hdr.minus)

plot(x.prim2)
col <- x.prim2$ind
col[col==1] <- "orange"
col[col==-1] <- "blue"
plot(x.prim2, col=col)

summary(x.prim1)
summary(x.prim2) ## should be exactly the same as command above