featureSignif {feature}    R Documentation

Feature significance for kernel density estimation

Description

Identify significant features of kernel density estimates of 1- to 4-dimensional data.

Usage

featureSignif(x, bw, gridsize, scaleData=FALSE, addSignifGrad=TRUE,
   addSignifCurv=TRUE, signifLevel=0.05)  

Arguments

x data matrix
bw vector of bandwidth(s)
gridsize vector of estimation grid sizes
scaleData flag for scaling the data, i.e. transforming to unit variance for each dimension
addSignifGrad flag for computing significant gradient regions
addSignifCurv flag for computing significant curvature regions
signifLevel significance level

Details

Feature significance is based on significance testing of the gradient (first derivative) and curvature (second derivative) of a kernel density estimate. This approach was developed for 1-d data by Chaudhuri & Marron (1999), for 2-d data by Godtliebsen, Marron & Chaudhuri (2002), and for 3-d and 4-d data by Duong, Cowling, Koch & Wand (2008).

The test statistic for gradient testing at a point x is

W(x) = ||hat{grad f}(x; H)||^2

where hat{grad f}(x; H) is the kernel estimate of the gradient of f(x) with bandwidth H, and ||.|| is the Euclidean norm. W(x) is approximately chi-squared distributed with d degrees of freedom, where d is the dimension of the data.

The analogous test statistic for curvature is

W2(x) = ||vech hat{curv f}(x; H)||^2

where hat{curv f}(x; H) is the kernel estimate of the curvature of f(x), and vech is the vector-half operator. W2(x) is approximately chi-squared distributed with d(d+1)/2 degrees of freedom.
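
As a rough illustration (not the package's internal code), the pointwise chi-squared critical values implied by these approximations can be computed in base R; the dimension and significance level below are arbitrary choices:

## Sketch only: pointwise critical values for W(x) and W2(x)
d <- 2                                # dimension of the data (illustrative)
alpha <- 0.05                         # pointwise significance level
qchisq(1 - alpha, df = d)             # gradient test: chi-squared with d df
qchisq(1 - alpha, df = d*(d + 1)/2)   # curvature test: chi-squared with d(d+1)/2 df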

Since this involves many dependent hypothesis tests, a multiple comparison (simultaneous) test is used to control the overall level of significance, namely a Hochberg-type procedure. See Hochberg (1988) and Duong, Cowling, Koch & Wand (2008).
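
As a hedged sketch of how a Hochberg-type correction can be applied to a vector of pointwise p-values (the package's internal procedure may differ in detail), base R's p.adjust suffices:

## Sketch only: Hochberg-type adjustment of pointwise gradient p-values
d <- 2                                       # dimension of the data (illustrative)
W <- c(3.1, 8.7, 15.2, 0.4)                  # hypothetical values of W(x) on a grid
p <- pchisq(W, df = d, lower.tail = FALSE)   # pointwise p-values
p.adjust(p, method = "hochberg") < 0.05      # TRUE where the gradient is significant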

Value

Returns an object of class fs, which is a list with the following fields:
x - data matrix
names - name labels used for plotting
bw - vector of bandwidths
fhat - kernel density estimate on a grid
grad - logical grid for significant gradient
curv - logical grid for significant curvature
gradData - logical vector for significant gradient data points
gradDataPoints - significant gradient data points
curvData - logical vector for significant curvature data points
curvDataPoints - significant curvature data points
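
A brief sketch of how the returned object might be inspected; x and h below stand for a data matrix and bandwidth vector of the user's choosing:

## Sketch only: examining the components of a fitted 'fs' object
fs <- featureSignif(x, bw=h)
names(fs)                  # component names listed above
sum(fs$grad)               # number of grid points with significant gradient
head(fs$curvDataPoints)    # data points in significant curvature regions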

References

Chaudhuri, P. and Marron, J.S. (1999) SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807-823.

Duong, T., Cowling, A., Koch, I., Wand, M.P. (2008) Feature significance for multivariate kernel density estimation. Computational Statistics and Data Analysis, 52, 4225-4242.

Godtliebsen, F., Marron, J.S. and Chaudhuri, P. (2002) Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11, 1-22.

Hochberg, Y. (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800-802.

Wand, M.P. and Jones, M.C. (1995) Kernel Smoothing. Chapman & Hall/CRC, London.

See Also

featureSignifGUI, plot.fs

Examples

## Univariate example
data(earthquake)
eq3 <- -log10(-earthquake[,3])
fs <- featureSignif(eq3, bw=0.1)
plot(fs, addSignifGradRegion=TRUE)

## Bivariate example
library(MASS)
data(geyser)
fs <- featureSignif(geyser)
plot(fs, addSignifCurvRegion=TRUE)

## Trivariate example
data(earthquake)
earthquake[,3] <- -log10(-earthquake[,3])
fs <- featureSignif(earthquake, scaleData=TRUE, bw=c(0.06, 0.06, 0.05))
plot(fs, addKDE=TRUE)
plot(fs, addKDE=FALSE, addSignifCurvRegion=TRUE)
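
## Additional illustrative example (not from the original documentation):
## explicit gridsize, scaleData and signifLevel settings; the bandwidths
## below are chosen for illustration only.
library(MASS)
data(geyser)
fs <- featureSignif(geyser, scaleData=TRUE, bw=c(0.3, 0.3),
   gridsize=c(61, 61), signifLevel=0.1)
plot(fs, addSignifGradRegion=TRUE, addSignifCurvRegion=TRUE)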

[Package feature version 1.2.2]