featureSignif {feature}    R Documentation
Identify significant features of kernel density estimates of 1- to 4-dimensional data.
featureSignif(x, bw, gridsize, scaleData=FALSE, addSignifGrad=TRUE, addSignifCurv=TRUE, signifLevel=0.05)
x - data matrix
bw - vector of bandwidth(s)
gridsize - vector of estimation grid sizes
scaleData - flag for scaling the data, i.e. transforming each dimension to unit variance
addSignifGrad - flag for computing significant gradient regions
addSignifCurv - flag for computing significant curvature regions
signifLevel - significance level
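For illustration, a call using these arguments might look as follows; the simulated data, bandwidths and grid sizes are arbitrary choices for this sketch, not recommendations.

library(feature)
x <- matrix(rnorm(400), ncol=2)            # simulated 2-d data, 200 observations
fs <- featureSignif(x, bw=c(0.4, 0.4), gridsize=c(51, 51),
                    scaleData=FALSE, addSignifGrad=TRUE,
                    addSignifCurv=TRUE, signifLevel=0.05)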
Feature significance is based on significance testing of the gradient (first derivative) and curvature (second derivative) of a kernel density estimate. This was developed for 1-d data by Chaudhuri & Marron (1999), for 2-d data by Godtliebsen, Marron & Chaudhuri (2002), and for 3-d and 4-d data by Duong, Cowling, Koch & Wand (2008).
The test statistic for gradient testing at a point x is
W(x) = ||hat{grad f}(x; H)||^2
where hat{grad f}(x; H) is the kernel estimate of the gradient of f(x) with bandwidth H, and ||.|| is the Euclidean norm. W(x) is approximately chi-squared distributed with d degrees of freedom, where d is the dimension of the data.
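As a sketch of the pointwise comparison (not the package's internal code), the statistic is referred to this chi-squared distribution; the gradient estimate below is a made-up placeholder.

d <- 2                              # dimension of the data
grad.hat <- c(0.8, -1.9)            # hypothetical gradient estimate at a point x
W <- sum(grad.hat^2)                # W(x) = ||grad.hat||^2
pchisq(W, df=d, lower.tail=FALSE)   # approximate pointwise p-value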
The analogous test statistic for curvature is
W2(x) = ||vech hat{curv f}(x; H)||^2
where hat{curv f}(x; H) is the kernel estimate of the curvature of f(x), and vech is the vector-half operator. W2(x) is approximately chi-squared distributed with d(d+1)/2 degrees of freedom.
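Analogously, a sketch with a hypothetical Hessian estimate (again not package code) shows where the d(d+1)/2 degrees of freedom come from.

d <- 2
curv.hat <- matrix(c(-2.1, 0.4, 0.4, -1.3), nrow=2)   # hypothetical curvature (Hessian) estimate
vech <- curv.hat[lower.tri(curv.hat, diag=TRUE)]      # vector-half (vech) operator
W2 <- sum(vech^2)                                     # W2(x) = ||vech curv.hat||^2
pchisq(W2, df=d*(d+1)/2, lower.tail=FALSE)            # d(d+1)/2 = 3 degrees of freedom here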
Since many dependent hypothesis tests are carried out simultaneously, a multiple comparison (simultaneous) test is used to control the overall level of significance; here this is a Hochberg-type procedure. See Hochberg (1988) and Duong, Cowling, Koch & Wand (2008).
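The package's internal procedure may differ in detail, but a standard Hochberg adjustment of pointwise p-values can be sketched with base R's p.adjust; the p-values below are made up.

p <- c(0.001, 0.004, 0.03, 0.20, 0.45)    # made-up pointwise p-values
p.adj <- p.adjust(p, method="hochberg")   # Hochberg step-up adjustment
p.adj < 0.05                              # points flagged at overall level 0.05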
Returns an object of class fs, which is a list with the following fields:
x - data matrix
names - name labels used for plotting
bw - vector of bandwidths
fhat - kernel density estimate on a grid
grad - logical grid for significant gradient
curv - logical grid for significant curvature
gradData - logical vector for significant gradient data points
gradDataPoints - significant gradient data points
curvData - logical vector for significant curvature data points
curvDataPoints - significant curvature data points
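For instance, these fields can be inspected directly; the sketch below assumes fs is an object returned by featureSignif, e.g. from the examples at the end of this page.

str(fs, max.level=1)      # overview of the returned list
sum(fs$grad)              # number of grid points with significant gradient
head(fs$curvDataPoints)   # data points lying in significant curvature regions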
Chaudhuri, P. and Marron, J.S. (1999) SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807-823.
Duong, T., Cowling, A., Koch, I. and Wand, M.P. (2008) Feature significance for multivariate kernel density estimation. Computational Statistics and Data Analysis, 52, 4225-4242.
Godtliebsen, F., Marron, J.S. and Chaudhuri, P. (2002) Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11, 1-22.
Hochberg, Y. (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800-802.
Wand, M.P. and Jones, M.C. (1995) Kernel Smoothing. Chapman & Hall/CRC, London.
## Univariate example
data(earthquake)
eq3 <- -log10(-earthquake[,3])
fs <- featureSignif(eq3, bw=0.1)
plot(fs, addSignifGradRegion=TRUE)

## Bivariate example
library(MASS)
data(geyser)
fs <- featureSignif(geyser)
plot(fs, addSignifCurvRegion=TRUE)

## Trivariate example
data(earthquake)
earthquake[,3] <- -log10(-earthquake[,3])
fs <- featureSignif(earthquake, scaleData=TRUE, bw=c(0.06, 0.06, 0.05))
plot(fs, addKDE=TRUE)
plot(fs, addKDE=FALSE, addSignifCurvRegion=TRUE)