gammatest {GammaTest}    R Documentation

Calculating the Gamma Statistic

Description

Estimates the variance of the noise in an input-output dataset, provided the underlying function is smooth (i.e. has bounded partial derivatives).

Usage

gammatest(data, mask=seq(from=1, to=1, length=(length(data[1,])-1)), p=10, eps=0.00, plot=TRUE,
summary=TRUE, ...)

Arguments

data An input-output dataset, where the output is in the last column.
mask A vector of 1's and 0's representing input inclusion/exclusion. The default mask is all 1's (i.e. include all inputs in the test).
p The maximum number of near neighbours used to calculate the Gamma statistic. The default value is 10.
eps The error bound for calculating the near neighbour distances. The default is 0, meaning the exact near neighbours are calculated.
plot A logical indicating whether a plot is drawn.
summary Logical indicating whether to output the returned Gamma test statistics. Default is set to TRUE.
... Graphical options for plotting the Gamma regression line.
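
As an illustration of the mask argument, the following base R sketch builds a mask for a hypothetical three-input dataset and selects the corresponding columns by hand. The data frame dat and its column names are invented for the example, and the manual column selection is only an assumption about what the mask means, not a description of the package internals.

```r
# Hypothetical dataset: three inputs, with the output in the last column.
set.seed(42)
dat <- data.frame(x1 = runif(20), x2 = runif(20), x3 = runif(20))
dat$y <- dat$x1 + dat$x3

n_inputs <- ncol(dat) - 1

# The default mask in the Usage section is a vector of n_inputs ones.
default_mask <- seq(from = 1, to = 1, length.out = n_inputs)

# A mask that includes x1 and x3 but excludes x2.
mask <- c(1, 0, 1)
stopifnot(length(mask) == n_inputs)   # one entry per input column

# Assumed by-hand equivalent: keep the masked-in inputs plus the output.
selected <- dat[, c(which(mask == 1), ncol(dat))]
names(selected)
```

With the package installed, such a mask would be passed as gammatest(dat, mask = mask).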

Details

The Gamma test is an algorithm for estimating the lowest attainable mean squared error (or noise variance) in an input/output dataset. The algorithm assumes nothing about the parametric equations governing the system; the only requirement is that the underlying model is smooth.

The algorithm works by calculating the near neighbour distances in input space. Using Bentley's kd-tree, this calculation is achieved in O(M log M) time, where M is the number of data points. The GammaTest package uses the Approximate Nearest Neighbor (ANN) C++ library, which can give either the exact near neighbours or (as the name suggests) approximate near neighbours to within a specified error bound. For more information on the ANN library, please visit http://www.cs.umd.edu/~mount/ANN/.
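
The calculation can be sketched in base R as follows. This is a minimal illustration, not the package implementation: the function name gamma_sketch is hypothetical, and it finds neighbours by brute force from a full distance matrix (O(M^2)) rather than with the ANN kd-tree. For each k from 1 to p it computes delta(k), the mean squared distance to the k-th near neighbour in input space, and gamma(k), half the mean squared difference between the corresponding outputs. The intercept of the least-squares line of gamma on delta is taken as the Gamma statistic.

```r
# Brute-force sketch of the Gamma test (hypothetical helper, not part of GammaTest).
gamma_sketch <- function(data, p = 10) {
  data <- as.matrix(data)
  d <- ncol(data)
  x <- data[, -d, drop = FALSE]      # inputs
  y <- data[, d]                     # output is in the last column
  M <- nrow(x)

  dist2 <- as.matrix(dist(x))^2      # squared Euclidean distances between inputs
  diag(dist2) <- Inf                 # a point is not its own neighbour

  # Row i of nn holds the indices of the p nearest neighbours of point i.
  nn <- t(apply(dist2, 1, order))[, seq_len(p), drop = FALSE]

  deltas <- numeric(p)
  gammas <- numeric(p)
  for (k in seq_len(p)) {
    idx <- nn[, k]
    deltas[k] <- mean(dist2[cbind(seq_len(M), idx)])  # mean squared input distance
    gammas[k] <- mean((y[idx] - y)^2) / 2             # half mean squared output difference
  }

  fit <- lm(gammas ~ deltas)         # least-squares regression line
  Gamma <- unname(coef(fit)[1])
  list(deltas.gammas = cbind(deltas, gammas),
       Gamma = Gamma,
       Gradient = unname(coef(fit)[2]),
       Vratio = Gamma / var(y))
}

# Noisy sine wave with noise variance 0.075.
set.seed(1)
x <- seq(-2 * pi, 2 * pi, length.out = 500)
y <- sin(x) + rnorm(500, sd = sqrt(0.075))
res <- gamma_sketch(data.frame(x, y))
res$Gamma   # should lie near the true noise variance of 0.075
```

The brute-force search keeps the sketch short but needs memory and time quadratic in M, which is why a kd-tree is preferable for larger datasets.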

Value

Mask The mask used to calculate the Gamma statistic.
deltas.gammas The delta and gamma values on which the least-squares regression is performed to calculate the Gamma statistic.
Gamma The Gamma statistic.
Gradient The gradient of the regression line.
Vratio The Gamma statistic divided by the variance of the output.

Author(s)

Samuel E. Kemp. To report bugs or make suggestions, please email: sekemp@glam.ac.uk

References

Stefansson A., Koncar N. and Jones A. J. (1997), A note on the gamma test. Neural Computing & Applications, 5:131-133.

Evans D., Jones A. J. and Schmidt W. (2002). Asymptotic moments of near neighbour distance distributions. Proc. Roy. Soc. Lond. Series A, 458(2028):2839-2849.

Evans D. and Jones A. J. (2002), A proof of the gamma test. Proc. Roy. Soc. Lond. Series A, 458(2027):2759-2799.

Bentley J. L. (1975), Multidimensional binary search trees used for associative searching. Communications of the ACM, 18:509-517.

Arya S. and Mount D. M. (1993), Approximate nearest neighbor searching, Proc. 4th Ann. ACM-SIAM Symposium on Discrete Algorithms (SODA'93), 271-280.

Arya S., Mount D. M., Netanyahu N. S., Silverman R. and Wu A. Y. (1998), An optimal algorithm for approximate nearest neighbor searching, Journal of the ACM, 45, 891-923.

For papers and Gamma test related material visit http://users.cs.cf.ac.uk:81/Antonia.J.Jones/GammaArchive/IndexPage.htm

See Also

fesearch, mtest, iesearch, dvec

Examples

# A noisy sine wave example
x <- seq(length=500, from=-2*pi, to=2*pi)
y <- sin(x) + rnorm(500, sd=sqrt(0.075)) # Set the variance of the noise to 0.075.
xy <- data.frame(x,y)                    # Create an input/output dataset.
nest <- gammatest(xy)            # Run Gamma test.

# Zero noise time series data: The Henon Map
data(HenonMap)
dvhm <- dvec(HenonMap[,1], 2)
a <- gammatest(dvhm)

[Package GammaTest version 1.1 Index]