compare, compare.kda.diag.cv, compare.kda.cv {ks}    R Documentation

Comparisons for kernel discriminant analysis

Description

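Compares the true group labels with the estimated group labels from a kernel discriminant analysis, giving a cross-classification table and a misclassification rate.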

Usage

compare(x.group, est.group, by.group=FALSE)
compare.kda.cv(x, x.group, bw="plugin", prior.prob=NULL, Hstart,
    by.group=FALSE, trace=FALSE, binned=FALSE, bgridsize,
    recompute=FALSE, ...)
compare.kda.diag.cv(x, x.group, bw="plugin", prior.prob=NULL,
    by.group=FALSE, trace=FALSE, binned=FALSE, bgridsize,
    recompute=FALSE, ...)

Arguments

x           matrix of training data values
x.group     vector of group labels for training data
est.group   vector of estimated group labels
bw          bandwidth selector: "plugin" = plug-in, "lscv" = LSCV, "scv" = SCV
Hstart      (stacked) matrix of initial bandwidth matrices
prior.prob  vector of prior probabilities
by.group    flag to give results also within each group
trace       flag for printing messages in command line to trace the execution
binned      flag for binned kernel estimation
bgridsize   vector of binning grid sizes - only required if binned=TRUE
recompute   flag for recomputing the bandwidth matrix after excluding the i-th data item
...         other optional parameters for bandwidth selection, see Hpi, Hlscv, Hscv

Details

If prior probabilities are known, set prior.prob to them. Otherwise the default prior.prob=NULL uses the sample proportions as estimates of the prior probabilities.
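As a minimal sketch of what "sample proportions" means here, the following base R snippet computes them from a vector of group labels. The labels are hypothetical and the snippet does not call any ks function; it only illustrates the quantity that stands in for the prior probabilities when prior.prob=NULL.

```r
# Hypothetical group labels, for illustration only
x.group <- c("a", "a", "a", "b")

# Proportion of the sample falling in each group
prior.est <- prop.table(table(x.group))
print(prior.est)
```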

If trace=TRUE, a message is printed in the command line as each data item is processed. This is useful because cross-validated estimates may take a long time to execute.

Value

These functions compare the true group labels x.group with the estimated ones. Each returns a list with fields

cross    cross-classification table with the rows indicating the true group and the columns the estimated group
error    misclassification rate (MR)


When test data that are independent of the training data are available, compare computes

MR = (number of points wrongly classified) / (total number of points).
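The table and rate described above can be reproduced by hand in base R. This is an illustrative sketch only: the label vectors are hypothetical, and the code is not the implementation used by compare.

```r
# Hypothetical true and estimated labels, for illustration only
true.group <- c(1, 1, 1, 2, 2, 2)
est.group  <- c(1, 1, 2, 2, 2, 1)

# Rows index the true group, columns the estimated group
cross <- table(true = true.group, estimated = est.group)

# MR = (number of points wrongly classified) / (total number of points)
mr <- sum(true.group != est.group) / length(true.group)

print(cross)
print(mr)
```

Here two of the six points are assigned to the wrong group, so the MR is 2/6.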


When independent test data are not available, e.g. when classifying the training data set itself, the cross-validated estimate of MR is more appropriate. See Silverman (1986). These estimates are implemented as compare.kda.cv (for full bandwidth selectors) and compare.kda.diag.cv (for diagonal bandwidth selectors). These functions are only available for d > 1.

If by.group=FALSE then only the total MR is given. If by.group=TRUE, then the MR for each class is also given (estimated number in group divided by true number).
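To make the idea of a cross-validated MR concrete, here is a rough base R sketch of a leave-one-out estimate. It rests on several assumptions that are not taken from ks: the helper name loo.mr is hypothetical, the kernel is a product Gaussian kernel, the bandwidth vector h is fixed by hand rather than selected or recomputed, and priors are the leave-one-out sample proportions. It is not the code behind compare.kda.cv or compare.kda.diag.cv.

```r
# Leave-one-out MR for a simple kernel discriminant rule (illustrative only).
# loo.mr is a hypothetical helper; h holds one fixed bandwidth per dimension.
loo.mr <- function(x, x.group, h) {
  x <- as.matrix(x)
  x.group <- as.character(x.group)
  groups <- sort(unique(x.group))
  n <- nrow(x)
  est <- character(n)
  for (i in seq_len(n)) {
    xi <- x[i, ]
    score <- sapply(groups, function(g) {
      keep <- x.group == g
      keep[i] <- FALSE                      # exclude the i-th data item
      xg <- x[keep, , drop = FALSE]
      # product Gaussian kernel density estimate for group g at xi
      dens <- mean(apply(xg, 1, function(row) prod(dnorm(xi, mean = row, sd = h))))
      prior <- sum(keep) / (n - 1)          # leave-one-out sample proportion
      prior * dens
    })
    est[i] <- groups[which.max(score)]      # assign to the highest-scoring group
  }
  mean(est != x.group)                      # proportion wrongly classified
}

# Bandwidths of 0.3 are arbitrary values chosen for this sketch
mr.cv <- loo.mr(iris[, 1:2], iris[, 5], h = c(0.3, 0.3))
print(mr.cv)
```

Each point is classified by a rule built without that point, so the resulting rate is not flattered by the point having been used to fit its own density estimate. The sketch assumes every group keeps at least one point after the exclusion.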

References

Silverman, B. W. (1986) Density Estimation for Statistics and Data Analysis. Chapman & Hall. London.

Simonoff, J. S. (1996) Smoothing Methods in Statistics. Springer-Verlag. New York.

Venables, W. N. & Ripley, B. D. (1997) Modern Applied Statistics with S-PLUS. Springer-Verlag. New York.

See Also

kda.kde

Examples

### univariate example -- independent test data
x <- c(rnorm.mixt(n=100, mus=1, sigmas=1, props=1),
       rnorm.mixt(n=100, mus=-1, sigmas=1, props=1))
x.gr <- rep(c(1,2), times=c(100,100))
y <- c(rnorm.mixt(n=100, mus=1, sigmas=1, props=1),
       rnorm.mixt(n=100, mus=-1, sigmas=1, props=1))
kda.gr <- kda(x, x.gr, hs=sqrt(c(0.09, 0.09)), y=y)
compare(x.gr, kda.gr)
compare(x.gr, kda.gr, by.group=TRUE) 

### bivariate example - restricted iris dataset, dependent test data
library(MASS)
data(iris)
ir <- iris[,c(1,2)]
ir.gr <- iris[,5]
compare.kda.cv(ir, ir.gr, bw="plugin", pilot="samse")

[Package ks version 1.6.2 Index]