kernel.pls.ic {plsdof}    R Documentation

Model selection for Kernel Partial Least Squares based on information criteria

Description

This function selects the optimal model parameters using three model selection criteria (aic, bic, gmdl). Each criterion is evaluated with two different Degrees of Freedom estimates for PLS: the estimate of Kraemer and Braun (2007) and a naive estimate.
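The following base R sketch shows how an information criterion turns a sequence of residual sums of squares into an optimal number of components. It assumes the common Gaussian forms AIC = n log(RSS/n) + 2 DoF and BIC = n log(RSS/n) + log(n) DoF. It also assumes that the naive estimate counts one Degree of Freedom per component plus one for the intercept. Neither assumption is taken from plsdof, whose exact scaling may differ. The helper names ic.select and naive.dof are hypothetical, and gmdl is omitted.

```r
# Hypothetical helper (not part of plsdof): pick m by AIC and BIC.
# rss[m] and dof[m] belong to the model with m components.
# Assumed Gaussian forms:
#   AIC = n * log(RSS / n) + 2 * DoF
#   BIC = n * log(RSS / n) + log(n) * DoF
ic.select <- function(rss, dof, n) {
  stopifnot(length(rss) == length(dof))
  aic <- n * log(rss / n) + 2 * dof
  bic <- n * log(rss / n) + log(n) * dof
  list(aic = aic, bic = bic,
       m.aic = which.min(aic), m.bic = which.min(bic))
}

# Assumed naive Degrees of Freedom: one per component plus the intercept.
naive.dof <- function(m) seq_len(m) + 1

# Usage with made-up RSS values for m = 1, ..., 5 and n = 50
rss <- c(40, 30, 28, 27.5, 27.4)
res <- ic.select(rss, naive.dof(5), n = 50)
res$m.aic  # 3
res$m.bic  # 2
```

BIC penalizes each additional Degree of Freedom more heavily than AIC once log(n) > 2, so it tends to select fewer components, as in this example.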

Usage

kernel.pls.ic(X, y, m = ncol(X), type = "vanilla", sigma = 1, step.size = 1)

Arguments

X matrix of predictor observations.
y vector of response observations. The length of y must equal the number of rows of X.
m maximal number of Partial Least Squares components. Default is m=ncol(X).
type type of kernel. type="vanilla" is a linear kernel. type="gaussian" is a Gaussian kernel. Default is type="vanilla".
sigma vector of kernel parameters. If type="gaussian", these are the kernel widths. For the vanilla kernel, sigma is ignored. Default is sigma=1.
step.size number of steps after which the latent components are re-orthogonalized. See kernel.pls.fit for more details. Default is step.size=1.

Details

For the linear kernel (type="vanilla"), we standardize X to zero mean and unit variance. For the Gaussian kernel (type="gaussian"), we normalize X such that the range of each column is [-1,1].
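The preprocessing described above can be sketched in base R as follows, together with the two kernel matrices. The Gaussian parametrization exp(-||x - z||^2 / (2 sigma^2)) is an assumption; kernel.pls.ic may scale sigma differently. All function names are hypothetical.

```r
# Hypothetical helpers (not part of plsdof).
# Linear kernel: each column to zero mean and unit variance.
prep.vanilla <- function(X) scale(X, center = TRUE, scale = TRUE)

# Gaussian kernel: map each column linearly onto [-1, 1].
# (A constant column would give NaN here.)
prep.gaussian <- function(X) {
  apply(X, 2, function(x) 2 * (x - min(x)) / (max(x) - min(x)) - 1)
}

# Linear kernel matrix: inner products between rows of X.
kernel.vanilla <- function(X) tcrossprod(X)

# Gaussian kernel matrix; the 2 * sigma^2 scaling is an assumption.
kernel.gaussian <- function(X, sigma) {
  D2 <- as.matrix(dist(X))^2  # squared Euclidean distances between rows
  exp(-D2 / (2 * sigma^2))
}

# Usage
set.seed(1)
X  <- matrix(rnorm(50 * 5), ncol = 5)
Xv <- prep.vanilla(X)
Xg <- prep.gaussian(X)
K  <- kernel.gaussian(Xg, sigma = 1)
```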

The default value of sigma is in general NOT a sensible choice; sigma should always be selected from a RANGE of values. The default value of m is a sensible upper bound only for the vanilla kernel.
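When sigma is given as a vector, each (sigma, m) pair receives a criterion score, and the optimal pair minimizes that score. The base R sketch below shows only this joint minimization. The score matrix is hypothetical, with one row per candidate sigma and one column per number of components. It does not reproduce how kernel.pls.ic computes the scores.

```r
# Hypothetical helper (not part of plsdof): joint argmin over a
# sigma-by-m grid of information-criterion scores.
select.sigma.m <- function(scores, sigma) {
  stopifnot(nrow(scores) == length(sigma))
  # first (row, col) position of the smallest score
  best <- which(scores == min(scores), arr.ind = TRUE)[1, ]
  list(sigma = sigma[best[["row"]]], m = best[["col"]])
}

# Usage with made-up scores for three widths and m = 1, 2, 3
sigma  <- c(0.1, 1, 10)
scores <- rbind(c(5, 4, 6),
                c(3, 2, 4),
                c(7, 6, 8))
sel <- select.sigma.m(scores, sigma)
sel$sigma  # 1
sel$m      # 2
```

If the selected sigma lies at either end of the grid, the range was probably too narrow and should be widened.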

Value

DoF Degrees of Freedom
m.aic optimal number of components for aic
m.bic optimal number of components for bic
m.gmdl optimal number of components for gmdl
m.aic.naive optimal number of components for aic and the naive Degrees of Freedom
m.bic.naive optimal number of components for bic and the naive Degrees of Freedom
m.gmdl.naive optimal number of components for gmdl and the naive Degrees of Freedom
sigma.aic optimal sigma for aic, only returned if type="gaussian"
sigma.bic optimal sigma for bic, only returned if type="gaussian"
sigma.gmdl optimal sigma for gmdl, only returned if type="gaussian"
sigma.aic.naive optimal sigma for aic and the naive Degrees of Freedom, only returned if type="gaussian"
sigma.bic.naive optimal sigma for bic and the naive Degrees of Freedom, only returned if type="gaussian"
sigma.gmdl.naive optimal sigma for gmdl and the naive Degrees of Freedom, only returned if type="gaussian"

Author(s)

Nicole Kraemer, Mikio L. Braun

References

Akaike, H. (1973) "Information Theory and an Extension of the Maximum Likelihood Principle". Second International Symposium on Information Theory, 267 - 281.

Hansen, M., Yu, B. (2001) "Model Selection and the Principle of Minimum Description Length". Journal of the American Statistical Association, 96, 746 - 774.

Kraemer, N., Braun, M.L. (2007) "Kernelizing PLS, Degrees of Freedom, and Efficient Model Selection". Proceedings of the 24th International Conference on Machine Learning, Omni Press, 441 - 448.

Schwarz, G. (1978) "Estimating the Dimension of a Model". Annals of Statistics, 6(2), 461 - 464.

See Also

kernel.pls, kernel.pls.cv

Examples

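n <- 50 # number of observations
p <- 5  # number of variables
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)

# compute linear PLS and select the number of components
linear.pls <- kernel.pls.ic(X, y, m = ncol(X))
linear.pls$m.aic # optimal number of components for aic
linear.pls$m.bic # optimal number of components for bic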

[Package plsdof version 0.1-1 Index]