sim.data.ppls {ppls}    R Documentation

Simulated Data

Description

This function generates training and test data from a nonlinear regression model, for use in simulations.

Usage

sim.data.ppls(ntrain, ntest, stnr, p, a = NULL, b = NULL)

Arguments

ntrain The number of training observations.
ntest The number of test observations.
stnr The signal-to-noise ratio.
p The number of predictor variables.
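a A vector of length 5 containing the coefficients a_j of the linear terms in the regression problem to be simulated. Default is NULL, in which case the values are drawn randomly.
b A vector of length 5 containing the coefficients b_j inside the sine terms in the regression problem to be simulated. Default is NULL, in which case the values are drawn randomly.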

Details

The training and test predictor matrices are drawn from a uniform distribution over [-1,1] for each of the p variables. The response is generated via a nonlinear regression model of the form

Y = \sum_{j=1}^{5} f_j(X_j) + \varepsilon

where f_j(x) = a_j x + sin(6 b_j x). As the sum runs over j = 1, ..., 5, only the first five predictors influence the response. The values of a_j and b_j can be specified via a and b. If no values for a or b are given, they are drawn randomly from [-1,1]. The variance of the noise term is chosen such that the signal-to-noise ratio equals stnr on the training data.
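The model described above can be sketched in a few lines of R. This is an illustrative re-implementation, not the package source: the function name sketch_sim_data is hypothetical, and three points that the text leaves open are filled in by assumption, namely that a and b are drawn uniformly, that the signal-to-noise ratio is defined as var(signal) / sigma^2, and that the noise is Gaussian.

```r
# Hypothetical sketch of the data-generating model; NOT the ppls source code.
sketch_sim_data <- function(ntrain, ntest, stnr, p, a = NULL, b = NULL) {
  stopifnot(p >= 5)  # the model uses the first five predictors

  # Assumption: missing coefficients are drawn uniformly from [-1, 1].
  if (is.null(a)) a <- runif(5, -1, 1)
  if (is.null(b)) b <- runif(5, -1, 1)

  # Predictors: independent uniform draws on [-1, 1].
  Xtrain <- matrix(runif(ntrain * p, -1, 1), nrow = ntrain, ncol = p)
  Xtest  <- matrix(runif(ntest * p, -1, 1), nrow = ntest, ncol = p)

  # Noise-free signal: sum over j = 1..5 of a_j * x_j + sin(6 * b_j * x_j).
  signal <- function(X) {
    X5 <- X[, 1:5, drop = FALSE]
    linear <- sweep(X5, 2, a, `*`)           # column j scaled by a_j
    sine   <- sin(6 * sweep(X5, 2, b, `*`))  # column j scaled by b_j
    rowSums(linear + sine)
  }
  ftrain <- signal(Xtrain)
  ftest  <- signal(Xtest)

  # Assumption: signal-to-noise ratio = var(signal) / sigma^2 on the training data.
  sigma <- sqrt(var(ftrain) / stnr)

  # Assumption: additive Gaussian noise with standard deviation sigma.
  list(Xtrain = Xtrain, ytrain = ftrain + rnorm(ntrain, sd = sigma),
       Xtest = Xtest, ytest = ftest + rnorm(ntest, sd = sigma),
       sigma = sigma, a = a, b = b)
}

set.seed(1)
sim <- sketch_sim_data(ntrain = 50, ntest = 200, stnr = 16, p = 16)
```

Under these assumptions, a larger stnr yields a smaller sigma for the same training signal, and predictors 6 to p carry no information about the response.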

Value

Xtrain A matrix of size ntrain x p.
ytrain A vector of length ntrain.
Xtest A matrix of size ntest x p.
ytest A vector of length ntest.
sigma The standard deviation of the noise term.
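a The vector of coefficients a_j used in the nonlinear function (as supplied, or as drawn randomly).
b The vector of coefficients b_j used in the nonlinear function (as supplied, or as drawn randomly).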

Author(s)

Nicole Kraemer

References

N. Kraemer, A.-L. Boulesteix, G. Tutz (2007) "Penalized Partial Least Squares with Applications to B-Splines Transformations and Functional Data", preprint, available at http://ml.cs.tu-berlin.de/~nkraemer/publications.html

See Also

ppls.splines.cv

Examples

dummy <- sim.data.ppls(ntrain = 50, ntest = 200, p = 16, stnr = 16)
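As a further illustration, the components listed under Value can be inspected directly. The sketch below assumes that the ppls package is installed and is skipped otherwise; the variable name sim_example and the seed are arbitrary choices.

```r
# Runs only if the ppls package is available.
if (requireNamespace("ppls", quietly = TRUE)) {
  set.seed(1)
  sim_example <- ppls::sim.data.ppls(ntrain = 50, ntest = 200, stnr = 16, p = 16)

  dim(sim_example$Xtrain)     # training predictors: ntrain rows, p columns
  dim(sim_example$Xtest)      # test predictors: ntest rows, p columns
  length(sim_example$ytrain)  # one response per training observation
  sim_example$sigma           # standard deviation of the noise term
  sim_example$a               # coefficients a_j that were used
  sim_example$b               # coefficients b_j that were used
}
```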

[Package ppls version 1.0 Index]