sim.data.ppls {ppls} | R Documentation |
This function generates training and test data from a nonlinear regression model for use in simulations.
sim.data.ppls(ntrain,ntest,stnr,p,a=NULL,b=NULL)
ntrain |
The number of training observations. |
ntest |
The number of test observations. |
stnr |
The signal to noise ratio. |
p |
The number of predictor variables. |
a |
A vector of length 5 holding the linear coefficients a_j of the regression problem to be simulated. If NULL (the default), the values are drawn randomly from [-1,1]. |
b |
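A vector of length 5 holding the coefficients b_j of the sine terms sin(6 b_j x) of the regression problem to be simulated. If NULL (the default), the values are drawn randomly from [-1,1]. |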
The training and test data matrices are drawn from a uniform distribution over [-1,1] for each of the p variables. The response is generated via a nonlinear regression model of the form

Y = \sum_{j=1}^{5} f_j(X_j) + \varepsilon

where f_j(x) = a_j x + \sin(6 b_j x), so only the first five predictors contribute to the response. The values of a_j and b_j can be specified via a and b. If no values for a or b are given, they are drawn randomly from [-1,1]. The variance of the noise term is chosen such that the signal-to-noise ratio equals stnr on the training data.
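The data-generating process described above can be sketched in a few lines of R. This is a minimal illustration, not the ppls source code: the name sim_sketch is hypothetical, the noise is assumed to be Gaussian, and the signal-to-noise ratio is assumed to be the variance of the noise-free signal on the training data divided by the noise variance. The package may define these details differently.

```r
# Hypothetical re-implementation for illustration only; not the ppls source.
sim_sketch <- function(ntrain, ntest, stnr, p, a = NULL, b = NULL) {
  stopifnot(p >= 5, ntrain >= 2)
  # Draw missing coefficients uniformly from [-1, 1]
  if (is.null(a)) a <- runif(5, -1, 1)
  if (is.null(b)) b <- runif(5, -1, 1)
  n <- ntrain + ntest
  # Predictors: uniform on [-1, 1] for each of the p variables
  X <- matrix(runif(n * p, -1, 1), nrow = n, ncol = p)
  # Noise-free signal: sum over j = 1..5 of a_j * x_j + sin(6 * b_j * x_j)
  X5 <- X[, 1:5, drop = FALSE]
  signal <- as.vector(X5 %*% a + rowSums(sin(6 * sweep(X5, 2, b, "*"))))
  train <- seq_len(ntrain)
  # Assumption: stnr = var(signal on training data) / sigma^2
  sigma <- sqrt(var(signal[train]) / stnr)
  # Assumption: Gaussian noise
  y <- signal + rnorm(n, sd = sigma)
  list(Xtrain = X[train, , drop = FALSE], ytrain = y[train],
       Xtest = X[-train, , drop = FALSE], ytest = y[-train],
       sigma = sigma, a = a, b = b)
}
```

Under these assumptions, a larger stnr yields a smaller sigma, and predictors 6 to p are pure noise variables with no influence on the response.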
Xtrain |
A matrix of size ntrain x p . |
ytrain |
A vector of length ntrain . |
Xtest |
A matrix of size ntest x p . |
ytest |
A vector of length ntest . |
sigma |
The standard deviation of the noise term. |
a |
The vector of linear coefficients a_j used in the nonlinear function. |
b |
The vector of coefficients b_j used in the sine terms of the nonlinear function. |
Nicole Kraemer
N. Kraemer, A.-L. Boulesteix, G. Tutz (2007) "Penalized Partial Least Squares with Applications to B-Splines Transformations and Functional Data", preprint
available at http://ml.cs.tu-berlin.de/~nkraemer/publications.html
dummy <- sim.data.ppls(ntrain = 50, ntest = 200, p = 16, stnr = 16)