sim.data.ppls {ppls} | R Documentation |
Generates training and test data from a nonlinear regression model for use in simulations.
sim.data.ppls(ntrain,ntest,stnr,p,a=NULL,b=NULL)
ntrain |
number of training observations |
ntest |
number of test observations |
stnr |
signal to noise ratio |
p |
number of predictor variables |
a |
vector of length 5 giving the linear coefficients a_j of the regression problem to be simulated |
b |
vector of length 5 giving the sine coefficients b_j of the regression problem to be simulated |
The matrix of training and test data is drawn from a uniform distribution over [-1,1] for each of the p variables. The response is generated via a nonlinear regression model of the form

$$Y = \sum_{j=1}^{5} f_j(X_j) + \varepsilon,$$

where $f_j(x) = a_j x + \sin(6 b_j x)$, so only the first five predictors enter the response. The values of a_j and b_j can be specified via a or b. If no values for a or b are given, they are drawn randomly from [-1,1]. The variance of the noise term is chosen such that the signal-to-noise ratio equals stnr on the training data.
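The data-generating process described above can be sketched in a few lines of base R. This is an illustrative re-implementation, not the package source: the function name `sim_sketch` is hypothetical, the coefficients are assumed to be drawn uniformly from [-1,1], the noise is assumed to be Gaussian, the signal-to-noise ratio is assumed to be the variance of the noise-free training signal divided by the noise variance, and p is assumed to be at least 5.

```r
# Hypothetical sketch of the simulation model; NOT the ppls source code.
sim_sketch <- function(ntrain, ntest, stnr, p, a = NULL, b = NULL) {
  stopifnot(p >= 5, ntrain >= 2, stnr > 0)
  # Assumption: missing coefficients are drawn uniformly from [-1, 1]
  if (is.null(a)) a <- runif(5, -1, 1)
  if (is.null(b)) b <- runif(5, -1, 1)

  # Noise-free signal: sum_{j=1}^5 a_j * x_j + sin(6 * b_j * x_j)
  signal <- function(X) {
    X5 <- X[, 1:5, drop = FALSE]
    drop(X5 %*% a) + rowSums(sin(6 * sweep(X5, 2, b, "*")))
  }

  # Predictors: uniform on [-1, 1] for each of the p variables
  Xtrain <- matrix(runif(ntrain * p, -1, 1), ntrain, p)
  Xtest  <- matrix(runif(ntest * p, -1, 1), ntest, p)

  # Assumption: stnr = var(signal on training data) / sigma^2
  sigma <- sqrt(var(signal(Xtrain)) / stnr)

  # Assumption: Gaussian noise with standard deviation sigma
  list(
    Xtrain = Xtrain,
    ytrain = signal(Xtrain) + rnorm(ntrain, sd = sigma),
    Xtest  = Xtest,
    ytest  = signal(Xtest) + rnorm(ntest, sd = sigma),
    sigma  = sigma, a = a, b = b
  )
}
```

Under these assumptions, a larger `stnr` yields a smaller `sigma`, so the simulated response follows the nonlinear signal more closely.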
Xtrain |
matrix of size ntrain x p |
ytrain |
vector of length ntrain |
Xtest |
matrix of size ntest x p |
ytest |
vector of length ntest |
sigma |
standard deviation of the noise term |
a |
vector that determines the nonlinear function |
b |
vector that determines the nonlinear function |
Nicole Krämer
N. Krämer, A.-L. Boulesteix, and G. Tutz (2008). Penalized Partial Least Squares with Applications to B-Spline Transformations and Functional Data. Chemometrics and Intelligent Laboratory Systems, 94, 60-69.
library(ppls)
dummy <- sim.data.ppls(ntrain = 50, ntest = 200, p = 16, stnr = 16)
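As a rough illustration of how the returned components might be used, the sketch below fits an ordinary least squares baseline on the training part and evaluates it on the test part. So that it runs without the package installed, `dummy` is a hand-built stand-in with the documented component names Xtrain, ytrain, Xtest and ytest. In practice it would be the list returned by sim.data.ppls. The helper `as_df` and the toy response are assumptions made for this example only.

```r
# Stand-in list with the documented component names; in practice this
# would be the value returned by sim.data.ppls().
set.seed(1)
p <- 16
dummy <- list(
  Xtrain = matrix(runif(50 * p, -1, 1), 50, p),
  Xtest  = matrix(runif(200 * p, -1, 1), 200, p)
)
# Toy response used only to make the stand-in complete
dummy$ytrain <- sin(6 * dummy$Xtrain[, 1]) + rnorm(50, sd = 0.1)
dummy$ytest  <- sin(6 * dummy$Xtest[, 1]) + rnorm(200, sd = 0.1)

# Hypothetical helper: give the columns explicit names x1, ..., xp
as_df <- function(X) {
  colnames(X) <- paste0("x", seq_len(ncol(X)))
  as.data.frame(X)
}

# Linear baseline fitted on the training data
train <- as_df(dummy$Xtrain)
train$y <- dummy$ytrain
fit <- lm(y ~ ., data = train)

# Mean squared prediction error on the test data
pred <- predict(fit, newdata = as_df(dummy$Xtest))
mse_test <- mean((dummy$ytest - pred)^2)
```

Because the simulated response is nonlinear in the predictors, a purely linear fit such as this one is only a reference point against which nonlinear methods can be compared.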