ibr {ibr} | R Documentation |
Performs iterative bias reduction using kernel or thin plate
spline smoothers. In the latter case, the order m is chosen as the smallest integer
such that 2m/d>1, where d is the number of explanatory variables.
Missing values are not allowed.
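The rule for the spline order can be checked with a few lines of base R. This is only a sketch: `tps_order` is a hypothetical helper written for illustration, not a function of the ibr package. Since 2m/d>1 is the same as m>d/2, the smallest admissible integer is floor(d/2)+1.

```r
# Hypothetical helper (not part of ibr): smallest integer m with 2m/d > 1,
# i.e. the smallest integer strictly greater than d/2.
tps_order <- function(d) {
  stopifnot(length(d) == 1, d >= 1, d == floor(d))
  floor(d / 2) + 1
}

# d = 1..6 gives m = 1, 2, 2, 3, 3, 4
orders <- sapply(1:6, tps_order)
print(orders)
```

With two explanatory variables this gives m = 2, which agrees with the "thin plate splines (of order 2)" comment in the example at the end of this page.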
ibr(x, y, criterion="gcv", df=1.5, Kmin=1, Kmax=10000, smoother="k", kernel="g", control.par=list(), cv.options=list())
x |
A numeric matrix of explanatory variables, with n rows and p columns. |
y |
A numeric vector of length n containing the variable to be explained (the response). |
criterion |
Character string. If the number of iterations
(iter) is missing or
NULL, the number of iterations is chosen using
criterion. The criteria available are GCV (default, "gcv"),
AIC ("aic"), corrected AIC ("aicc"), BIC
("bic"), gMDL ("gmdl"), map ("map") or rmse
("rmse"). The last two are designed for cross-validation. |
df |
A numeric vector of either length 1 or length equal to the
number of columns of x. If smoother="k", it indicates
the desired effective degrees of
freedom (trace) of the smoothing matrix for
each variable or for the initial smoother (see control.par$dftotal); df is recycled when it has
length 1. If smoother="tps", the minimum df of thin
plate splines is multiplied by df. This argument is ignored if
bandwidth is supplied (non-NULL). |
Kmin |
The minimum number of bias correction iterations of the search grid considered by the model selection procedure for selecting the optimal number of iterations. |
Kmax |
The maximum number of bias correction iterations of the search grid considered by the model selection procedure for selecting the optimal number of iterations. |
smoother |
Character string selecting the smoother: thin plate
splines ("tps") or kernel ("k"). |
kernel |
Character string selecting the kernel: Gaussian
("g"), Epanechnikov ("e"), uniform ("u") or
quartic ("q"). The default (Gaussian kernel) is strongly advised. |
control.par |
A named list that controls optional parameters. The
components are bandwidth (default NULL), iter
(default NULL), really.big (default FALSE),
dftobwitmax (default 1000), exhaustive (default
FALSE), m (default NULL), dftotal (default
FALSE), accuracy (default 0.01), ddlmaxi
(default 2n/3) and fraction (default c(100, 200,
500, 1000, 5000, 10^4, 5e+04, 1e+05, 5e+05, 1e+06)).
bandwidth: a vector of either length 1 or length equal to the
number of columns of x. If smoother="k",
it gives the bandwidth used for
each variable; bandwidth is recycled when it has
length 1. If smoother="tps", it gives the
amount of penalty (coefficient lambda).
By default (missing or NULL), for smoother="k" the
bandwidth of each variable is
chosen such that each univariate kernel
smoother (one per explanatory variable) has df effective degrees of
freedom, and for smoother="tps" lambda is chosen such that
the df of the smoothing matrix is df times the minimum df.
iter: the number of iterations. If NULL or missing, an optimal number of
iterations is chosen from
the search grid (integers from Kmin to Kmax) to minimize the criterion.
really.big: a boolean. If TRUE, it overrides the limit
of 500 observations. Expect long computation times if TRUE.
dftobwitmax: when the bandwidth is chosen by specifying the effective
degrees
of freedom (see df), a search is done by
uniroot. This argument specifies the maximum number of iterations passed to uniroot.
exhaustive: a boolean. If TRUE, an exhaustive search for the
optimal number of iterations on the
grid Kmin:Kmax is performed. If FALSE, the minimum of
criterion is searched using optimize between Kmin
and Kmax.
m: the order of thin plate splines. This integer m must satisfy
2m/d>1, where d is the number of explanatory
variables. If missing or NULL (the default), m is chosen as the first integer
such that 2m/d>1.
dftotal: a boolean. If FALSE (the default), the
argument df is the target df of each univariate kernel smoother,
computed for each explanatory variable. If TRUE, df is the target df of the overall
(product) kernel, that is, the base smoother.
accuracy: tolerance when searching for bandwidths which lead to a
chosen overall initial df.
dfmaxi: the maximum effective degrees of freedom allowed for the iterated
bias reduction smoother.
fraction: the subdivision of the interval from Kmin to Kmax used
if a non-exhaustive search is performed (see also iterchoiceA or iterchoiceS1).
|
cv.options |
A named list which controls how cross-validation
is performed, with components bwchange,
ntest, ntrain, Kfold, type,
seed, method and npermut. bwchange is a boolean (default FALSE)
which indicates whether the bandwidths have to be recomputed each
time. ntest is the number of observations in the test set and
ntrain is the number of observations in the training set.
Only one of these is needed; the other can be NULL or missing. Kfold is a boolean or an integer. If
Kfold is TRUE, the number of folds is deduced from
ntest (or ntrain). type is a character string, one of
random, timeseries, consecutive or interleaved,
which gives the type of segments. seed controls the seed of
the random generator. method is either "inmemory" or
"outmemory"; "inmemory" performs some calculations outside
the loop, which saves computation time but increases the required
memory. npermut is the number of random draws. If
cv.options is list(), then component ntest is set to
floor(nrow(x)/10), type is random, npermut is 20
and method is "inmemory", and the other components are NULL. |
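The link between df and bandwidth described above (a uniroot search for the bandwidth whose smoothing matrix has trace df) can be sketched in base R. This is an illustration under stated assumptions, not the package's code: it assumes a univariate Nadaraya-Watson smoother with a Gaussian kernel, and `gauss_smoother` and `bw_from_df` are hypothetical names. The search is done on the log-bandwidth scale, and the `maxiter` argument plays the role that dftobwitmax is described as having.

```r
# Smoothing matrix of a univariate Gaussian-kernel (Nadaraya-Watson) smoother.
# Hypothetical helper, not an ibr function.
gauss_smoother <- function(x, bw) {
  K <- exp(-0.5 * (outer(x, x, "-") / bw)^2)
  K / rowSums(K)  # each row sums to one
}

# Hypothetical helper: bandwidth whose smoothing matrix has trace `df`.
# The trace goes from n (tiny bandwidth) down to 1 (huge bandwidth),
# so the target must satisfy 1 < df < n.
bw_from_df <- function(x, df, maxiter = 1000) {
  objective <- function(log_bw) {
    sum(diag(gauss_smoother(x, exp(log_bw)))) - df
  }
  span <- diff(range(x))
  root <- uniroot(objective, interval = log(span * c(1e-3, 1e3)),
                  tol = 1e-8, maxiter = maxiter)
  exp(root$root)
}

x <- seq(0, 1, length.out = 50)
bw <- bw_from_df(x, df = 1.5)
achieved <- sum(diag(gauss_smoother(x, bw)))
print(c(bandwidth = bw, df = achieved))
```

A df close to 1, as in the default df=1.5, corresponds to a large bandwidth, that is, a heavily oversmoothing pilot smoother whose bias is then reduced by the iterations.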
Returns an object of class ibr
which is a list including:
beta |
Vector of coefficients. |
residuals |
Vector of residuals. |
iter |
The number of iterations used. |
initialdf |
The initial effective degrees of freedom of the pilot (or base) smoother. |
finaldf |
The effective degrees of freedom of the iterated bias reduction
smoother after iter iterations. |
bandwidth |
Vector of bandwidths, one for each explanatory variable. |
call |
A list containing the following components: x contains the
initial explanatory variables, y contains the
initial dependent variable,
criterion contains the chosen criterion, kernel the
kernel, p the number of explanatory variables and m
the order of the splines (if relevant). |
criteria |
Either a list containing all the criteria evaluated on the
grid Kmin:Kmax (along with the effective degrees of freedom of the
smoother and the sigma squared on this grid) if an exhaustive search is chosen (see the
value of function
iterchoiceAe or iterchoiceS1e), or the value
of the chosen criterion at the given iteration if a non-exhaustive
search is chosen (see exhaustive). If the number of iterations
iter is given by the user, NULL is returned. |
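To make the returned quantities concrete, the following base R sketch runs an exhaustive search over Kmin:Kmax for one explanatory variable. It is a simplified illustration, not the package's implementation: it assumes a Gaussian-kernel pilot smoother S, uses the closed form in which the fit after k bias corrections is [I - (I - S)^k] y, and scores each k with the classical GCV formula RSS/n divided by (1 - df/n)^2, which may differ from the exact criterion ibr evaluates. The names `gauss_smoother` and `ibr_sketch` are hypothetical.

```r
# Gaussian-kernel (Nadaraya-Watson) smoothing matrix; hypothetical helper.
gauss_smoother <- function(x, bw) {
  K <- exp(-0.5 * (outer(x, x, "-") / bw)^2)
  K / rowSums(K)
}

# Hypothetical sketch of iterative bias reduction with an exhaustive
# GCV search on the grid Kmin:Kmax (not the ibr implementation).
ibr_sketch <- function(x, y, bw, Kmin = 1, Kmax = 100) {
  n <- length(y)
  S <- gauss_smoother(x, bw)
  R <- diag(n) - S            # residual operator of the pilot smoother
  grid <- Kmin:Kmax
  df <- gcv <- numeric(length(grid))
  resid <- matrix(0, n, length(grid))
  P <- diag(n)
  for (k in seq_len(Kmax)) {
    P <- P %*% R              # P = (I - S)^k
    if (k >= Kmin) {
      j <- k - Kmin + 1
      r <- drop(P %*% y)      # residuals of the k-step smoother
      df[j] <- n - sum(diag(P))   # trace of I - (I - S)^k
      gcv[j] <- mean(r^2) / (1 - df[j] / n)^2
      resid[, j] <- r
    }
  }
  best <- which.min(gcv)
  list(iter = grid[best], initialdf = sum(diag(S)), finaldf = df[best],
       residuals = resid[, best], fitted = y - resid[, best],
       criteria = list(gcv = gcv, df = df))
}

set.seed(1)
x <- seq(0, 1, length.out = 60)
y <- sin(2 * pi * x) + rnorm(60, sd = 0.3)
fit <- ibr_sketch(x, y, bw = 0.3)
print(unlist(fit[c("iter", "initialdf", "finaldf")]))
```

With Kmin = Kmax = 1 the sketch reduces to the pilot smoother itself, so finaldf equals initialdf; larger iteration counts trade the pilot's bias for additional effective degrees of freedom.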
Pierre-Andre Cornillon, Nicolas Hengartner and Eric Matzner-Lober.
Cornillon, P. A., Hengartner, N. and Matzner-Lober, E. (2009) Recursive Bias Estimation for high dimensional regression smoothers. submitted.
f <- function(x, y) {
  .75*exp(-((9*x-2)^2 + (9*y-2)^2)/4) +
    .75*exp(-((9*x+1)^2/49 + (9*y+1)^2/10)) +
    .50*exp(-((9*x-7)^2 + (9*y-3)^2)/4) -
    .20*exp(-((9*x-4)^2 + (9*y-7)^2))
}
# define a (fine) x-y grid and calculate the function values on the grid
ngrid <- 50; xf <- seq(0,1, length=ngrid+2)[-c(1,ngrid+2)]
yf <- xf ; zf <- outer(xf, yf, f)
grid <- cbind(rep(xf, ngrid), rep(xf, rep(ngrid, ngrid)))
persp(xf, yf, zf, theta=130, phi=20, expand=0.45,main="True Function")
# generate a data set with function f and signal-to-noise ratio 5
# (noise variance = 0.2 * signal variance)
noise <- .2 ; N <- 100
xr <- seq(0.05,0.95,by=0.1) ; yr <- xr ; zr <- outer(xr,yr,f) ; set.seed(25)
std <- sqrt(noise*var(as.vector(zr))) ; noise <- rnorm(length(zr),0,std)
Z <- zr + matrix(noise,sqrt(N),sqrt(N))
# transpose the data to a column format
xc <- rep(xr, sqrt(N)) ; yc <- rep(yr, rep(sqrt(N),sqrt(N)))
X <- cbind(xc, yc) ; Zc <- as.vector(Z)
# fit by thin plate splines (of order 2) ibr
res.ibr <- ibr(X,Zc,df=1.1,smoother="tps")
fit <- matrix(predict(res.ibr,grid),ngrid,ngrid)
persp(xf, yf, fit ,theta=130,phi=20,expand=0.45,main="Fit",zlab="fit")
## Not run:
data(ozone, package = "ibr")
res.ibr <- ibr(ozone[,-1],ozone[,1],df=1.1)
summary(res.ibr)
predict(res.ibr)
## End(Not run)