ssanova {gss}R Documentation

Fitting Smoothing Spline ANOVA Models

Description

Fit smoothing spline ANOVA models in Gaussian regression. The symbolic model specification via formula follows the same rules as in lm.

Usage

ssanova(formula, type=NULL, data=list(), weights, subset, offset,
        na.action=na.omit, partial=NULL, method="v", alpha=1.4,
        varht=1, id.basis=NULL, nbasis=NULL, seed=NULL, random=NULL)

Arguments

formula Symbolic description of the model to be fit.
type List specifying the type of spline for each variable. See mkterm for details.
data Optional data frame containing the variables in the model.
weights Optional vector of weights to be used in the fitting process.
subset Optional vector specifying a subset of observations to be used in the fitting process.
offset Optional offset term with known parameter 1.
na.action Function which indicates what should happen when the data contain NAs.
partial Optional extra unpenalized terms in partial spline models.
method Method for smoothing parameter selection. Supported are method="v" for GCV, method="m" for GML (REML), and method="u" for Mallows' CL.
alpha Parameter modifying GCV or Mallows' CL; larger absolute values yield smoother fits; negative value invokes a stable and more accurate GCV/CL evaluation algorithm but may take two to five times as long. Ignored when method="m" are specified.
varht External variance estimate needed for method="u". Ignored when method="v" or method="m" are specified.
id.basis Index designating selected "knots".
nbasis Number of "knots" to be selected. Ignored when id.basis is supplied.
seed Seed to be used for the random generation of "knots". Ignored when id.basis is supplied.
random Input for parametric random effects in nonparametric mixed-effect models. See mkran for details.

Details

The model specification via formula is intuitive. For example, y~x1*x2 yields a model of the form

y = C + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e

with the terms denoted by "1", "x1", "x2", and "x1:x2".

The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.

A subset of the observations are selected as "knots." Unless specified via id.basis or nbasis, the number of "knots" q is determined by max(30,10n^{2/9}), which is appropriate for the default cubic splines for numerical vectors.

Using q "knots," ssanova calculates an approximate solution to the penalized least squares problem using algorithms of the order O(nq^{2}), which for q<<n scale better than the O(n^{3}) algorithms of ssanova0. For the exact solution, one may set q=n in ssanova, but ssanova0 would be much faster.

Value

ssanova returns a list object of class "ssanova".
The method summary.ssanova can be used to obtain summaries of the fits. The method predict.ssanova can be used to evaluate the fits at arbitrary points along with standard errors. The method project.ssanova can be used to calculate the Kullback-Leibler projection for model selection. The methods residuals.ssanova and fitted.ssanova extract the respective traits from the fits.

Note

To use GCV and Mallows' CL unmodified, set alpha=1.

For simpler models and moderate sample sizes, the exact solution of ssanova0 can be faster.

The results may vary from run to run. For consistency, specify id.basis or set seed.

In gss versions earlier than 1.0, ssanova was under the name ssanova1.

Author(s)

Chong Gu, chong@stat.purdue.edu

References

Gu, C. (2002), Smoothing Spline ANOVA Models. New York: Springer-Verlag.

Kim, Y.-J. and Gu, C. (2004), Smoothing spline Gaussian regression: more scalable computation via efficient approximation. Journal of the Royal Statistical Society, Ser. B, 66, 337–356.

Wahba, G. (1990), Spline Models for Observational Data. Philadelphia: SIAM.

Examples

## Fit a cubic spline
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(x)
cubic.fit <- ssanova(y~x)
## Obtain estimates and standard errors on a grid
new <- data.frame(x=seq(min(x),max(x),len=50))
est <- predict(cubic.fit,new,se=TRUE)
## Plot the fit and the Bayesian confidence intervals
plot(x,y,col=1); lines(new$x,est$fit,col=2)
lines(new$x,est$fit+1.96*est$se,col=3)
lines(new$x,est$fit-1.96*est$se,col=3)
## Clean up
## Not run: 
rm(x,y,cubic.fit,new,est)
dev.off()
## End(Not run)

## Fit a tensor product cubic spline
data(nox)
nox.fit <- ssanova(log10(nox)~comp*equi,data=nox)
## Fit a spline with cubic and nominal marginals
nox$comp<-as.factor(nox$comp)
nox.fit.n <- ssanova(log10(nox)~comp*equi,data=nox)
## Fit a spline with cubic and ordinal marginals
nox$comp<-as.ordered(nox$comp)
nox.fit.o <- ssanova(log10(nox)~comp*equi,data=nox)
## Clean up
## Not run: rm(nox,nox.fit,nox.fit.n,nox.fit.o)

[Package gss version 1.0-5 Index]