ssanova1 {gss}    R Documentation
Fit smoothing spline ANOVA models with cubic spline, linear spline,
or thin-plate spline marginals for numerical variables. Factors are
also accepted. The symbolic model specification via formula
follows the same rules as in lm.
ssanova1(formula, type="cubic", data=list(), weights, subset, offset, na.action=na.omit, partial=NULL, method="v", alpha=1.4, varht=1, id.basis=NULL, nbasis=NULL, seed=NULL, random=NULL, ext=.05, order=2)
formula: Symbolic description of the model to be fit.
type: Type of numerical marginals to be used. Supported are type="cubic" for cubic spline marginals, type="linear" for linear spline marginals, and type="tp" for thin-plate spline marginals.
data: Optional data frame containing the variables in the model.
weights: Optional vector of weights to be used in the fitting process.
subset: Optional vector specifying a subset of observations to be used in the fitting process.
offset: Optional offset term with known parameter 1.
na.action: Function which indicates what should happen when the data contain NAs.
partial: Optional extra unpenalized terms in partial spline models.
method: Method for smoothing parameter selection. Supported are method="v" for GCV, method="m" for GML (REML), and method="u" for Mallows' CL.
alpha: Parameter modifying GCV or Mallows' CL; larger absolute values yield smoother fits. A negative value invokes a stable and more accurate GCV/CL evaluation algorithm, which may take two to five times as long. Ignored when method="m" is specified.
varht: External variance estimate needed for method="u". Ignored when method="v" or method="m" is specified.
id.basis: Indices of the observations selected as "knots".
nbasis: Number of "knots" to be selected. Ignored when id.basis is supplied.
seed: Seed for reproducible random selection of "knots". Ignored when id.basis is supplied.
random: Input for parametric random effects in nonparametric mixed-effect models. See mkran for details.
ext: For cubic spline and linear spline marginals, this option specifies how far to extend the domain beyond the minimum and the maximum, as a proportion of the range. The default ext=.05 specifies marginal domains of lengths 110 percent of their respective ranges. Prediction outside of the domain will result in an error. Ignored if type="tp" is specified.
order: For thin-plate spline marginals, this option specifies the order of the marginal penalties. Ignored if type="cubic" or type="linear" is specified.
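The domain rule for ext amounts to a few lines of arithmetic. The sketch below assumes the extension is applied symmetrically at both ends, which is what the 110 percent figure implies; marginal_domain() is a hypothetical helper for illustration, not a gss function.

```r
## Sketch: marginal domain implied by ext for cubic/linear marginals.
## marginal_domain() is hypothetical; symmetric extension is an assumption.
marginal_domain <- function(x, ext = 0.05) {
  rng <- range(x, na.rm = TRUE)
  mrg <- ext * diff(rng)              # extension added on each side
  c(lower = rng[1] - mrg, upper = rng[2] + mrg)
}

x <- c(2, 5, 12)                      # range of length 10
dom <- marginal_domain(x)             # lower = 1.5, upper = 12.5
diff(dom) / diff(range(x))            # 1.1, i.e. 110 percent of the range
```

Points outside [dom["lower"], dom["upper"]] are where, per the description above, prediction fails.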
ssanova1 implements an alternative approach to the fitting of
ssanova models. It works in a q-dimensional function space
determined by a set of "knots", typically selected at random; the
default dimension (nbasis) is q=10n^(2/9), which is adequate for
cubic splines. The algorithms are of order O(nq^2), scaling much
better than the O(n^3) of ssanova; the memory requirements are
O(nq) versus O(n^2).
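To get a feel for the savings, the default dimension and the cost ratios can be tabulated for a few sample sizes. This is a sketch: the text gives only q=10n^(2/9), so rounding up with ceiling() and capping q at n are assumptions, and default_nbasis() is a hypothetical helper.

```r
## Sketch: default basis dimension q = 10 n^(2/9) and implied cost ratios.
## default_nbasis() is hypothetical; ceiling() and the cap at n are assumptions.
default_nbasis <- function(n) pmin(n, ceiling(10 * n^(2/9)))

n <- c(100, 1000, 10000)
q <- default_nbasis(n)                 # 28 47 78
data.frame(n = n,
           q = q,
           time_ratio   = n^3 / (n * q^2),   # O(n^3) versus O(n q^2)
           memory_ratio = n^2 / (n * q))     # O(n^2) versus O(n q)
```

Because q grows like n^(2/9), the advantage widens quickly with n; the ratios ignore constant factors, so they indicate scaling rather than actual timings.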
A simple modification of GCV and Mallows' CL is incorporated in the
code to help prevent the occasional severe undersmoothing by these
methods. To use the methods unmodified, set alpha=1.
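The effect of alpha can be seen on any linear smoother. The sketch below assumes the modified score has the form described in Gu (2002), V(lambda) = n^-1 ||(I - A) y||^2 / (1 - alpha tr(A)/n)^2, so that alpha=1 gives ordinary GCV. The penalized truncated-line smoother is an illustrative stand-in, not what ssanova1 computes internally.

```r
## Sketch of the alpha-modified GCV score for a linear smoother y_hat = A y:
##   V(lambda) = n^-1 ||(I - A) y||^2 / (1 - alpha * tr(A) / n)^2
## The truncated-line basis and ridge-type penalty are illustrative choices.
set.seed(1)
n <- 100
x <- sort(runif(n))
y <- 5 + 3 * sin(2 * pi * x) + rnorm(n)
knots <- seq(0.05, 0.95, length.out = 20)
X <- cbind(1, x, outer(x, knots, function(u, k) pmax(u - k, 0)))
D <- diag(c(0, 0, rep(1, length(knots))))   # intercept and slope unpenalized

gcv_score <- function(lambda, alpha = 1) {
  A <- X %*% solve(crossprod(X) + n * lambda * D, t(X))  # smoother (hat) matrix
  rss <- sum((y - A %*% y)^2) / n
  rss / (1 - alpha * sum(diag(A)) / n)^2
}

lambdas <- 10^seq(-6, 1, length.out = 71)
v1  <- sapply(lambdas, gcv_score, alpha = 1)     # ordinary GCV
v14 <- sapply(lambdas, gcv_score, alpha = 1.4)   # modified GCV
c(gcv = lambdas[which.min(v1)], modified = lambdas[which.min(v14)])
```

Since tr(A) shrinks as lambda grows, inflating it by alpha > 1 penalizes small lambda more heavily, so the minimizer can only move toward smoother fits.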
To "duplicate" ssanova fits, use ssanova1 with nbasis set to the
sample size and alpha set to 1; some numerical discrepancies will
remain, but they should be negligible for practical purposes. The
ssanova1 "duplicates" of the ssanova fits would take much longer
to calculate, however.
The model specification via formula is intuitive. For example,
y~x1*x2 yields a model of the form
y = c + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e
with the terms denoted by "1", "x1", "x2", and "x1:x2". The side
conditions make these terms uniquely defined. In the current
implementation, f_{1} and f_{12} integrate to 0 on the x1 domain for
cubic spline and linear spline marginals, and add to 0 over the x1
(marginal) sampling points for thin-plate spline marginals.
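The term labels come straight from R's formula machinery, and the side conditions can be mimicked on a grid. The sketch uses the discrete version (terms sum to zero over grid points, as for thin-plate marginals) rather than integrals; the test function f is arbitrary, and the code illustrates the decomposition, not gss internals.

```r
## Term labels R derives from y ~ x1*x2 (the constant "1" is implicit)
attr(terms(y ~ x1 * x2), "term.labels")   # "x1" "x2" "x1:x2"

## Discrete side conditions: main effects and interaction average to zero
## over the grid points of x1 and of x2. Illustrative only, not gss code.
x1 <- seq(0, 1, length.out = 11)
x2 <- seq(0, 1, length.out = 7)
f  <- outer(x1, x2, function(a, b) 1 + a^2 + sin(2 * pi * b) + a * b)

c0  <- mean(f)                          # constant term "1"
f1  <- rowMeans(f) - c0                 # main effect "x1"
f2  <- colMeans(f) - c0                 # main effect "x2"
f12 <- f - c0 - outer(f1, f2, "+")      # interaction "x1:x2"

## Each piece satisfies its side condition, and the pieces add back up to f
stopifnot(abs(sum(f1)) < 1e-12, abs(sum(f2)) < 1e-12,
          max(abs(rowSums(f12))) < 1e-12, max(abs(colSums(f12))) < 1e-12,
          isTRUE(all.equal(c0 + outer(f1, f2, "+") + f12, f)))
```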
The penalized least squares problem is equivalent to a certain empirical Bayes model or a mixed-effect model, and the model terms themselves are generally sums of finer terms of two types, the unpenalized terms (fixed effects) and the penalized terms (random effects). Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.
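For a single smoothing parameter the equivalence can be checked numerically. The sketch assumes a criterion ||y - S d - Z b||^2 + n*lambda*||b||^2 with an arbitrary penalized basis Z (truncated lines here) and compares its minimizer with the GLS/BLUP estimates of the mixed model y = S d + Z b + e, b ~ N(0, sigma^2/(n lambda) I). This is the classical Henderson equivalence, not the gss code path.

```r
## PLS with unpenalized terms S d and penalized terms Z b versus GLS + BLUP in
##   y = S d + Z b + e,  b ~ N(0, sigma^2/(n*lambda) I),  e ~ N(0, sigma^2 I).
## The truncated-line basis Z is an illustrative choice.
set.seed(42)
n <- 50
x <- sort(runif(n))
S <- matrix(c(rep(1, n), x), n, 2)                        # fixed effects: 1, x
Z <- outer(x, seq(0.1, 0.9, by = 0.1), function(u, k) pmax(u - k, 0))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
nl <- n * 1e-3                                            # n * lambda

## (1) Penalized least squares via the joint normal equations
C <- rbind(cbind(crossprod(S),    crossprod(S, Z)),
           cbind(crossprod(Z, S), crossprod(Z) + nl * diag(ncol(Z))))
coef_pls <- solve(C, c(crossprod(S, y), crossprod(Z, y)))

## (2) Mixed model: marginal covariance of y is sigma^2 * V
V  <- diag(n) + tcrossprod(Z) / nl
Vi <- solve(V)
d_gls  <- solve(t(S) %*% Vi %*% S, t(S) %*% Vi %*% y)    # fixed effects (GLS)
b_blup <- t(Z) %*% Vi %*% (y - S %*% d_gls) / nl         # random effects (BLUP)

all.equal(as.numeric(coef_pls), c(d_gls, b_blup))        # TRUE
```

In this reading, the smoothing parameter plays the role of the ratio of error variance to random-effect variance.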
The method predict can be used to evaluate the sum of selected or
all model terms at arbitrary points within the domain, along with
standard errors derived from a certain Bayesian calculation. The
method summary has a flag to request diagnostics for the practical
identifiability and significance of the model terms.
ssanova1 returns a list object of primary class "ssanova1" and
secondary class "ssanova".
The method summary is used to obtain summaries of the fits. The
method predict can be used to evaluate the fits at arbitrary points,
along with the standard errors to be used in Bayesian confidence
intervals. The methods residuals and fitted.values extract the
corresponding quantities from the fits.
Factors are accepted as predictors. When a factor has 3 or more
levels, all terms involving it are penalized, with the "level means"
being shrunk towards each other. The shrinking is done differently
for nominal and ordinal factors; see mkrk.factor for details.
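The difference between nominal and ordinal shrinkage can be pictured through quadratic penalties on the level effects. The forms below, shrinkage toward the overall mean for nominal factors and toward adjacent levels for ordinal ones, are assumptions in the spirit of Gu (2002); consult mkrk.factor for the exact construction gss uses.

```r
## Illustrative penalties for a factor with K levels and level effects f.
## Assumed forms (see mkrk.factor / Gu 2002 for the actual definitions):
##   nominal: sum_i (f_i - mean(f))^2    shrinks all levels toward their mean
##   ordinal: sum_i (f_{i+1} - f_i)^2    shrinks adjacent levels toward each other
K <- 4
P_nominal <- diag(K) - matrix(1 / K, K, K)   # centering matrix
Dk <- diff(diag(K))                          # (K-1) x K first differences
P_ordinal <- crossprod(Dk)

penalty <- function(P, f) drop(t(f) %*% P %*% f)

f_const <- rep(2, K)            # equal level means: zero penalty under both
f_trend <- c(1, 2, 3, 4)
c(nominal = penalty(P_nominal, f_trend), ordinal = penalty(P_ordinal, f_trend))
## nominal = 5, ordinal = 3

## As in the nox examples, the factor's class marks it as nominal or ordinal
is.ordered(as.factor(c("a", "b", "c")))    # FALSE
is.ordered(as.ordered(c("a", "b", "c")))   # TRUE
```

A monotone trend is cheap under the ordinal penalty but not under the nominal one, which matches the idea that ordered levels should borrow strength mainly from their neighbors.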
The independent variables appearing in formula can themselves be
multivariate. In particular, ssanova(y~x,"tp",order=order) can be
used to fit ordinary thin-plate splines in any dimension, of any
permissible order, with standard errors available for Bayesian
confidence intervals. Note that thin-plate splines reduce to
polynomial splines in one dimension.
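Which orders are permissible, and why one dimension gives polynomial splines, follows from the standard thin-plate construction (Wahba, 1990). The sketch encodes the usual facts as assumptions: order m requires 2m > d, the unpenalized null space is the polynomials of total degree below m, and the radial kernel is r^(2m-d) log r for even d and r^(2m-d) for odd d, constants omitted. The helper names are hypothetical.

```r
## Thin-plate ingredients in d dimensions with penalty order m (constants omitted).
## tp_permissible(), tp_null_dim(), tp_kernel() are hypothetical helpers.
tp_permissible <- function(d, m) 2 * m > d
tp_null_dim    <- function(d, m) choose(d + m - 1, d)   # polynomials of degree < m
tp_kernel <- function(r, d, m) {
  stopifnot(tp_permissible(d, m))
  p <- 2 * m - d
  if (d %% 2 == 0) ifelse(r > 0, r^p * log(r), 0) else r^p
}

tp_null_dim(1, 2)            # 2: constant and linear terms, as for a cubic spline
tp_null_dim(2, 2)            # 3: 1, x1, x2
tp_permissible(2, 1)         # FALSE: order 1 is not allowed in two dimensions
tp_kernel(2, d = 1, m = 2)   # 8, i.e. |s - t|^3: piecewise cubics in one dimension
```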
For univariate marginals, the additive models using type="cubic"
and type="tp" yield identical fits through different internal
constructions. For example, ssanova(y~x1+x2,"cubic") and
ssanova(y~x1+x2,"tp") yield the same fit. The same is not true for
models with interactions, however.
Mathematically, the domain (through ext for type="cubic") or the
order (through order for type="tp") could be specified individually
for each of the variables. Such flexibility is not provided in our
implementation, however, as it would be more a source of confusion
than of practical utility.
Chong Gu, chong@stat.purdue.edu
Kim, Y.-J. and Gu, C. (2002), Penalized Least Squares Regression: Fast Computation via Efficient Approximation. Available at http://stat.purdue.edu/~chong/manu.html.
Gu, C. (2002), Smoothing Spline ANOVA Models. New York: Springer-Verlag.
Wahba, G. (1990), Spline Models for Observational Data. Philadelphia: SIAM.
See also: ssanova and methods predict.ssanova1, summary.ssanova1,
and fitted.ssanova.
## Fit a cubic spline
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(x)
cubic.fit <- ssanova1(y~x)
## Obtain estimates and standard errors on a grid
new <- data.frame(x=seq(min(x),max(x),len=50))
est <- predict(cubic.fit,new,se=TRUE)
## Plot the fit and the Bayesian confidence intervals
plot(x,y,col=1); lines(new$x,est$fit,col=2)
lines(new$x,est$fit+1.96*est$se,col=3)
lines(new$x,est$fit-1.96*est$se,col=3)
## Clean up
## Not run:
rm(x,y,cubic.fit,new,est)
dev.off()
## End(Not run)
## Fit a tensor product cubic spline
data(nox)
nox.fit <- ssanova1(log10(nox)~comp*equi,data=nox)
## Fit a spline with cubic and nominal marginals
nox$comp<-as.factor(nox$comp)
nox.fit.n <- ssanova1(log10(nox)~comp*equi,data=nox)
## Fit a spline with cubic and ordinal marginals
nox$comp<-as.ordered(nox$comp)
nox.fit.o <- ssanova1(log10(nox)~comp*equi,data=nox)
## Clean up
## Not run:
rm(nox,nox.fit,nox.fit.n,nox.fit.o)
## End(Not run)