gssanova0 {gss}R Documentation

Fitting Smoothing Spline ANOVA Models with Non-Gaussian Responses

Description

Fit smoothing spline ANOVA models in non-Gaussian regression. The symbolic model specification via formula follows the same rules as in lm and glm.

Usage

gssanova0(formula, family, type=NULL, data=list(), weights, subset,
          offset, na.action=na.omit, partial=NULL, method=NULL,
          varht=1, nu=NULL, prec=1e-7, maxiter=30)

Arguments

formula Symbolic description of the model to be fit.
family Description of the error distribution. Supported are exponential families "binomial", "poisson", "Gamma", "inverse.gaussian", and "nbinomial". Also supported are accelerated life model families "weibull", "lognorm", and "loglogis".
type List specifying the type of spline for each variable. See mkterm for details.
data Optional data frame containing the variables in the model.
weights Optional vector of weights to be used in the fitting process.
subset Optional vector specifying a subset of observations to be used in the fitting process.
offset Optional offset term with known parameter 1.
na.action Function which indicates what should happen when the data contain NAs.
partial Optional extra unpenalized terms in partial spline models.
method Score used to drive the performance-oriented iteration. Supported are method="v" for GCV, method="m" for GML, and method="u" for Mallow's CL.
varht Dispersion parameter needed for method="u". Ignored when method="v" or method="m" are specified.
nu Inverse scale parameter in accelerated life model families. Ignored for exponential families.
prec Precision requirement for the iterations.
maxiter Maximum number of iterations allowed for performance-oriented iteration, and for inner-loop multiple smoothing parameter selection when applicable.

Details

The model specification via formula is intuitive. For example, y~x1*x2 yields a model of the form

y = C + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e

with the terms denoted by "1", "x1", "x2", and "x1:x2".

The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.

Only one link is implemented for each family. It is the logit link for "binomial", and the log link for "poisson", "Gamma", and "inverse.gaussian". For "nbinomial", the working parameter is the logit of the probability p; see NegBinomial. For "weibull", "lognorm", and "loglogis", it is the location parameter for the log lifetime.

The models are fitted by penalized likelihood method through the performance-oriented iteration as described in the reference; the O(n^{3}) algorithms of RKPACK are used for numerical calculations. For family="binomial", "poisson", "nbinomial", "weibull", "lognorm", and "loglogis", the score driving the performance-oriented iteration defaults to method="u" with varht=1. For family="Gamma" and "inverse.gaussian", the default is method="v".

Value

gssanova0 returns a list object of class c("gssanova0","ssanova0","gssanova").
The method summary.gssanova0 can be used to obtain summaries of the fits. The method predict.ssanova0 can be used to evaluate the fits at arbitrary points along with standard errors, on the link scale. The methods residuals.gssanova and fitted.gssanova extract the respective traits from the fits.

Responses

For family="binomial", the response can be specified either as two columns of counts or as a column of sample proportions plus a column of total counts entered through the argument weights, as in glm.

For family="nbinomial", the response may be specified as two columns with the second being the known sizes, or simply as a single column with the common unknown size to be estimated through the maximum likelihood.

For family="weibull", "lognorm", or "loglogis", the response consists of three columns, with the first giving the follow-up time, the second the censoring status, and the third the left-truncation time. For data with no truncation, the third column can be omitted.

Note

The direct cross-validation of gssanova can be more effective in general, and more stable for complex models.

For large sample sizes, the approximate solution of gssanova can be faster.

The method project is not implemented for gssanova0, nor is the mixed-effect model support through mkran.

In gss versions earlier than 1.0, gssanova0 was under the name gssanova.

Author(s)

Chong Gu, chong@stat.purdue.edu

References

Gu, C. (1992), Cross-validating non Gaussian data. Journal of Computational and Graphical Statistics, 1, 169-179.

Examples

## Fit a cubic smoothing spline logistic regression model
test <- function(x)
        {.3*(1e6*(x^11*(1-x)^6)+1e4*(x^3*(1-x)^10))-2}
x <- (0:100)/100
p <- 1-1/(1+exp(test(x)))
y <- rbinom(x,3,p)
logit.fit <- gssanova0(cbind(y,3-y)~x,family="binomial")
## The same fit
logit.fit1 <- gssanova0(y/3~x,"binomial",weights=rep(3,101))
## Obtain estimates and standard errors on a grid
est <- predict(logit.fit,data.frame(x=x),se=TRUE)
## Plot the fit and the Bayesian confidence intervals
plot(x,y/3,ylab="p")
lines(x,p,col=1)
lines(x,1-1/(1+exp(est$fit)),col=2)
lines(x,1-1/(1+exp(est$fit+1.96*est$se)),col=3)
lines(x,1-1/(1+exp(est$fit-1.96*est$se)),col=3)
## Clean up
## Not run: 
rm(test,x,p,y,logit.fit,logit.fit1,est)
dev.off()
## End(Not run)

[Package gss version 1.0-5 Index]