sshzd {gss} | R Documentation |
Estimate hazard function using smoothing spline ANOVA models
with cubic spline, linear spline, or thin-plate spline marginals for
numerical variables. The symbolic model specification via formula
follows the same rules as in lm, but with the response of a special
form.
sshzd(formula, type="cubic", data=list(), alpha=1.4, weights=NULL, subset, na.action=na.omit, id.basis=NULL, nbasis=NULL, seed=NULL, ext=.05, order=2, prec=1e-7, maxiter=30)
formula |
Symbolic description of the model to be fit. Details are given below. |
type |
Type of numerical marginals to be used. Supported are
type="cubic" for cubic spline marginals,
type="linear" for linear spline marginals, and
type="tp" for thin-plate spline marginals. |
data |
Optional data frame containing the variables in the model. |
alpha |
Parameter defining cross-validation score for smoothing parameter selection. |
weights |
Optional vector of bin-counts for histogram data. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
Function which indicates what should happen when the data contain NAs. |
id.basis |
Index of observations to be used as "knots." |
nbasis |
Number of "knots" to be used. Ignored when
id.basis is specified. |
seed |
Seed to be used for the random generation of "knots."
Ignored when id.basis is specified. |
ext |
For cubic spline and linear spline marginals, this option
specifies how far to extend the domain beyond the minimum and
the maximum as a percentage of the range. The default
ext=.05 specifies marginal domains of lengths 110 percent
of their respective ranges. Evaluation outside of the domain
will result in an error. Ignored if type="tp" or
domain is specified. |
order |
For thin-plate spline marginals, this option specifies
the order of the marginal penalties. Ignored if
type="cubic" or type="linear" is specified. |
prec |
Precision requirement for internal iterations. |
maxiter |
Maximum number of iterations allowed for internal iterations. |
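The effect of ext on a cubic or linear spline marginal can be sketched in base R. The helper extend_domain below is hypothetical (it is not part of gss) and assumes the extension is applied symmetrically, ext times the range on each side, which is consistent with the 110 percent figure quoted for ext=.05.

```r
## Hypothetical helper, not a gss function: extend the observed range
## of x by a fraction `ext` of that range on each side (assumption).
extend_domain <- function(x, ext = 0.05) {
  r <- range(x, na.rm = TRUE)
  r + c(-1, 1) * ext * diff(r)
}

x <- c(2, 5, 12)
dom <- extend_domain(x)
dom
## Length of the extended domain relative to the observed range
ratio <- diff(dom) / diff(range(x))
ratio
```

Under this assumption, a point lying outside the interval returned for a given marginal would be outside the domain on which the fit can be evaluated.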
The model specification via formula is for the log hazard.
For example, ~x1*x2 prescribes a model of the form
log f(x1,x2) = C + g_{1}(x1) + g_{2}(x2) + g_{12}(x1,x2)
with the terms denoted by "x1", "x2", and "x1:x2".
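The term labels implied by a formula can be inspected with base R before any model is fit. The short sketch below uses only terms() from the stats package; the variable names x1 and x2 are placeholders.

```r
## Expand ~x1*x2 symbolically; no data are needed for this step.
tt <- terms(~ x1 * x2)
term_labels <- attr(tt, "term.labels")
print(term_labels)
```

The labels correspond to the two main effects and the interaction in the decomposition of the log hazard.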
sshzd takes standard right-censored lifetime data, with
possible left-truncation and covariates. The response in formula
must be of the form Surv(futime,status,start=0), where futime is the
follow-up time, status is the censoring indicator, and start is the
optional left-truncation time. The function Surv is defined and
parsed inside sshzd, and is not quite the same as the one in the
survival package.
The main effect of futime must appear in the model terms.
The absence of interactions between futime and covariates
characterizes proportional hazard models.
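Whether a given formula describes a proportional hazard model can be checked symbolically. The helper is_prop_hazard below is a hypothetical illustration (not part of gss); it only inspects the formula with terms(), so Surv is never evaluated and no data are required.

```r
## Hypothetical helper, not a gss function: TRUE when the time variable
## enters as a main effect only, i.e. has no interaction with covariates.
is_prop_hazard <- function(formula, time = "futime") {
  tt <- terms(formula)
  fac <- attr(tt, "factors")   # variables (rows) by terms (columns)
  ord <- attr(tt, "order")     # 1 for main effects, 2+ for interactions
  if (!(time %in% colnames(fac))) {
    stop("the main effect of '", time, "' must appear in the model terms")
  }
  !any(fac[time, ord > 1] > 0)
}

ph <- is_prop_hazard(Surv(futime, status) ~ futime + age)
nonph <- is_prop_hazard(Surv(futime, status) ~ futime * trt)
c(ph, nonph)
```

The first formula has no futime interaction and so is of proportional hazard form; the second includes futime:trt and is not.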
Parallel to those in a ssanova object, the model terms
are sums of unpenalized and penalized terms. Attached to every
penalized term there is a smoothing parameter, and the model
complexity is largely determined by the number of smoothing
parameters.
The selection of smoothing parameters is through a cross-validation
mechanism described in the references, with a parameter alpha;
alpha=1 is "unbiased" for the minimization of
Kullback-Leibler loss but may yield severe undersmoothing, whereas
larger alpha yields smoother estimates.
A subset of the observations is selected as "knots." Unless
specified via id.basis or nbasis, the subset size is
determined by max(30,10n^(2/9)), which is appropriate for
type="cubic" but not necessarily for type="linear" or type="tp".
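The default number of knots grows slowly with the sample size n. The sketch below evaluates the rule in base R; the function name default_nbasis is hypothetical, and rounding up with ceiling() is an assumption made here so that the result is an integer count.

```r
## Hypothetical helper, not a gss function: default knot count
## max(30, 10 n^(2/9)), rounded up (the rounding is an assumption).
default_nbasis <- function(n) {
  max(30, ceiling(10 * n^(2/9)))
}

nb <- sapply(c(100, 1000, 10000), default_nbasis)
nb
```

For small samples the floor of 30 applies; the n^(2/9) term takes over only once n reaches a few hundred.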
sshzd returns a list object of class "sshzd".
hzdrate.sshzd can be used to evaluate the estimated
hazard function. hzdcurve.sshzd can be used to
evaluate hazard curves with fixed covariates.
survexp.sshzd can be used to calculate the estimated
expected survival.
Integration on the time axis is done by the 200-point Gauss-Legendre formula on [0,T], where T is the largest follow-up time.
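To illustrate the kind of quadrature involved, the following base R sketch builds a 200-point Gauss-Legendre rule on [0,T] and uses it to integrate a known hazard. It is not the package's internal code: the function gauss_legendre is hypothetical, the nodes and weights are computed with the Golub-Welsch eigenvalue method, and the hazard lambda(t) = 2t/500^2 is an arbitrary test case whose cumulative hazard is (t/500)^2.

```r
## Hypothetical helper: n-point Gauss-Legendre nodes and weights on [a, b],
## computed from the eigen-decomposition of the Jacobi matrix (Golub-Welsch).
gauss_legendre <- function(n, a, b) {
  k <- seq_len(n - 1)
  off <- k / sqrt(4 * k^2 - 1)          # off-diagonal entries
  J <- matrix(0, n, n)
  J[cbind(k, k + 1)] <- off
  J[cbind(k + 1, k)] <- off
  e <- eigen(J, symmetric = TRUE)
  x <- e$values                          # nodes on [-1, 1]
  w <- 2 * e$vectors[1, ]^2              # weights on [-1, 1]
  list(pt = (b - a) / 2 * x + (a + b) / 2,
       wt = (b - a) / 2 * w)
}

T_max <- 600                             # stands in for the largest follow-up time
quad <- gauss_legendre(200, 0, T_max)

## Test hazard with known cumulative hazard Lambda(t) = (t/500)^2
lambda <- function(t) 2 * t / 500^2
Lambda_T <- sum(quad$wt * lambda(quad$pt))
surv_T <- exp(-Lambda_T)                 # survival probability exp(-Lambda(T))
c(Lambda_T, surv_T)
```

The quadrature sum approximates the cumulative hazard over [0,T], and exponentiating its negative gives the corresponding survival probability.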
Chong Gu, chong@stat.purdue.edu
Gu, C. (2002), Smoothing Spline ANOVA Models. New York: Springer-Verlag.
Gu, C. and Wang, J. (2003), Penalized likelihood density estimation: Direct cross-validation and scalable approximation
hzdrate.sshzd, hzdcurve.sshzd, and survexp.sshzd.
## Model with interaction
data(gastric)
gastric.fit <- sshzd(Surv(futime,status)~futime*trt,data=gastric)
## exp(-Lambda(600)), exp(-(Lambda(1200)-Lambda(600))), and exp(-Lambda(1200))
survexp.sshzd(gastric.fit,c(600,1200,1200),data.frame(trt=as.factor(1)),c(0,600,0))
## Clean up
## Not run:
rm(gastric,gastric.fit)
dev.off()
## End(Not run)

## THE FOLLOWING EXAMPLE IS TIME-CONSUMING
## Proportional hazard model
## Not run:
data(stan)
stan.fit <- sshzd(Surv(futime,status)~futime+age,data=stan)
## Evaluate fitted hazard
hzdrate.sshzd(stan.fit,data.frame(futime=c(10,20),age=c(20,30)))
## Plot lambda(t,age=20)
tt <- seq(0,60,length.out=101)
hh <- hzdcurve.sshzd(stan.fit,tt,data.frame(age=20))
plot(tt,hh,type="l")
## Clean up
rm(stan,stan.fit,tt,hh)
dev.off()
## End(Not run)