sshzd {gss} | R Documentation |
Estimate hazard function using smoothing spline ANOVA models. The
symbolic model specification via formula
follows the same
rules as in lm
, but with the response of a special
form.
sshzd(formula, type=NULL, data=list(), alpha=1.4, weights=NULL, subset, na.action=na.omit, id.basis=NULL, nbasis=NULL, seed=NULL, prec=1e-7, maxiter=30)
formula |
Symbolic description of the model to be fit, where
the response is of the form Surv(futime,status,start=0) . |
type |
List specifying the type of spline for each variable.
See mkterm for details. |
data |
Optional data frame containing the variables in the model. |
alpha |
Parameter defining cross-validation score for smoothing parameter selection. |
weights |
Optional vector of counts for duplicated data. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
Function which indicates what should happen when the data contain NAs. |
id.basis |
Index of observations to be used as "knots." |
nbasis |
Number of "knots" to be used. Ignored when
id.basis is specified. |
seed |
Seed to be used for the random generation of "knots."
Ignored when id.basis is specified. |
prec |
Precision requirement for internal iterations. |
maxiter |
Maximum number of iterations allowed for internal iterations. |
The model specification via formula
is for the log hazard.
For example, Suve(t,d)~t*u
prescribes a model of the form
log f(t,u) = C + g_{t}(t) + g_{u}(u) + g_{t,u}(t,u)
with the terms denoted by "1"
, "t"
, "u"
, and
"t:u"
. Replacing t*u
by t+u
in the
formula
, one gets a proportional hazard model with
g_{t,u}=0.
sshzd
takes standard right-censored lifetime data, with
possible left-truncation and covariates; in
Surv(futime,status,start=0)~...
, futime
is the
follow-up time, status
is the censoring indicator, and
start
is the optional left-truncation time. The main effect
of futime
must appear in the model terms specified via
...
.
Parallel to those in a ssanova
object, the model terms
are sums of unpenalized and penalized terms. Attached to every
penalized term there is a smoothing parameter, and the model
complexity is largely determined by the number of smoothing
parameters.
The selection of smoothing parameters is through a cross-validation
mechanism described in Gu (2002, Sec. 7.2), with a parameter
alpha
; alpha=1
is "unbiased" for the minimization of
Kullback-Leibler loss but may yield severe undersmoothing, whereas
larger alpha
yields smoother estimates.
A subset of the observations are selected as "knots." Unless
specified via id.basis
or nbasis
, the number of
"knots" q is determined by max(30,10n^{2/9}), which is
appropriate for the default cubic splines for numerical vectors.
sshzd
returns a list object of class "sshzd"
.
hzdrate.sshzd
can be used to evaluate the estimated
hazard function. hzdcurve.sshzd
can be used to
evaluate hazard curves with fixed covariates.
survexp.sshzd
can be used to calculated estimated
expected survival. The method project.sshzd
can be
used to calculate the Kullback-Leibler projection for model
selection.
The function Surv(futime,status,start=0)
is defined and
parsed inside sshzd
, not quite the same as the one in the
survival
package.
Integration on the time axis is done by the 200-point Gauss-Legendre
formula on c(min(start),max(futime))
, returned from
gauss.quad
.
The results may vary from run to run. For consistency, specify
id.basis
or set seed
.
Chong Gu, chong@stat.purdue.edu
Gu, C. (2002), Smoothing Spline ANOVA Models. New York: Springer-Verlag.
Du, P. and Gu, C. (2006), Penalized likelihood hazard estimation: efficient approximation and Bayesian confidence intervals. Statistics and Probability Letters, 76, 244–254.
## Model with interaction data(gastric) gastric.fit <- sshzd(Surv(futime,status)~futime*trt,data=gastric) ## exp(-Lambda(600)), exp(-(Lambda(1200)-Lambda(600))), and exp(-Lambda(1200)) survexp.sshzd(gastric.fit,c(600,1200,1200),data.frame(trt=as.factor(1)),c(0,600,0)) ## Clean up ## Not run: rm(gastric,gastric.fit) dev.off() ## End(Not run) ## THE FOLLOWING EXAMPLE IS TIME-CONSUMING ## Proportional hazard model ## Not run: data(stan) stan.fit <- sshzd(Surv(futime,status)~futime+age,data=stan) ## Evaluate fitted hazard hzdrate.sshzd(stan.fit,data.frame(futime=c(10,20),age=c(20,30))) ## Plot lambda(t,age=20) tt <- seq(0,60,leng=101) hh <- hzdcurve.sshzd(stan.fit,tt,data.frame(age=20)) plot(tt,hh,type="l") ## Clean up rm(stan,stan.fit,tt,hh) dev.off() ## End(Not run)