ssden {gss} | R Documentation |
Estimate probability densities using smoothing spline ANOVA models
with cubic spline, linear spline, or thin-plate spline marginals for
numerical variables. The symbolic model specification via formula
follows the same rules as in lm, but with the response missing.
ssden(formula, type="cubic", data=list(), alpha=1.4, weights=NULL,
      subset, na.action=na.omit, id.basis=NULL, nbasis=NULL, seed=NULL,
      domain=as.list(NULL), quadrature=NULL, ext=.05, order=2, prec=1e-7,
      maxiter=30)
formula |
Symbolic description of the model to be fit. |
type |
Type of numerical marginals to be used. Supported are
type="cubic" for cubic spline marginals,
type="linear" for linear spline marginals, and
type="tp" for thin-plate spline marginals. |
data |
Optional data frame containing the variables in the model. |
alpha |
Parameter defining cross-validation score for smoothing parameter selection. |
weights |
Optional vector of bin-counts for histogram data. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
Function which indicates what should happen when the data contain NAs. |
id.basis |
Index of observations to be used as "knots." |
nbasis |
Number of "knots" to be used. Ignored when
id.basis is specified. |
seed |
Seed to be used for the random generation of "knots."
Ignored when id.basis is specified. |
domain |
Data frame specifying marginal support of density. |
quadrature |
Quadrature for calculating integrals. Mandatory
for type="tp". |
ext |
For cubic spline and linear spline marginals, this option
specifies how far to extend the domain beyond the minimum and
the maximum as a percentage of the range. The default
ext=.05 specifies marginal domains of lengths 110 percent
of their respective ranges. Evaluation outside of the domain
will result in an error. Ignored if type="tp" or
domain is specified. |
order |
For thin-plate spline marginals, this option specifies
the order of the marginal penalties. Ignored if
type="cubic" or type="linear" is specified. |
prec |
Precision requirement for internal iterations. |
maxiter |
Maximum number of iterations allowed for internal iterations. |
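The effect of ext on a marginal domain can be sketched in base R. The helper extend_domain below is hypothetical (it is not part of gss) and assumes the extension is applied symmetrically at both ends, which is consistent with the stated 110 percent length for ext=.05.

```r
## Hypothetical helper, not a gss function: extend the observed range
## of x by a fraction ext of its width on each side (assumed symmetric).
extend_domain <- function(x, ext = 0.05) {
  rng <- range(x, na.rm = TRUE)
  width <- diff(rng)
  c(rng[1] - ext * width, rng[2] + ext * width)
}

x <- c(2, 5, 12)            # observed range is [2, 12], width 10
dom <- extend_domain(x)     # extended domain for ext = .05
ratio <- diff(dom) / diff(range(x))
print(dom)
print(ratio)
```

With these inputs the extended domain has 1.1 times the width of the observed range.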
The model specification via formula is for the log density.
For example, ~x1*x2 prescribes a model of the form
log f(x1,x2) = g_{1}(x1) + g_{2}(x2) + g_{12}(x1,x2) + C
with the terms denoted by "x1", "x2", and "x1:x2"; the constant is
determined by the fact that a density integrates to one.
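The term labels that a one-sided formula expands to can be inspected with base R's terms(), without fitting anything. This is only an illustration of the formula rules shared with lm; the variable names x1 and x2 are placeholders.

```r
## Expand a one-sided formula into its term labels (base R only).
tt <- terms(~ x1 * x2)
term_labels <- attr(tt, "term.labels")
has_response <- attr(tt, "response") != 0

print(term_labels)    # "x1" "x2" "x1:x2"
print(has_response)   # FALSE: the response is missing
```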
Selective term elimination may characterize (conditional)
independence structures between variables. For example,
~x1*x2+x1*x3 yields the conditional independence of x2 and x3
given x1. Currently, up to four variables are supported.
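The conditional independence claim can be checked directly from the form of the log density. The following is a short worked derivation, written with the same g notation as the formula for ~x1*x2; the functions a and b are introduced here only for the argument.

```latex
% Model ~x1*x2 + x1*x3: no x2:x3 and no x1:x2:x3 term.
\log f(x_1,x_2,x_3)
  = g_{1}(x_1) + g_{2}(x_2) + g_{3}(x_3)
  + g_{12}(x_1,x_2) + g_{13}(x_1,x_3) + C

% Group the terms that involve x2 and those that involve x3:
f(x_1,x_2,x_3) = a(x_1,x_2)\, b(x_1,x_3),
\quad a = e^{g_{2} + g_{12}},
\quad b = e^{g_{1} + g_{3} + g_{13} + C}

% Conditioning on x1 normalizes each factor separately:
f(x_2,x_3 \mid x_1)
  = \frac{a(x_1,x_2)}{\int a(x_1,u)\,du}
    \cdot \frac{b(x_1,x_3)}{\int b(x_1,v)\,dv}
  = f(x_2 \mid x_1)\, f(x_3 \mid x_1)
```

Because the joint density splits into a factor free of x3 and a factor free of x2, x2 and x3 are independent given x1.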
Parallel to those in an ssanova object, the model terms
are sums of unpenalized and penalized terms. A smoothing parameter
is attached to every penalized term, and the model
complexity is largely determined by the number of smoothing
parameters.
The selection of smoothing parameters is through a cross-validation
mechanism described in the references, with a parameter alpha;
alpha=1 is "unbiased" for the minimization of
Kullback-Leibler loss but may yield severe undersmoothing, whereas
larger alpha yields smoother estimates.
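One way to see the effect of alpha is to fit the same data twice and compare the estimated densities. The sketch below assumes the gss package is installed and uses its buffalo data; the object names (dom, dens_a1, dens_a14) are illustrative, and the code does nothing when gss is unavailable.

```r
## Sketch: compare alpha = 1 with the default alpha = 1.4.
## Runs only if gss is installed; otherwise the results stay NULL.
dens_a1 <- dens_a14 <- NULL
xx <- seq(0, 150, length.out = 101)
if (requireNamespace("gss", quietly = TRUE)) {
  library(gss)
  data(buffalo)
  dom <- data.frame(buffalo = c(0, 150))
  fit_a1  <- ssden(~buffalo, alpha = 1,   domain = dom)
  fit_a14 <- ssden(~buffalo, alpha = 1.4, domain = dom)
  ## Evaluate both estimates on a common grid for comparison.
  dens_a1  <- dssden(fit_a1, xx)
  dens_a14 <- dssden(fit_a14, xx)
}
```

Plotting dens_a1 and dens_a14 against xx on the same axes shows whether the alpha=1 fit is visibly rougher for this data set.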
A subset of the observations is selected as "knots." Unless
specified via id.basis or nbasis, the subset size is
determined by max(30,10n^(2/9)), which is appropriate for
type="cubic" but not necessarily for type="linear" or type="tp".
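The default subset size is easy to tabulate for a few sample sizes. The helper default_nbasis below is hypothetical and not a gss function; rounding up to an integer is an assumption made here, since the formula itself does not say how a fractional value is handled.

```r
## Hypothetical helper: default number of "knots" for n observations,
## following max(30, 10 n^(2/9)); ceiling() is an assumed rounding rule.
default_nbasis <- function(n) max(30, ceiling(10 * n^(2/9)))

sizes <- c(50, 100, 1000, 10000)
knots <- vapply(sizes, default_nbasis, numeric(1))
print(data.frame(n = sizes, nbasis = knots))
```

Since 10n^(2/9) exceeds 30 only when n is above roughly 140, the floor of 30 applies to small samples, and the count grows slowly afterwards.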
ssden returns a list object of class "ssden".
dssden and cdssden can be used to
evaluate the estimated joint density and conditional density.
pssden, qssden, cpssden, and cqssden can
be used to evaluate (conditional) cdf and quantiles.
For type="cubic" and type="linear", the quadrature
will be generated if not provided by the user. The default
quadrature in 1-D is the 200-point Gauss-Legendre formula on the
domain. The default quadratures on 2-D, 3-D, and 4-D cubes are
selected delayed Smolyak cubatures with 449, 2527, and 13697 points,
on properly scaled product domains. See gauss.quad and smolyak.quad.
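To make the 1-D default concrete, the following base R sketch builds an n-point Gauss-Legendre rule on an interval using the Golub-Welsch eigenvalue method. It is an independent illustration, not the code behind gauss.quad; the function name gauss_legendre and the pt/wt element names are chosen here for the example.

```r
## Illustrative Gauss-Legendre rule via Golub-Welsch (not gss code).
gauss_legendre <- function(n, interval = c(-1, 1)) {
  k <- seq_len(n - 1)
  beta <- k / sqrt(4 * k^2 - 1)      # off-diagonals of the Jacobi matrix
  J <- matrix(0, n, n)
  J[cbind(k, k + 1)] <- beta
  J[cbind(k + 1, k)] <- beta
  e <- eigen(J, symmetric = TRUE)
  ord <- order(e$values)
  t <- e$values[ord]                 # nodes on [-1, 1]
  w <- 2 * e$vectors[1, ord]^2       # weights on [-1, 1]
  a <- interval[1]; b <- interval[2]
  ## Affine map of nodes and weights from [-1, 1] to [a, b].
  list(pt = (b - a) / 2 * t + (a + b) / 2, wt = (b - a) / 2 * w)
}

quad <- gauss_legendre(200, c(0, 150))
total_weight <- sum(quad$wt)              # length of the domain
int_x2 <- sum(quad$wt * quad$pt^2)        # integral of x^2 over [0, 150]
```

The weights sum to the length of the domain, and polynomials of low degree such as x^2 are integrated exactly up to rounding error.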
Chong Gu, chong@stat.purdue.edu
Gu, C. and Wang, J. (2003), Penalized likelihood density estimation: Direct cross-validation and scalable approximation
## 1-D estimate: Buffalo snowfall
data(buffalo)
buff.fit <- ssden(~buffalo,domain=data.frame(buffalo=c(0,150)))
plot(xx<-seq(0,150,len=101),dssden(buff.fit,xx),type="l")
plot(xx,pssden(buff.fit,xx),type="l")
plot(qq<-seq(0,1,len=51),qssden(buff.fit,qq),type="l")
## Clean up
## Not run: 
rm(buffalo,buff.fit,xx,qq)
dev.off()
## End(Not run)

## 2-D with triangular domain: AIDS incubation
data(aids)
## rectangular quadrature
quad.pt <- expand.grid(incu=((1:40)-.5)/40*100,infe=((1:40)-.5)/40*100)
quad.pt <- quad.pt[quad.pt$incu<=quad.pt$infe,]
quad.wt <- rep(1,nrow(quad.pt))
quad.wt[quad.pt$incu==quad.pt$infe] <- .5
quad.wt <- quad.wt/sum(quad.wt)*5e3
## additive model (pre-truncation independence)
aids.fit <- ssden(~incu+infe,data=aids,subset=age>=60,
                  domain=data.frame(incu=c(0,100),infe=c(0,100)),
                  quad=list(pt=quad.pt,wt=quad.wt))
## conditional (marginal) density of infe
jk <- cdssden(aids.fit,xx<-seq(0,100,len=51),data.frame(incu=50))
plot(xx,jk$pdf,type="l")
## conditional (marginal) quantiles of infe (TIME-CONSUMING)
## Not run: 
cqssden(aids.fit,c(.05,.25,.5,.75,.95),data.frame(incu=50),jk$int)
## End(Not run)
## Clean up
## Not run: 
rm(aids,quad.pt,quad.wt,aids.fit,jk,xx)
dev.off()
## End(Not run)