cozigam {COZIGAM}R Documentation

Fitting Constrained Zero-Inflated Generalized Additive Models

Description

Fits a COnstrained Zero-Inflated Generalized Additive Model (COZIGAM) to data.

Usage

cozigam (formula, maxiter = 30, conv.crit.in = 1e-5, 
    conv.crit.out = 1e-4, size = NULL, log.tran = FALSE, family, ...)

Arguments

formula a GAM formula. This is exactly like the formula for a GLM except that smooth terms can be added to the right hand side of the formula (and a formula of the form y ~ . is not allowed). Smooth terms are specified by expressions of the form: s(var1,var2,...,k=12,fx=FALSE,bs="tp",by=a.var) where var1, var2, etc. are the covariates which the smooth is a function of and k is the dimension of the basis used to represent the smooth term. by can be used to specify a variable by which the smooth should be multiplied.
maxiter the maximum number of iterations allowed in the estimation procedure.
conv.crit.in convergence criterion in the inner loop.
conv.crit.out convergence criterion in the outer loop.
size optional. Must be specified when family is binomial.
log.tran logical. TRUE if log-transformation is needed for the response.
family This is a family object specifying the distribution and link to use in fitting etc. See glm and family for more details. Currently support Gaussian/lognormal, Gamma, Poisson and binomial distributions.
... additional arguments to be passed to the low level regression fitting functions.

Details

A COnstrained Zero-Inflated Generalized Additive Model (COZIGAM) assumes the response variable Y_i is distributed from a zero-inflated 1-parameter exponential family with covariate x_i (could be high dimensional). More specifically, Y_i comes from a regular exponential family distribution f(x_i) with probability p_i and equals zero with probability 1-p_i, with the further assumption that the probability of zero-inflation p_i is some monotone function of the mean response function on the link scale, e.g., if a logit link is used, p_i is linearly constrained by:

logit(p_i) = α + delta g(μ_i)

, where g() is the link function of the assumed exponential family and α and delta are two unknown parameters to be estimated. The mean response is estimated nonparametrically by a Generalized Additive Model (GAM), i.e., g(μ_i) = s(x_i) for some smooth function. This bypasses the problems of two popular methods for analyzing zero-inflated data that either focus only on the non-zero data or model the presence-absence data and the non-zero data separately.

The estimation approach is a modified penalized iteratively reweighted least squares algorithm (P-IRLS) which requires the magic() function in the mgcv library. EM algorithm is used if the underlying regular exponential family has strictly positive probability mass at zero. The covariance matrix of the estimated coefficients is obtained by inverting the observed information from Louis' method. See Liu and Chan (2008) for more detail.

Value

An object of class "cozigam" is a list containing the following components:

coefficients a named vector of coefficients including the linear constraint parameters.
V.theta The covariance matrix of the estimated parameters.
beta, V.beta the estimated parameters of smooth terms with associated covariance matrix V.beta.
mu the fitted mean values.
linear.predictor the fitted values on the link scale.
dispersion (estimated) dispersion parameter.
formula model formula.
p the fitted zero-inflation rates.
G an objected returned by gam() containing information used in the estimation procedure.
psi conditional expectation of the zero-inflation indicator (only for the discrete case which involves EM algorithm).
family the family used.
loglik, ploglik the (penalized) log-likelihood of the fitted model.
y the response used.
converged logical. TRUE if the iterative procedure is converged.
est.disp logical. TRUE if the dispersion parameter is estimated.
fit.nonzero fitted GAM using only the nonzero data as in the presence/absence analysis
score the GCV or UBRE score returned by magic() in the iterative procedure.
edf estimated degrees of freedom for each model parameter. Penalization means that many of these are less than 1.
edf.smooth estimated degrees of freedom of each smooth term

Author(s)

Hai Liu and Kung-Sik Chan

References

Liu, H and Chan, K.S. (2008) Constrained Generalized Additive Model with Zero-Inflated Data.

See Also

plot.cozigam,predict.cozigam, summary.cozigam

Examples

## Normal/Log-Normal Response

set.seed(11)
n <- 600
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)

f <- test(x1, x2)*4-mean(test(x1, x2)*4) + f0(x3)/2-mean(f0(x3)/2)
sig <- 0.5
mu0 <- f + 3
y <- mu0 + rnorm(n, 0, sig)

alpha0 <- -2.2
delta0 <- 1.2
p0 <- .Call("logit_linkinv", alpha0 + delta0 * mu0, PACKAGE = "stats")
z <- rbinom(rep(1,n), 1, p0)
y[z==0] <- 0

res <- cozigam(y~s(x1,x2)+s(x3), conv.crit.out = 1e-4, log.tran = FALSE, family = gaussian)

plot(res, too.far = 0.1)

## Poisson Response

n <- 500
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)

f <- test(x1, x2)*2 + f0(x3)/5
eta0 <- f/1.1
mu0 <- exp(eta0)  

alpha0 <- -1
delta0 <- 2
p0 <- .Call("logit_linkinv", alpha0 + delta0 * eta0, PACKAGE = "stats")

z <- rbinom(rep(1,n), 1, p0)
y <- rpois(rep(1,n), mu0)
y[z==0] <- 0

res <- cozigam(y~s(x1,x2)+s(x3), maxiter = 30, conv.crit.out = 1e-3, family = poisson)

## A Tensor Product Smooth Example
test.ten <- function(x,z,sx=0.3,sz=0.4)  
{ x<-x*20
  (pi**sx*sz)*(1.2*exp(-(x-0.2)^2/sx^2-(z-0.3)^2/sz^2)+
  0.8*exp(-(x-0.7)^2/sx^2-(z-0.8)^2/sz^2))
}

set.seed(11)
n <- 400
x1 <- runif(n, 0, 1)/20
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)

f <- test.ten(x1, x2)*4-mean(test.ten(x1, x2)*4) + f0(x3)/2-mean(f0(x3)/2)
sig <- 0.5
mu0 <- f + 3
y <- mu0 + rnorm(n, 0, sig)

alpha0 <- -2.2
delta0 <- 1.2
p0 <- .Call("logit_linkinv", alpha0 + delta0 * mu0, PACKAGE = "stats")

z <- rbinom(rep(1,n), 1, p0)
y[z==0] <- 0

# If use thin plate spline ...
res.tps <- cozigam(y~s(x1,x2)+s(x3), conv.crit.out = 1e-4, family=gaussian)
par(mfrow=c(1,2))
plot(res.tps, select=1, too.far=0.1)
# Compare with tensor product spline
res.ten <- cozigam(y~te(x1,x2)+s(x3), conv.crit.out = 1e-4, family=gaussian)
plot(res.ten, select=1, too.far=0.1)

[Package COZIGAM version 1.0-3 Index]