zigam {COZIGAM}    R Documentation

Fitting (Unconstrained) Zero-Inflated Generalized Additive Models

Description

Fit a Zero-Inflated Generalized Additive Model (ZIGAM) to data.

Usage

zigam (formula, maxiter = 20, conv.crit = 1e-3,
    size = NULL, log.tran = FALSE, family, data=list(), ...)

Arguments

formula A GAM formula. This is exactly like the formula for a GLM except that smooth terms can be added to the right hand side of the formula (and a formula of the form y ~ . is not allowed). Smooth terms are specified by expressions of the form: s(var1,var2,...,k=12,fx=FALSE,bs="tp",by=a.var) where var1, var2, etc. are the covariates which the smooth is a function of and k is the dimension of the basis used to represent the smooth term. by can be used to specify a variable by which the smooth should be multiplied.
maxiter The maximum number of iterations allowed in the EM algorithm used to fit discrete (Poisson or binomial) ZIGAMs.
conv.crit Convergence criterion in the iterative estimation algorithm.
size Number of trials. Must be specified when family is binomial.
log.tran Logical. TRUE if log-transformation is needed for the response.
family A family object specifying the distribution and link function to use in fitting. See glm and family for more details. Currently supports Gaussian/lognormal, Gamma, Poisson and binomial distributions.
data A data frame or list containing the model response variable and covariates required by the formula.
... Additional arguments to be passed to the low level regression fitting functions.
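The maxiter and conv.crit arguments govern the EM algorithm used for discrete ZIGAMs. As a rough illustration only, the sketch below runs EM on a much simpler model: a zero-inflated Poisson with a constant mean mu and a constant non-zero-inflation probability p, with no covariates or smooth terms. The function name zip_em and its stopping rule (largest absolute parameter change below conv.crit) are assumptions made for this example, not COZIGAM's implementation.

```r
# Hypothetical helper (not part of COZIGAM): EM for an intercept-only
# zero-inflated Poisson model. With probability p an observation comes from
# Poisson(mu) (the regular component); otherwise it is a structural zero.
zip_em <- function(y, maxiter = 20, conv.crit = 1e-3) {
  # Crude starting values from the observed data
  p <- mean(y > 0)
  mu <- mean(y[y > 0])
  for (iter in seq_len(maxiter)) {
    # E-step: psi_i = P(observation i is from the regular component | y_i).
    # A positive count must be regular; a zero is regular with probability
    # p * exp(-mu) / (1 - p + p * exp(-mu)).
    psi <- ifelse(y > 0, 1, p * exp(-mu) / (1 - p + p * exp(-mu)))
    # M-step: closed-form maximizers of the expected complete-data likelihood
    p.new <- mean(psi)
    mu.new <- sum(psi * y) / sum(psi)
    # Stop once the largest parameter change falls below conv.crit
    change <- max(abs(c(p.new - p, mu.new - mu)))
    p <- p.new
    mu <- mu.new
    if (change < conv.crit) break
  }
  list(p = p, mu = mu, psi = psi, iter = iter)
}

# Usage: simulate 2000 observations with p = 0.7 and mu = 3, then refit
set.seed(1)
n <- 2000
z <- rbinom(n, 1, 0.7)
y <- ifelse(z == 1, rpois(n, 3), 0)
fit <- zip_em(y, maxiter = 200, conv.crit = 1e-8)
fit$p
fit$mu
```

In the real model, p and mu vary with the covariates, so the M-step becomes a pair of penalized (weighted) GAM fits rather than the closed-form averages used here.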

Details

A Zero-Inflated Generalized Additive Model (ZIGAM) assumes that the response Y_i follows a zero-inflated one-parameter exponential family distribution with (possibly high-dimensional) covariate x_i. More specifically, with probability p_i, Y_i is drawn from a non-zero-inflated exponential family distribution f(x_i) (the regular component), and with probability 1-p_i it equals zero. The non-zero-inflation probability p_i also depends on the covariates through an unknown smooth function. Unlike the COnstrained Zero-Inflated Generalized Additive Model (COZIGAM), the ZIGAM assumes that the process generating the non-zero-inflated responses and the zero-inflation process are independent, so no constraint links the two smooth functions below. The mean μ_i of the regular component is assumed to satisfy

g(μ_i) = s_1(x_i),

and the non-zero-inflation probability is linked to the covariates by

logit(p_i) = s_2(x_i),

where s_1 and s_2 are two possibly distinct smooth functions, estimated nonparametrically by Generalized Additive Models. See Liu and Chan (2008) for more details.
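To make the two-part structure concrete, the following sketch evaluates the log-likelihood implied by the model above for a Poisson regular component with log link, given fitted values mu_i and p_i. The helper zip_loglik and the toy smooth functions used to generate data are hypothetical and not part of COZIGAM.

```r
# Hypothetical helper (not part of COZIGAM): log-likelihood of a ZIGAM with a
# Poisson regular component, given fitted regular means mu_i = exp(s_1(x_i))
# and non-zero-inflation probabilities p_i = plogis(s_2(x_i)).
zip_loglik <- function(y, mu, p) {
  zero <- y == 0
  ll <- numeric(length(y))
  # A zero is either a structural zero (prob 1 - p_i) or a Poisson zero
  # from the regular component (prob p_i * exp(-mu_i)).
  ll[zero] <- log(1 - p[zero] + p[zero] * exp(-mu[zero]))
  # A positive count can only come from the regular component.
  ll[!zero] <- log(p[!zero]) + dpois(y[!zero], mu[!zero], log = TRUE)
  sum(ll)
}

# Usage with toy smooth functions (assumed purely for illustration)
set.seed(2)
n <- 100
x <- runif(n)
mu <- exp(1 + sin(2 * pi * x))  # log link: s_1(x) = 1 + sin(2*pi*x)
p <- plogis(2 * x - 0.5)        # logit link: s_2(x) = 2*x - 0.5
z <- rbinom(n, 1, p)
y <- ifelse(z == 1, rpois(n, mu), 0)
zip_loglik(y, mu, p)
```

When every p_i equals 1, the expression reduces to the ordinary Poisson log-likelihood, which is a convenient sanity check.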

Value

A list containing the following components:

fit.gam A fitted GAM of the regular component, i.e., non-zero-inflated exponential family regression model.
fit.lr A fitted logistic regression model for the zero-inflation process.
V.beta The covariance matrix of the estimated parameters associated with the smooth functions in the non-zero-inflated data generating process.
V.gamma The covariance matrix of the estimated parameters associated with the smooth functions in the zero-inflation process.
mu The fitted regular mean values.
dispersion (Estimated) dispersion parameter.
formula Model formula.
p The fitted non-zero-inflation probabilities.
psi Conditional expectation of the zero-inflation indicator (only for the discrete case, which uses the EM algorithm).
family The family used.
loglik, ploglik The (penalized) log-likelihood of the fitted model.
logE Logarithm of the marginal likelihood, approximated by the Laplace method; used for model selection.
X1 Design matrix in the non-zero-inflated exponential family regression model.
X2 Design matrix in the logistic regression model.
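The logE component is described as a Laplace approximation. For reference, the generic Laplace approximation to a log marginal likelihood is shown below, where h(θ) denotes the log-likelihood plus log-prior of a d-dimensional parameter θ and θ̂ maximizes h. The exact quantities COZIGAM plugs in (for example, how the smoothing penalties enter h) are not specified on this page and should be checked against Liu and Chan (2008).

```latex
\log E = \log \int \exp\{h(\theta)\}\, d\theta
  \approx h(\hat\theta) + \frac{d}{2}\log(2\pi)
  - \frac{1}{2}\log\left|-\nabla^{2} h(\hat\theta)\right|
```

This follows from a second-order Taylor expansion of h around θ̂, which turns the integrand into an unnormalized Gaussian density.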

Author(s)

Hai Liu and Kung-Sik Chan

References

Liu, H. and Chan, K.S. (2008) Constrained Generalized Additive Model with Zero-Inflated Data. Technical Report 388, Department of Statistics and Actuarial Science, The University of Iowa. http://www.stat.uiowa.edu/techrep/tr388.pdf

See Also

cozigam

Examples

## Gaussian Response 
set.seed(11)
n <- 200
x1 <- runif(n, 0, 1)
f <- (f0(x1)-mean(f0(x1)))/2
sig <- 0.3
mu0 <- f + 1.5
y <- mu0 + rnorm(n, 0, sig)

eta.p0 <- f1(x1)*2 - 1 # true function used in the zero-inflation process
p0 <- plogis(eta.p0) # inverse logit link

z <- rbinom(n, 1, p0)
y[z==0] <- 0

# Fit a ZIGAM
res.un <- zigam(y~s(x1), family=gaussian)

# Compare with a COZIGAM
res <- cozigam(y~s(x1), family=gaussian)
res.un$logE > res$logE # compare the model selection criterion (larger logE is preferred)

## Poisson Response
set.seed(11)
n <- 400
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)

eta0 <- test(x1,x2)*3
mu0 <- exp(eta0)  

alpha0 <- -0.2
delta0 <- 1.0
p0 <- plogis(alpha0 + delta0 * eta0) # inverse logit link

z <- rbinom(n, 1, p0)
y <- rpois(n, mu0)
y[z==0] <- 0

res.un <- zigam(y~s(x1,x2), maxiter=30, family=poisson) # fit a ZIGAM
res <- cozigam(y~s(x1,x2), maxiter=30, family=poisson) # fit a COZIGAM
res.un$logE < res$logE # compare the model selection criterion (larger logE is preferred)


[Package COZIGAM version 2.0-1 Index]