zigam {COZIGAM} | R Documentation |
Fit a Zero-Inflated Generalized Additive Model (ZIGAM) to data.
zigam(formula, maxiter = 20, conv.crit = 1e-3, size = NULL, log.tran = FALSE, family, data = list(), ...)
formula |
A GAM formula. This is exactly like the formula for a GLM except that smooth terms
can be added to the right-hand side of the formula (a formula of the form y ~ . is not allowed).
Smooth terms are specified by expressions of the form s(var1, var2, ..., k = 12, fx = FALSE, bs = "tp", by = a.var),
where var1, var2, etc. are the covariates of which the smooth is a function, and
k is the dimension of the basis used to represent the smooth term. fx = FALSE requests a penalized smooth
(fx = TRUE gives a fixed-degrees-of-freedom regression spline), bs selects the basis type ("tp" denotes a
thin plate regression spline), and by can be used to specify a variable by which the smooth should be multiplied. |
maxiter |
The maximum number of iterations allowed in the EM algorithm when estimating discrete ZIGAMs. |
conv.crit |
Convergence criterion in the iterative estimation algorithm. |
size |
Number of trials. Must be specified when family is binomial. |
log.tran |
Logical. TRUE if log-transformation is needed for the response. |
family |
A family object specifying the distribution and link function to use in fitting.
See glm and family for more details. Currently supports the Gaussian/lognormal, Gamma,
Poisson and binomial distributions. |
data |
A data frame or list containing the model response variable and covariates required by the formula. |
... |
Additional arguments to be passed to the low level regression fitting functions. |
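The family argument takes a standard R family object, as used by glm. The following sketch uses only base R's stats package (not zigam itself) to show the default link g that each supported family carries, the link/inverse-link round trip, and how a non-default link is requested. Whether zigam accepts non-default links is an assumption not confirmed on this page.

```r
# Inspect standard R family objects of the kind passed to `family`.
# Each family object bundles a distribution name and a default link g().
fams <- list(gaussian(), Gamma(), poisson(), binomial())
for (fam in fams) {
  cat(sprintf("%-9s default link: %s\n", fam$family, fam$link))
}

# linkfun maps a mean mu to the linear-predictor scale, g(mu) = eta,
# and linkinv maps back. Round-trip check for the Poisson log link:
mu   <- c(0.5, 1, 4)
eta  <- poisson()$linkfun(mu)   # log(mu)
back <- poisson()$linkinv(eta)  # exp(eta)
print(all.equal(back, mu))

# A non-default link is requested through the family constructor
# (assumption: zigam may or may not support links other than the default).
fam_log <- Gamma(link = "log")
print(fam_log$link)
```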
A Zero-Inflated Generalized Additive Model (ZIGAM) assumes that the response variable Y_i follows a zero-inflated one-parameter exponential family distribution with covariate x_i (possibly high-dimensional). More specifically, with probability p_i, Y_i is drawn from a non-zero-inflated exponential family distribution f(x_i) (the regular component), and with probability 1-p_i it equals zero. The non-zero-inflation probability p_i also depends on the covariates, through unknown smooth functions. Unlike the COnstrained Zero-Inflated Generalized Additive Model (COZIGAM), the ZIGAM assumes that the process generating the non-zero-inflated responses and the zero-inflation process are independent. The mean μ_i of the regular component is linked to the covariates by
g(μ_i) = s_1(x_i),
and the non-zero-inflation probability is linked to the covariates by
logit(p_i) = s_2(x_i),
where g is the link function given by family, and s_1 and s_2 are two possibly distinct smooth functions to be estimated nonparametrically by Generalized Additive Models. See Liu and Chan (2008) for more details.
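To make the model concrete, here is a minimal base-R sketch of the Poisson case: it simulates data from a ZIGAM with a log link and evaluates the zero-inflated log-likelihood, plus the E-step weight an EM fit would compute. The smooth functions s1() and s2() and the helpers zigam_loglik_pois() and estep_weights() are hypothetical illustrations, not COZIGAM functions, and this is not the fitting algorithm used by zigam.

```r
# Sketch: simulate from a Poisson ZIGAM and evaluate its log-likelihood.
# s1() and s2() are HYPOTHETICAL smooth functions chosen for illustration;
# zigam() would estimate them from data instead.
set.seed(1)
n  <- 500
x  <- runif(n)
s1 <- function(x) 1 + sin(2 * pi * x)   # log(mu_i) = s1(x_i), log link
s2 <- function(x) 2 * x - 0.5           # logit(p_i) = s2(x_i)

mu <- exp(s1(x))       # mean of the regular (Poisson) component
p  <- plogis(s2(x))    # non-zero-inflation probability (inverse logit)
z  <- rbinom(n, 1, p)  # z = 1: regular component; z = 0: structural zero
y  <- ifelse(z == 1, rpois(n, mu), 0)

# Hypothetical helper: a zero can come from either process, while a
# positive count can only come from the regular component.
zigam_loglik_pois <- function(y, mu, p) {
  ll_zero <- log(1 - p + p * dpois(0, mu))
  ll_pos  <- log(p) + dpois(y, mu, log = TRUE)
  sum(ifelse(y == 0, ll_zero, ll_pos))
}

# Hypothetical helper for the E-step of an EM fit: posterior probability
# that observation i came from the regular component given y_i
# (equal to 1 whenever y_i > 0).
estep_weights <- function(y, mu, p) {
  w_zero <- p * dpois(0, mu) / (1 - p + p * dpois(0, mu))
  ifelse(y == 0, w_zero, 1)
}

ll_zi   <- zigam_loglik_pois(y, mu, p)
ll_pois <- zigam_loglik_pois(y, mu, rep(1, n))  # p = 1: no zero inflation
w       <- estep_weights(y, mu, p)
cat("log-lik with zero inflation:", round(ll_zi, 1), "\n")
cat("log-lik of plain Poisson:   ", round(ll_pois, 1), "\n")
cat("mean E-step weight among zeros:", round(mean(w[y == 0]), 3), "\n")
```

For a continuous family such as the Gaussian, an exact zero has probability zero under the regular component, so the zero term reduces to log(1 - p_i) and the component memberships are observed directly; this is consistent with EM (and psi) being used only in the discrete case.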
A list containing the following components:
fit.gam |
A fitted GAM of the regular component, i.e., the non-zero-inflated exponential family regression model. |
fit.lr |
A fitted logistic regression model for the zero-inflation process. |
V.beta |
The covariance matrix of the estimated parameters associated with the smooth functions in the non-zero-inflated data generating process. |
V.gamma |
The covariance matrix of the estimated parameters associated with the smooth functions in the zero-inflation process. |
mu |
The fitted means of the regular (non-zero-inflated) component. |
dispersion |
(Estimated) dispersion parameter. |
formula |
Model formula. |
p |
The fitted non-zero-inflation probabilities. |
psi |
Conditional expectation of the zero-inflation indicator (only for the discrete case, which is fitted by the EM algorithm). |
family |
The family used. |
loglik, ploglik |
The (penalized) log-likelihood of the fitted model. |
logE |
Logarithm of the marginal likelihood, approximated by the Laplace method; used for model selection. |
X1 |
Design matrix in the non-zero-inflated exponential family regression model. |
X2 |
Design matrix in the logistic regression model. |
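The logE component is described as a Laplace approximation to the log marginal likelihood. The sketch below shows the Laplace method in one dimension on two integrals with known answers; laplace_log_integral() is a hypothetical helper. zigam's own computation integrates over the smooth-function coefficients in many dimensions and is not reproduced here.

```r
# Sketch of the Laplace method (1-D case):
#   log( integral of exp(h(theta)) dtheta )
#     ~= h(theta_hat) + 0.5 * log(2 * pi) - 0.5 * log(-h''(theta_hat)),
# where theta_hat maximizes h. In d dimensions the last two terms become
# (d / 2) * log(2 * pi) - 0.5 * log(det(-H)), with H the Hessian of h.
laplace_log_integral <- function(h, lower, upper) {   # hypothetical helper
  opt <- optimize(h, lower = lower, upper = upper, maximum = TRUE)
  theta_hat <- opt$maximum
  # central finite-difference estimate of the second derivative at the mode
  eps <- 1e-4 * max(1, abs(theta_hat))
  h2 <- (h(theta_hat + eps) - 2 * opt$objective + h(theta_hat - eps)) / eps^2
  opt$objective + 0.5 * log(2 * pi) - 0.5 * log(-h2)
}

# Check 1: Gaussian integrand, where the approximation is exact:
# integral of exp(-theta^2 / 2) = sqrt(2 * pi)
gauss_approx <- laplace_log_integral(function(t) -t^2 / 2, -5, 5)

# Check 2: Gamma integral, integral of theta^a * exp(-theta) = Gamma(a + 1);
# here the Laplace method reproduces Stirling's formula.
a <- 10
gamma_approx <- laplace_log_integral(function(t) a * log(t) - t, 1e-3, 100)

cat("Gaussian:", gauss_approx, "exact:", 0.5 * log(2 * pi), "\n")
cat("Gamma:   ", gamma_approx, "exact:", lgamma(a + 1), "\n")
```

The Gamma check slightly underestimates lgamma(11), as Stirling's formula does. In the examples, the model with the larger logE appears to be treated as the preferred one.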
Hai Liu and Kung-Sik Chan
Liu, H. and Chan, K.S. (2008) Constrained Generalized Additive Model with Zero-Inflated Data. Technical Report 388, Department of Statistics and Actuarial Science, The University of Iowa. http://www.stat.uiowa.edu/techrep/tr388.pdf
## Gaussian Response
set.seed(11)
n <- 200
x1 <- runif(n, 0, 1)
# f0() and f1() are smooth test functions not defined in this excerpt
f <- (f0(x1) - mean(f0(x1)))/2
sig <- 0.3
mu0 <- f + 1.5
y <- mu0 + rnorm(n, 0, sig)
eta.p0 <- f1(x1)*2 - 1 # true function used in zero-inflation process
p0 <- plogis(eta.p0) # inverse logit link
z <- rbinom(n, 1, p0)
y[z==0] <- 0
# Fit a ZIGAM
res.un <- zigam(y~s(x1), family=gaussian)
# Compare with a COZIGAM
res <- cozigam(y~s(x1), family=gaussian)
res.un$logE > res$logE

## Poisson Response
set.seed(11)
n <- 400
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
# test() is a bivariate test function not defined in this excerpt
eta0 <- test(x1,x2)*3
mu0 <- exp(eta0)
alpha0 <- -0.2
delta0 <- 1.0
p0 <- plogis(alpha0 + delta0 * eta0) # inverse logit link
z <- rbinom(n, 1, p0)
y <- rpois(n, mu0)
y[z==0] <- 0
res.un <- zigam(y~s(x1,x2), maxiter=30, family=poisson) # fit a ZIGAM
res <- cozigam(y~s(x1,x2), maxiter=30, family=poisson) # fit a COZIGAM
res.un$logE < res$logE # compare the model selection criterion