alldist {npmlreg}    R Documentation

NPML estimation or Gaussian quadrature for overdispersed GLMs and variance component models

Description

Fits a random effect model using Gaussian quadrature (Hinde, 1982) or nonparametric maximum likelihood (Aitkin, 1996a). The function alldist is designed to account for overdispersion, while allvc fits variance component models.

Usage

alldist(formula, 
        random = ~1, 
        family = gaussian(), 
        data,
        k = 4, 
        random.distribution = "np", 
        tol = 0.5, 
        offset, 
        weights, 
        pluginz, 
        na.action, 
        EMmaxit = 500, 
        EMdev.change = 0.001, 
        lambda = 0, 
        damp = TRUE, 
        damp.power = 1, 
        spike.protect = 0, 
        sdev,
        shape, 
        plot.opt = 3, 
        verbose = TRUE,
        ...)
        
allvc(formula, 
        random = ~1, 
        family = gaussian(), 
        data, 
        k = 4, 
        random.distribution = "np", 
        tol = 0.5, 
        offset, 
        weights, 
        pluginz, 
        na.action, 
        EMmaxit = 500, 
        EMdev.change = 0.001, 
        lambda = 0, 
        damp = TRUE, 
        damp.power = 1, 
        spike.protect = 0, 
        sdev,
        shape, 
        plot.opt = 3, 
        verbose = TRUE,
        ...)        

Arguments

formula a formula defining the response and the fixed effects (e.g. y ~ x).
random a formula defining the random model. In the case of alldist, set random = ~1 to model overdispersion, and for instance random = ~x to introduce a random coefficient x. In the case of allvc, set random = ~1|PSU to model overdispersion on the upper level, where PSU is a factor for the primary sampling units, e.g. groups, clusters, classes, or individuals in longitudinal data, and define random coefficients accordingly.
family conditional distribution of responses. "gaussian", "poisson", "binomial", or "Gamma" can be set. If "gaussian" or "Gamma", then equal component dispersion parameters are assumed, except if the optional parameter lambda is modified.
data the data frame (mandatory, even if it is attached to the workspace!).
k the number of mass points/integration points (up to 600 mass points are supported).
random.distribution the mixing distribution; either Gaussian quadrature ("gq") or NPML ("np") can be set.
tol the scalar tol, which scales the position of the starting mass points (usually, 0 < tol <= 1).
offset an optional offset to be included in the model.
weights optional prior weights for the data.
pluginz optional numerical vector of length k specifying the starting mass points of the EM algorithm.
na.action a function indicating what should happen when NAs occur, with possible arguments na.omit and na.fail. The default is set by the na.action setting in options().
EMmaxit maximum number of EM iterations.
EMdev.change stops EM algorithm when deviance change falls below this value.
lambda only applicable for Gaussian and Gamma mixtures. If set, standard deviations/shape parameters are calculated smoothly across components via an Aitchison-Aitken kernel (dkern) with parameter lambda. The setting lambda = 0 is automatically mapped to lambda = 1/k and corresponds to the case 'maximal smoothing' (i.e. equal component dispersion parameters), while lambda = 1 means 'no smoothing' (unequal dispersion parameters).
damp switches EM damping on or off.
damp.power controls the degree of damping applied to the dispersion parameter according to the formula 1-(1-tol)^(damp.power*iter+1), see Einbeck & Hinde (2006).
spike.protect prevents the algorithm from converging into likelihood spikes for Gaussian and Gamma mixtures with unequal or smooth component standard deviations. The EM algorithm is stopped if one of the component standard deviations (shape parameters, resp.), divided by the fitted mass points, falls below (exceeds, resp.) a certain threshold, which is 0.000001*spike.protect (10^6*spike.protect, resp.). Setting spike.protect=0 disables the spike protection. If set, then spike.protect=1 is recommended. Note that the displayed disparity may not be correct when convergence is not achieved. This can be checked with EMconverged.
sdev optional; specifies standard deviation for normally distributed response. If unspecified, it will be estimated from the data.
shape optional; specifies shape parameter for gamma-distributed response. Setting shape=1 gives an exponential distribution. If unspecified, it will be estimated from the data.
plot.opt if equal to zero, then no graphical output is given. For plot.opt=1 the development of the disparity -2logL over iteration number is plotted, for plot.opt=2 the EM trajectories are plotted, and for plot.opt=3 both plots are shown.
verbose if set to FALSE, no printed output is given during function execution. Useful for tolfind.
... generic options for the glm function. Not all options may be supported under all circumstances.
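
The two limiting cases of lambda can be illustrated with a small sketch. It assumes the standard discrete Aitchison-Aitken kernel, which puts weight lambda on a component itself and (1 - lambda)/(k - 1) on each other component. The helper aa_kernel and the numbers in sdev_k are hypothetical; this is not the package's dkern, and the plain weighted average below only stands in for the smoothing that the package carries out inside the EM algorithm.

```r
# Hypothetical helper: Aitchison-Aitken kernel weights between k components.
# Assumed form: weight lambda on the same component, (1 - lambda)/(k - 1) elsewhere.
aa_kernel <- function(k, lambda) {
  W <- matrix((1 - lambda) / (k - 1), nrow = k, ncol = k)
  diag(W) <- lambda
  W
}

k <- 4
W_max  <- aa_kernel(k, lambda = 1 / k)  # 'maximal smoothing': every weight is 1/k
W_none <- aa_kernel(k, lambda = 1)      # 'no smoothing': identity matrix

# Made-up component-specific standard deviations, for illustration only
sdev_k <- c(0.5, 1.0, 1.5, 2.0)

# Kernel-weighted averages (each row of W sums to one)
smooth_max  <- drop(W_max  %*% sdev_k)  # all components share one value
smooth_none <- drop(W_none %*% sdev_k)  # components keep their own values
```

Under this assumed form, lambda = 1/k gives every component the same weight, which is why it corresponds to equal dispersion parameters.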

Details

The nonparametric maximum likelihood (NPML) approach was introduced in Aitkin (1996a) as a tool to fit overdispersed generalized linear models. The idea is to approximate the unknown and unspecified distribution of the random effect by a discrete mixture of exponential family densities, leading to a simple expression of the marginal likelihood which can then be maximized using a standard EM algorithm.
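
As a minimal sketch of this idea, the following base R code evaluates the marginal likelihood of a two-point Poisson mixture and the posterior component probabilities used in the E-step. The data, mass points and masses are made up for illustration, and the code is not taken from the package.

```r
# Sketch: disparity (-2 log L) of a k-point Poisson mixture with log link.
# All numbers below are made up for illustration.
y      <- c(0, 2, 1, 7, 9, 3)   # responses
eta0   <- rep(0, length(y))     # fixed-effect part of the linear predictor
z      <- c(0.2, 1.9)           # mass points (k = 2)
masses <- c(0.6, 0.4)           # mixture probabilities, summing to one

# n x k matrix of component densities f(y_i | z_k)
f <- sapply(z, function(zk) dpois(y, lambda = exp(eta0 + zk)))

# Marginal likelihood contribution of each observation: sum_k pi_k f(y_i | z_k)
marg <- drop(f %*% masses)
disparity <- -2 * sum(log(marg))

# E-step: posterior probability that observation i belongs to component k
post_prob <- sweep(f, 2, masses, `*`) / marg
rowSums(post_prob)  # each row sums to one
```

In an EM fit the M-step would then re-estimate the coefficients, mass points and masses using post_prob as weights; that step is omitted here.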

Aitkin (1999) extended this method to generalized linear models with shared random effects arising through variance component or repeated measures structure. Applications include longitudinal data and two-stage sample designs, in which first the primary sampling units (the upper-level units, e.g. classes) and then the secondary sampling units (lower-level units, e.g. students) are selected. Models of this type have also been referred to as multi-level models (Goldstein, 2003). allvc is restricted to 2-level models.
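
The shared random effect changes the mixture likelihood in one respect: all observations of an upper-level unit share the same mass point, so the component densities are multiplied within each unit before mixing. The sketch below shows this for made-up Poisson counts in three groups; the variable names are illustrative and the code is not the package's implementation.

```r
# Sketch: unit-level mixture likelihood for a two-level model.
# Made-up data: 3 upper-level units (PSU) with 2 Poisson counts each.
y      <- c(1, 0, 4, 6, 2, 3)
PSU    <- factor(c("a", "a", "b", "b", "c", "c"))
z      <- c(0, 1.5)     # mass points (k = 2), on the log scale
masses <- c(0.5, 0.5)   # mixture probabilities

# Log densities log f(y_ij | z_k), one column per mass point
logf <- sapply(z, function(zk) dpois(y, lambda = exp(zk), log = TRUE))

# Sum the log densities within each unit (i.e. multiply the densities)
logf_unit <- rowsum(logf, group = PSU)

# Unit-level marginal likelihood, posterior probabilities and disparity
joint <- sweep(exp(logf_unit), 2, masses, `*`)
marg  <- rowSums(joint)
post_prob_unit <- joint / marg
disparity <- -2 * sum(log(marg))
```

Here post_prob_unit has one row per upper-level unit rather than one row per observation.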

The number of components k of the finite mixture has to be specified beforehand. When option 'gq' is set, then Gauss-Hermite masses and mass points are used, assuming implicitly a normally distributed random effect. When option 'np' is chosen, the EM algorithm uses the Gauss-Hermite masses and mass points as starting points. The position of the starting points can be concentrated or extended by setting tol smaller or larger than one, respectively.
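
For illustration, Gauss-Hermite masses and mass points for a standard normal random effect can be computed in base R with the Golub-Welsch eigenvalue method. The helper gh_points is hypothetical and not part of the package, and the last two lines assume that tol simply multiplies the starting points, which is consistent with the description above but not confirmed by it.

```r
# Hypothetical helper: k-point Gauss-Hermite rule for a N(0, 1) random effect
# (Golub-Welsch: eigen-decomposition of the Jacobi matrix of the
# probabilists' Hermite polynomials).
gh_points <- function(k) {
  stopifnot(k >= 2)
  J <- matrix(0, k, k)
  off <- sqrt(seq_len(k - 1))
  J[cbind(1:(k - 1), 2:k)] <- off   # superdiagonal
  J[cbind(2:k, 1:(k - 1))] <- off   # subdiagonal
  e <- eigen(J, symmetric = TRUE)
  ord <- order(e$values)
  list(location = e$values[ord],       # mass points
       weight   = e$vectors[1, ord]^2) # masses
}

gh <- gh_points(4)
sum(gh$weight)                   # masses sum to one
sum(gh$weight * gh$location^2)   # variance of the discrete approximation

# Assumption: tol acts as a multiplicative scale on the starting points
tol <- 0.5
start_points <- tol * gh$location
```

With tol < 1 the starting points in this sketch are pulled towards zero, and with tol > 1 they are spread out.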

Fitting random coefficient models (Aitkin, Francis & Hinde, 2005, pp. 474, 491) is possible by specifying the random term explicitly. Note that the setting random = ~x gives a model with a random slope and a random intercept, that only one random coefficient can be specified, and that the option random.distribution is restricted to np in this case.

The weights have to be understood as frequency weights, i.e. setting all weights in alldist equal to 2 will duplicate each data point and hence double the disparity and deviance.
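
The meaning of frequency weights can be checked with an ordinary glm fit. The sketch below uses stats::glm on made-up data rather than alldist, so it only illustrates the weighting convention, not the mixture fit itself.

```r
# Sketch with stats::glm (not alldist): frequency weights of 2 are
# equivalent to duplicating every row, and the deviance doubles.
d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(2, 1, 4, 3, 7, 9))   # made-up counts

fit1    <- glm(y ~ x, family = poisson, data = d)
fit_w   <- glm(y ~ x, family = poisson, data = d, weights = rep(2, nrow(d)))
fit_dup <- glm(y ~ x, family = poisson, data = rbind(d, d))

deviance(fit_w) / deviance(fit1)               # ratio of the two deviances
all.equal(coef(fit_w), coef(fit_dup))          # same coefficient estimates
all.equal(deviance(fit_w), deviance(fit_dup))  # same deviance as duplicated data
```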

For k >= 70, mass points with negligible mass (i.e. < 1e-55) are omitted. The maximum number of 'real' mass points is then 240.

Value

The function alldist produces an object of class glmmNPML (if random.distribution is set to np) or glmmGQ (gq). Both objects contain the following 29 components:

coefficients a named vector of coefficients (including the mass points). In case of Gaussian quadrature, the coefficient given at z corresponds to the standard deviation of the mixing distribution.
residuals the difference between the true response and the empirical Bayes predictions.
fitted.values the empirical Bayes predictions (Aitkin, 1996b) on the scale of the responses.
family the `family' object used.
linear.predictors the extended linear predictors eta_ik.
disparity the disparity (-2logL) of the fitted mixture regression model.
deviance the deviance of the fitted mixture regression model.
null.deviance The deviance for the null model (just containing an intercept), comparable with `deviance'.
df.residual the residual degrees of freedom of the fitted model (including the random part).
df.null the residual degrees of freedom for the null model.
y the (extended) response vector.
call the matched call.
formula the formula supplied.
random the random term of the model formula.
data the data argument.
model the (extended) design matrix.
weights the case weights initially supplied.
offset the offset initially supplied.
mass.points the fitted mass points.
masses the mixture probabilities corresponding to the mass points.
sdev a list of the two elements sdev$sdev and sdev$sdevk. The former is the estimated standard deviation of the Gaussian mixture components (estimated over all mixture components), and the latter gives the unequal or smooth component-specific standard deviations. All values are equal if lambda=0.
shape a list of the two elements shape$shape and shape$shapek, to be interpreted in analogy to sdev.
rsdev estimated random effect standard deviation.
post.prob a matrix of posterior probabilities.
post.int a vector of `posteriori intercepts' (as in Sofroniou et al. (2006)).
ebp the empirical Bayes Predictions on the scale of the linear predictor. For compatibility with older versions.
EMiter gives the number of iterations of the EM algorithm.
EMconverged logical value indicating if the EM algorithm converged.
lastglm the fitted glm object from the last EM iteration.
Misc contains additional information relevant for the summary and plot functions, in particular the disparity trend and the EM trajectories.


If a binomial model is specified by giving a two-column response, the weights returned by weights are the total numbers of cases (factored by the supplied case weights) and the component y of the result is the proportion of successes.
As a by-product, alldist produces a plot showing the disparity as a function of the iteration number. Further, a plot with the EM trajectories is given. The x-axis corresponds to the iteration number, and the y-axis to the value of the mass points at a particular iteration. This plot is not produced for GQ.

Note

In contrast to the GLIM 4 version, this R implementation uses for Gaussian and Gamma mixtures by default a damping procedure in the first cycles of the EM algorithm (Einbeck & Hinde, 2006), which stabilizes the algorithm and makes it less sensitive to the optimal choice of tol. If tol is very small (i.e. less than 0.1), it can be useful to set damp.power to values larger than 1 in order to accelerate convergence. Do not use damp.power=0, as this would mean permanent damping during EM. Using the option pluginz, one can to some extent circumvent the necessity to specify tol by giving the starting points explicitly. However, when using pluginz for normal or gamma-distributed response, damping will be strictly necessary to ensure that the imposed starting points don't get blurred immediately due to initial fluctuations, implying that tol still plays a role in this case.
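
The behaviour of the damping term can be tabulated directly from the formula given for damp.power, 1-(1-tol)^(damp.power*iter+1). The helper damp_factor is hypothetical, and the sketch assumes that the iteration counter iter starts at 0; how the package applies this factor to the dispersion parameter is not shown here.

```r
# Hypothetical helper implementing 1 - (1 - tol)^(damp.power * iter + 1)
damp_factor <- function(iter, tol, damp.power = 1) {
  1 - (1 - tol)^(damp.power * iter + 1)
}

iter <- 0:10  # assumed to start at 0
round(damp_factor(iter, tol = 0.5, damp.power = 1), 4)  # rises quickly towards 1
round(damp_factor(iter, tol = 0.1, damp.power = 1), 4)  # small tol: slow rise
round(damp_factor(iter, tol = 0.1, damp.power = 3), 4)  # larger damp.power: faster rise
damp_factor(iter, tol = 0.5, damp.power = 0)            # stays at tol in every iteration
```

With damp.power = 0 the exponent is always 1, so the factor never moves away from tol, which is the permanent damping warned against above.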

Author(s)

Originally translated from the GLIM 4 functions alldist and allvc (Aitkin & Francis, 1995) to R by Ross Darnell (2002). Modified, extended, and prepared for publication by Jochen Einbeck & John Hinde (2006).

References

Aitkin, M. and Francis, B. (1995). Fitting overdispersed generalized linear models by nonparametric maximum likelihood. GLIM Newsletter 25, 37-45.

Aitkin, M. (1996a). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing 6, 251-262.

Aitkin, M. (1996b). Empirical Bayes shrinkage using posterior random effect means from nonparametric maximum likelihood estimation in general random effect models. Statistical Modelling: Proceedings of the 11th IWSM 1996, 87-94.

Aitkin, M. (1999). A general maximum likelihood analysis of variance components in generalized linear models. Biometrics 55, 117-128.

Aitkin, M., Francis, B. and Hinde, J. (2005). Statistical Modelling in GLIM 4. Second Edition, Oxford Statistical Science Series, Oxford, UK.

Einbeck, J. & Hinde, J. (2006). A note on NPML estimation for exponential family regression models with unspecified dispersion parameter. Austrian Journal of Statistics 35, 233-243.

Goldstein, H. (2003). Multilevel Statistical Models (3rd edition). Arnold, London, UK.

Hinde, J. (1982). Compound Poisson regression models. Lecture Notes in Statistics 14, 109-121.

Sofroniou, N., Einbeck, J., and Hinde, J. (2006). Analyzing Irish suicide rates with mixture models. Proceedings of the 21st International Workshop on Statistical Modelling in Galway, Ireland, 2006.

See Also

glm, summary.glmmNPML, predict.glmmNPML, family.glmmNPML, plot.glmmNPML.

Examples


# The first three examples (galaxy data, toxoplasmosis data, fabric faults) 
# are based on GLIM examples in Aitkin et al. (2005), and the fourth example using
# the Hospital-Stay-Data (Rosner, 2000) is taken from Einbeck & Hinde (2006).
# The fifth data example using the Oxford boys is again inspired by Aitkin et al. (2005).
# The sixth example on Irish suicide rates is taken from Sofroniou et al. (2006).
  

# The galaxy data   
  data(galaxies, package="MASS")
  gal<-as.data.frame(galaxies)
  galaxy.np6 <- alldist(galaxies/1000~1, random=~1, random.distribution="np", 
      data=gal, k=6)
  galaxy.np8u <- alldist(galaxies/1000~1, random=~1, random.distribution="np", 
      data=gal, k=8, lambda=0.99)
  round(galaxy.np8u$sdev$sdevk, digits=3)
  #[1] 0.906 0.435 0.218 0.676 1.205 0.216 0.412 0.295

# The toxoplasmosis data 
  data(rainfall, package="forward")
  rainfall$x<-rainfall$Rain/1000  
  rainfall$x2<- rainfall$x^2; rainfall$x3<- rainfall$x^3
  toxo.np3<- alldist(cbind(Cases,Total-Cases) ~ x+x2+x3, random=~1, 
      random.distribution="np", family=binomial(link=logit), data=rainfall, k=3)
  toxo.np3x<- alldist(cbind(Cases,Total-Cases) ~ x, random=~x, 
      random.distribution="np", family=binomial(link=logit), data=rainfall, k=3)
  #is the same as 
  toxo.np3x<- alldist(Cases/Total ~ x, random = ~x, weights=Total, 
      family=binomial(link=logit), data=rainfall, k=3)
  #or
  toxo.np3x<-update(toxo.np3, .~.-x2-x3, random = ~x)

# The fabric faults data
  data(fabric, package="gamlss")
  coefficients(alldist(y ~ x, random=~1, family=poisson(link=log), 
      random.distribution="gq", data= fabric, k=3, verbose=FALSE))
  #(Intercept)           x           z 
  # -3.3088663   0.8488060   0.3574909
  
# The Pennsylvanian hospital stay data
  data(hosp)
  fitnp3<-  alldist(duration~age+temp1, data=hosp, k=3, family=Gamma(link=log),
      tol=0.5) 
  fitnp3$shape$shape
  #[1] 50.75232
  fitnp3<-  alldist(duration~age+temp1, data=hosp, k=3, family=Gamma(link=log),
      tol=0.5, lambda=0.9) 
  fitnp3$shape$shapek
  #[1]  49.03108  42.79532 126.64046 
  
# The Oxford boys data
  data(Oxboys, package="nlme")
  Oxboys$boy <- gl(26,9)
  allvc(height~age, random=~1|boy, data=Oxboys, random.distribution='gq', k=20)
  allvc(height~age, random=~1|boy, data=Oxboys,random.distribution='np',k=8) 
  #with random coefficients:
  allvc(height~age,random=~age|boy, data=Oxboys, random.distribution='np', k=8)
    
# Irish suicide data
  data(irlsuicide)
  # Crude rate model:
  crude<- allvc(death~sex* age, random=~1|ID, offset=log(pop), 
      k=3, data=irlsuicide, family=poisson) 
  crude$disparity 
  # [1] 654.021
  # Relative risk model:
  relrisk<- allvc(death~1, random=~1|ID, offset=log(expected), 
      k=3, data=irlsuicide, family=poisson) 
  relrisk$disparity    
  # [1] 656.4955
  

[Package npmlreg version 0.43 Index]