normalmixEM {mixtools}R Documentation

EM Algorithm for Mixtures of Univariate Normals

Description

Return EM algorithm output for mixtures of normal distributions.

Usage

normalmixEM(x, lambda = NULL, mu = NULL, sigma = NULL, k = 2,
            arbmean = TRUE, arbvar = TRUE, epsilon = 1e-08, 
            maxit = 1000, maxrestarts=20, verb = FALSE, fast=FALSE)

Arguments

x A vector of length n consisting of the data.
lambda Initial value of mixing proportions. Automatically repeated as necessary to produce a vector of length k, then normalized to sum to 1. If NULL, then lambda is random from a uniform Dirichlet distribution (i.e., its entries are uniform random and then it is normalized to sum to 1).
mu Starting value of vector of component means. If non-NULL and a scalar, arbmean is set to FALSE. If non-NULL and a vector, k is set to length(mu). If NULL, then the initial value is randomly generated from a normal distribution with center(s) determined by binning the data.
sigma Starting value of vector of component standard deviations for algorithm. If non-NULL and a scalar, arbvar is set to FALSE. If non-NULL and a vector, arbvar is set to TRUE and k is set to length(sigma) If NULL, then the initial value is the reciprocal of the square root of a vector of random exponential-distribution values whose means are determined according to a binning method done on the data.
k Number of components. Initial value ignored unless mu and sigma are both NULL.
arbmean If TRUE, then the component densities are allowed to have different mus. If FALSE, then a scale mixture will be fit. Initial value ignored unless mu is NULL.
arbvar If TRUE, then the component densities are allowed to have different sigmas. If FALSE, then a location mixture will be fit. Initial value ignored unless sigma is NULL.
epsilon The convergence criterion. Convergence is declared when the change in the observed data log-likelihood increases by less than epsilon.
maxit The maximum number of iterations.
maxrestarts The maximum number of restarts allowed in case of a problem with the particular starting values chosen (each restart uses randomly chosen starting values).
verb If TRUE, then various updates are printed during each iteration of the algorithm.
fast If TRUE and k==2 and arbmean==TRUE, then use normalmixEM2comp, which is a much faster version of the EM algorithm for this case. This version is less protected against certain kinds of underflow that can cause numerical problems and it does not permit any restarts. If k>2, fast is ignored.

Value

normalmixEM returns a list of class mixEM with items:

x The raw data.
lambda The final mixing proportions.
mu The final mean parameters.
sigma The final standard deviations. If arbmean = FALSE, then only the smallest standard deviation is returned. See scale below.
scale If arbmean = FALSE, then the scale factor for the component standard deviations is returned. Otherwise, this is omitted from the output.
loglik The final log-likelihood.
posterior An nxk matrix of posterior probabilities for observations.
all.loglik A vector of each iteration's log-likelihood. This vector includes both the initial and the final values; thus, the number of iterations is one less than its length.
restarts The number of times the algorithm restarted due to unacceptable choice of initial values.
ft A character vector giving the name of the function.

References

McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley & Sons, Inc.

See Also

mvnormalmixEM, normalmixEM2comp

Examples

##Analyzing the Old Faithful geyser data with a 2-component mixture of normals.

data(faithful)
attach(faithful)
system.time(out<-normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03))
out
system.time(out2<-normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03, fast=TRUE))
out2 # same thing but much faster

[Package mixtools version 0.3.3 Index]