regmix {fpc}R Documentation

Mixture Model ML for Clusterwise Linear Regression

Description

Computes an ML-estimator for clusterwise linear regression under a regression mixture model with Normal errors. Parameters are proportions, regression coefficients and error variances, all independent of the values of the independent variable, and all may differ for different clusters. Computation is by the EM-algorithm. The number of clusters is estimated via the Bayesian Information Criterion (BIC).

Usage

regmix(indep, dep, ir=1, nclust=1:7, icrit=1.e-5, minsig=1.e-6, warnings=FALSE)

regem(indep, dep, m, cln, icrit=1.e-5, minsig=1.e-6, warnings=FALSE) 

Arguments

indep numerical matrix or vector. Independent variables.
dep numerical vector. Dependent variable.
ir positive integer. Number of iteration runs for every number of clusters.
nclust vector of positive integers. Numbers of clusters.
icrit positive numerical. Stopping criterion for the iterations (difference of loglikelihoods).
minsig positive numerical. Minimum value for the variance parameters (likelihood is unbounded if variances are allowed to converge to 0).
warnings logical. If TRUE, warnings are given during the EM iteration in case of collinear regressors, too small mixture components and error variances smaller than minimum. In the former two cases, the algorithm is terminated without a result, but an optimal solution is still computed from other algorithm runs (if there are others). In the latter case, the corresponding variance is set to the minimum.
cln positive integer. (Single) number of clusters.
m matrix of positive numericals. Number of columns must be cln. Number of rows must be number of data points. Columns must add up to 1. Initial configuration for the EM iteration in terms of a probabilty vector for every point which gives its degree of membership to every cluster. As generated by randcmatrix.

Details

The result of the EM iteration depends on the initial configuration, which is generated randomly by randcmatrix for regmix. regmix calls regem. To provide the initial configuration manually, use parameter m of regem directly. Take a look at the example about how to generate m if you want to specify initial parameters.

The original paper DeSarbo and Cron (1988) suggests the AIC for estimating the number of clusters. The use of the BIC is advocated by Wedel and DeSarbo (1995). The BIC is defined here as 2*loglik - log(n)*((p+3)*cln-1), p being the number of independent variables, i.e., the larger the better.

See the entry for the input parameter warnings for the treatment of several numerical problems.

Value

regmix returns a list containing the components clnopt, loglik, bic, coef, var, eps, z, g.
regem returns a list containing the components loglik, coef, var, z, g, warn.

clnopt optimal number of clusters according to the BIC.
loglik loglikelihood for the optimal model.
bic vector of BIC values for all numbers of clusters in nclust.
coef matrix of regression coefficients. First row: intercept parameter. Second row: parameter of first independent variable and so on. Columns corresponding to clusters.
var vector of error variance estimators for the clusters.
eps vector of cluster proportion estimators.
z matrix of estimated a posteriori probabilities of the points (rows) to be generated by the clusters (columns). Compare input argument m.
g integer vector of estimated cluster numbers for the points (via argmax over z).
warn logical. TRUE if one of the estimated clusters has too few points and/or collinear regressors.

Author(s)

Christian Hennig chrish@stats.ucl.ac.uk http://www.homepages.ucl.ac.uk/~ucakche/

References

DeSarbo, W. S. and Cron, W. L. (1988) A maximum likelihood methodology for clusterwise linear regression, Journal of Classification 5, 249-282.

Wedel, M. and DeSarbo, W. S. (1995) A mixture likelihood approach for generalized linear models, Journal of Classification 12, 21-56.

See Also

fixreg for fixed point clusters for clusterwise linear regression.

EMclust for Normal mixture model fitting (non-regression).

Examples

set.seed(12234)
data(tonedata)
# Note: If you do not use the installed package, replace this by
# tonedata <- read.table("(path/)tonedata.txt", header=TRUE)
attach(tonedata)
rmt1 <- regmix(stretchratio,tuned,nclust=1:2)
# nclust=1:2 makes the example fast;
# a more serious application would rather use the default.
rmt1$g
rmt1$bic
# start with initial parameter values
cln <- 3
n <- 150
initcoef <- cbind(c(2,0),c(0,1),c(0,2.5))
initvar <- c(0.001,0.0001,0.5)
initeps <- c(0.4,0.3,0.3)
# computation of m from initial parameters
m <- matrix(nrow=n, ncol=cln)
stm <- numeric(0)
for (i in 1:cln)
  for (j in 1:n){
    m[j,i] <- initeps[i]*dnorm(tuned[j],mean=initcoef[1,i]+
              initcoef[2,i]*stretchratio[j], sd=sqrt(initvar[i]))
  }
  for (j in 1:n){
    stm[j] <- sum(m[j,])
    for (i in 1:cln)
      m[j,i] <- m[j,i]/stm[j]
  } 
rmt2 <- regem(stretchratio, tuned, m, cln)
rmt2bic <- 2*rmt2$loglik - log(150)*(4*cln-1)
rmt2bic

[Package fpc version 1.2-3 Index]