evolMonteCarloClustering {EMCC}R Documentation

evolutionary Monte Carlo clustering algorithm

Description

Given a possibly multi-modal and multi-dimensional clustering target density function and a temperature ladder this function produces samples from the target using the evolutionary Monte Carlo clustering (EMCC) algorithm.

Usage

evolMonteCarloClustering(nIters,              
                         temperLadder,              
                         startingVals,              
                         logTarDensFunc,            
                         MHMergeProb       = 0.5,  
                         moveProbsList     = NULL,  
                         moveNTimesList    = NULL,  
                         levelsSaveSampFor = NULL,  
                         saveFitness       = FALSE, 
                         verboseLevel      = 0,     
                         ...)                       

Arguments

Below sampDim refers to the dimension of the sample space, temperLadderLen refers to the length of the temperature ladder, and levelsSaveSampForLen refers to the length of the levelsSaveSampFor.

nIters integer > 0.
temperLadder double vector with all positive entries, in decreasing order.
startingVals double matrix of dimension temperLadderLen x sampDim or vector of length sampDim, in which case the same starting values are used for every temperature level.
logTarDensFunc function of two arguments (draw, ...) that returns the target density evaluated in the log scale.
MHMergeProb double in (0, 1). See details below for the use of this argument.
moveProbsList named list of probabilities adding upto 1.
moveNTimesList named list of integers >= 0.
levelsSaveSampFor integer vector with positive entries.
saveFitness logical.
verboseLevel integer, a value >= 2 produces a lot of output.
... optional arguments to be passed to logTarDensFunc.

Details

The EMCC algorithm
The evolutionary Monte Carlo clustering (EMCC; Goswami and Liu, 2007) algorithm is composed of the following moves:
MH Metropolis-Hastings or mutation

SCSC_ONE_NEW

sub-cluster swap crossover: one new

SCSC_TWO_NEW

sub-cluster swap crossover: two new

SCRC

sub-cluster reallocation crossover

RE

(random) exchange

The current function could be used to run the EMCC algorithm by specifying what moves to employ using the following variables.

moveProbsList and moveNTimesList
The allowed names for components of moveProbsList and moveNTimesList come from the abbreviated names of the moves above. For example, the following specifications are valid:
moveProbsList = list(MH               = 0.5,
                     'SCSC\_TWO\_NEW' = 0.25,
                     SCRC             = 0.25)

moveNTimesList = list(MH               = 1, 
                      'SCSC\_TWO\_NEW' = floor(temperLadderLen / 2),
                      SCRC             = floor(temperLadderLen / 2),
                      RE               = temperLadderLen)           
MHMergeProb
In the MH or the mutation step, each of the sampDim-many objects are proposed to either merge with an existing cluster or split to form its own cluster with probability MHMergeProb and (1 - MHMergeProb), respectively (see Goswami and Liu, 2007).
levelsSaveSampFor
By default, samples are saved and returned for temperature level temperLadderLen. The levelsSaveSampFor could be used to save samples from other temperature levels as well (e.g., levelsSaveSampFor = 1:temperLadderLen saves samples from all levels).
saveFitness
The term fitness refers to the function H(x), where the target density of interest is given by:

g(x) propto exp[ -H(x) / tau_{min} ]

H(x) is also known as the energy function. By default, the fitness values are not saved, but one can do so by setting saveFitness = TRUE.

Value

This function returns a list with the following components:

draws array of dimension nIters x sampDim x levelsSaveSampForLen, if saveFitness = FALSE. If saveFitness = TRUE, then the returned array is of dimension nIters x (sampDim + 1) x levelsSaveSampForLen; i.e., each of the levelsSaveSampForLen matrices contain the fitness values in their last column.
acceptRatios matrix of the acceptance rates for various moves used.
detailedAcceptRatios list of matrices with detailed summary of the acceptance rates for various moves used.
nIters the nIters argument.
temperLadder the temperLadder argument.
startingVals the startingVals argument.
moveProbsList the moveProbsList argument.
moveNTimesList the moveNTimesList argument.
levelsSaveSampFor the levelsSaveSampFor argument.
time the time taken by the run.

Note

The effect of leaving the default value NULL for some of the arguments above are as follows:

moveProbsList list(MH = 0.5, RC = 0.25, 'SCSC\_TWO\_NEW' = 0.25).

moveNTimesList

list(MH = 1, RC = mm, 'SCSC\_TWO\_NEW' = mm, RE = nn),

where mm <- floor(nn / 2) and nn <- temperLadderLen.

levelsSaveSampFor

temperLadderLen.

Author(s)

Gopi Goswami goswami@stat.harvard.edu

References

Gopi Goswami, Jun S. Liu and Wing H. Wong (2007). Evolutionary Monte Carlo Methods for Clustering. Journal of Computational and Graphical Statistics, 16:4:855-876.

Examples

## Not run: 
## The following example is a simple stochastic optimization problem,
## the set up is same as that of findMaxTemper and placeTempers. Here
## no "heating up" is necessary, and hence the maximum temprature is
## the coldest one, namely, 0.5.
##
## However, we run evolMonteCarloClustering on this example with a
## temperature ladder that is the output of placeTempers, which
## assumes that the maximum temperature is 5.
KMeansObj  <- KMeansFuncGenerator1(-97531)
samplerObj <-
    with(KMeansObj,
     {
         temperLadder    <- c(5.0000000, 1.5593974, 1.1028349, 0.9220684,
                              0.7900778, 0.6496648, 0.5135825, 0.5000000)
         nLevels         <- length(temperLadder)
         sampDim         <- nrow(yy)
         startingVals    <- sample(c(0, 1),
                                   size    = nLevels * sampDim,
                                   replace = TRUE)
         startingVals    <- matrix(startingVals, nrow = nLevels, ncol = sampDim)
         moveProbsList   <- list(MH             = 0.4,
                                 RC             = 0.3,
                                 'SCSC_TWO_NEW' = 0.3)
         mm              <- floor(nLevels / 2)
         moveNTimesList  <- list(MH             = 1,
                                 RC             = mm,
                                 'SCSC_TWO_NEW' = mm,
                                 RE             = nLevels)
         evolMonteCarloClustering(nIters            = 5000,
                                  temperLadder      = temperLadder,
                                  startingVals      = startingVals,
                                  logTarDensFunc    = logTarDensFunc,
                                  moveProbsList     = moveProbsList,
                                  moveNTimesList    = moveNTimesList,
                                  levelsSaveSampFor = seq_len(nLevels),
                                  saveFitness       = TRUE,
                                  verboseLevel      = 1)
     })
print(samplerObj)
print(names(samplerObj))
with(c(samplerObj, KMeansObj),
 {
     print(acceptRatios)
     print(acceptRatiosList)
     print(detailedAcceptRatios)
     print(dim(draws))
     fitnessCol <- ncol(draws[ , , 1])     
     sub        <- paste('uniform prior on # of clusters: DU[',
                         priorMinClusters, ', ',
                         priorMaxClusters, ']', sep = '')
     for (ii in rev(seq_along(levelsSaveSampFor))) {
         main <- paste('EMCC (MAP) clustering (temper = ',
                       round(temperLadder[levelsSaveSampFor[ii]], 3), ')',
                       sep = '')
         MAPRow <- which.min(draws[ , fitnessCol, ii])
         clusterPlot(clusterInd        = draws[MAPRow, -fitnessCol, ii],
                     data              = yy,
                     main              = main,
                     sub               = sub,
                     knownClusterMeans = knownClusterMeans)
     }
 })
## End(Not run)

[Package EMCC version 1.0 Index]