evolMonteCarloClustering {EMCC} | R Documentation |
Given a possibly multi-modal and multi-dimensional clustering target density function and a temperature ladder this function produces samples from the target using the evolutionary Monte Carlo clustering (EMCC) algorithm.
evolMonteCarloClustering(nIters, temperLadder, startingVals, logTarDensFunc, MHMergeProb = 0.5, moveProbsList = NULL, moveNTimesList = NULL, levelsSaveSampFor = NULL, saveFitness = FALSE, verboseLevel = 0, ...)
Below sampDim
refers to the dimension of the sample space,
temperLadderLen
refers to the length of the temperature ladder,
and levelsSaveSampForLen
refers to the length of the
levelsSaveSampFor
.
nIters |
integer > 0. |
temperLadder |
double vector with all positive
entries, in decreasing order. |
startingVals |
double matrix of dimension
temperLadderLen x sampDim or vector of
length sampDim , in which case the same starting values are
used for every temperature level. |
logTarDensFunc |
function of two arguments
(draw, ...)
that returns the target density evaluated in the log scale. |
MHMergeProb |
double in (0, 1). See details below
for the use of this argument. |
moveProbsList |
named list of probabilities adding upto 1. |
moveNTimesList |
named list of integers >= 0. |
levelsSaveSampFor |
integer vector with positive
entries. |
saveFitness |
logical . |
verboseLevel |
integer , a value >= 2 produces a
lot of output. |
... |
optional arguments to be passed to
logTarDensFunc . |
MH | Metropolis-Hastings or mutation |
SCSC_ONE_NEW | sub-cluster swap crossover: one new |
SCSC_TWO_NEW | sub-cluster swap crossover: two new |
SCRC | sub-cluster reallocation crossover |
RE | (random) exchange |
The current function could be used to run the EMCC algorithm by specifying what moves to employ using the following variables.
moveProbsList
and moveNTimesList
moveProbsList
and
moveNTimesList
come from the abbreviated names of the
moves above. For example, the following specifications are
valid:moveProbsList = list(MH = 0.5, 'SCSC\_TWO\_NEW' = 0.25, SCRC = 0.25)
moveNTimesList = list(MH = 1, 'SCSC\_TWO\_NEW' = floor(temperLadderLen / 2), SCRC = floor(temperLadderLen / 2), RE = temperLadderLen)
MHMergeProb
sampDim
-many objects are proposed to either
merge with an existing cluster or split to form its own cluster
with probability MHMergeProb
and (1 - MHMergeProb
),
respectively (see Goswami and Liu, 2007).levelsSaveSampFor
temperLadderLen
. The
levelsSaveSampFor
could be used to save samples from other
temperature levels as well (e.g., levelsSaveSampFor =
1:temperLadderLen
saves samples from all levels).saveFitness
g(x) propto exp[ -H(x) / tau_{min} ]
H(x) is also known as the energy function. By default,
the fitness values are not saved, but one can do so by setting
saveFitness = TRUE
.
This function returns a list with the following components:
draws |
array of dimension nIters x
sampDim x levelsSaveSampForLen , if
saveFitness = FALSE . If saveFitness = TRUE , then the
returned array is of dimension nIters x
(sampDim + 1) x levelsSaveSampForLen ;
i.e., each of the levelsSaveSampForLen matrices contain the
fitness values in their last column. |
acceptRatios |
matrix of the acceptance rates for various
moves used. |
detailedAcceptRatios |
list of matrices with detailed
summary of the acceptance rates for various moves used. |
nIters |
the nIters argument. |
temperLadder |
the temperLadder argument. |
startingVals |
the startingVals argument. |
moveProbsList |
the moveProbsList argument. |
moveNTimesList |
the moveNTimesList argument. |
levelsSaveSampFor |
the levelsSaveSampFor argument. |
time |
the time taken by the run. |
The effect of leaving the default value NULL
for some of the
arguments above are as follows:
moveProbsList | list(MH = 0.5, RC = 0.25, 'SCSC\_TWO\_NEW' = 0.25) . |
| list(MH = 1, RC = mm, 'SCSC\_TWO\_NEW' = mm, RE = nn) , |
| where mm <- floor(nn / 2) and nn <- temperLadderLen . |
| temperLadderLen .
|
Gopi Goswami goswami@stat.harvard.edu
Gopi Goswami, Jun S. Liu and Wing H. Wong (2007). Evolutionary Monte Carlo Methods for Clustering. Journal of Computational and Graphical Statistics, 16:4:855-876.
## Not run: ## The following example is a simple stochastic optimization problem, ## the set up is same as that of findMaxTemper and placeTempers. Here ## no "heating up" is necessary, and hence the maximum temprature is ## the coldest one, namely, 0.5. ## ## However, we run evolMonteCarloClustering on this example with a ## temperature ladder that is the output of placeTempers, which ## assumes that the maximum temperature is 5. KMeansObj <- KMeansFuncGenerator1(-97531) samplerObj <- with(KMeansObj, { temperLadder <- c(5.0000000, 1.5593974, 1.1028349, 0.9220684, 0.7900778, 0.6496648, 0.5135825, 0.5000000) nLevels <- length(temperLadder) sampDim <- nrow(yy) startingVals <- sample(c(0, 1), size = nLevels * sampDim, replace = TRUE) startingVals <- matrix(startingVals, nrow = nLevels, ncol = sampDim) moveProbsList <- list(MH = 0.4, RC = 0.3, 'SCSC_TWO_NEW' = 0.3) mm <- floor(nLevels / 2) moveNTimesList <- list(MH = 1, RC = mm, 'SCSC_TWO_NEW' = mm, RE = nLevels) evolMonteCarloClustering(nIters = 5000, temperLadder = temperLadder, startingVals = startingVals, logTarDensFunc = logTarDensFunc, moveProbsList = moveProbsList, moveNTimesList = moveNTimesList, levelsSaveSampFor = seq_len(nLevels), saveFitness = TRUE, verboseLevel = 1) }) print(samplerObj) print(names(samplerObj)) with(c(samplerObj, KMeansObj), { print(acceptRatios) print(acceptRatiosList) print(detailedAcceptRatios) print(dim(draws)) fitnessCol <- ncol(draws[ , , 1]) sub <- paste('uniform prior on # of clusters: DU[', priorMinClusters, ', ', priorMaxClusters, ']', sep = '') for (ii in rev(seq_along(levelsSaveSampFor))) { main <- paste('EMCC (MAP) clustering (temper = ', round(temperLadder[levelsSaveSampFor[ii]], 3), ')', sep = '') MAPRow <- which.min(draws[ , fitnessCol, ii]) clusterPlot(clusterInd = draws[MAPRow, -fitnessCol, ii], data = yy, main = main, sub = sub, knownClusterMeans = knownClusterMeans) } }) ## End(Not run)