findMaxTemper {EMCC} | R Documentation |
The evolutionary Monte Carlo clustering (EMCC) algorithm needs a temperature ladder. This function finds the maximum temperature for constructing the ladder.
findMaxTemper(nIters, statsFuncList, startingVals, logTarDensFunc,
              temperLadder = NULL, temperLimits = NULL, ladderLen = 10,
              scheme = 'exponential', schemeParam = 0.5,
              cutoffDStats = 1.96, cutoffESS = 50, guideMe = TRUE,
              levelsSaveSampFor = NULL, saveFitness = FALSE,
              doFullAnal = TRUE, verboseLevel = 0, ...)
Below, sampDim refers to the dimension of the sample space, temperLadderLen refers to the length of the temperature ladder, and levelsSaveSampForLen refers to the length of levelsSaveSampFor. Note that this function calls evolMonteCarloClustering, so some of the arguments below have the same name and meaning as the corresponding ones for evolMonteCarloClustering. See the details below for an explanation of the following arguments.
nIters | integer > 0. |
statsFuncList | list of functions of one argument each, which return the value of the statistic evaluated at one MCMC sample or draw. |
startingVals | double matrix of dimension temperLadderLen x sampDim or vector of length sampDim, in which case the same starting values are used for every temperature level. |
logTarDensFunc | function of two arguments (draw, ...) that returns the target density evaluated in the log scale. |
temperLadder | double vector with all positive entries, in decreasing order. |
temperLimits | double vector with two positive entries. |
ladderLen | integer > 0. |
scheme | character. |
schemeParam | double > 0. |
cutoffDStats | double > 0. |
cutoffESS | double > 0. |
guideMe | logical. |
levelsSaveSampFor | integer vector with positive entries. |
saveFitness | logical. |
doFullAnal | logical. |
verboseLevel | integer; a value >= 2 produces a lot of output. |
... | optional arguments to be passed to logTarDensFunc, MHPropNewFunc and logMHPropDensFunc. |
This function is based on the method to find the temperature range introduced in section 4.1 of Goswami and Liu (2007).
statsFuncList
coord1 <- function (xx) { xx[1] }
coord3 <- function (xx) { xx[3] }
statsFuncList <- list(coord1, coord3)
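As a quick check of what such a list computes, the sketch below applies each function in statsFuncList to a few draws. The draws matrix here is made up for illustration and merely stands in for MCMC output.

```r
# Statistics from the text above: the first and third coordinates of a draw.
coord1 <- function (xx) { xx[1] }
coord3 <- function (xx) { xx[3] }
statsFuncList <- list(coord1, coord3)

# Hypothetical draws, one per row; real draws would come from the MCMC run.
draws <- rbind(c(0.1, 0.2, 0.3),
               c(1.1, 1.2, 1.3))

# Evaluate every statistic at every draw:
# the result has one row per draw and one column per statistic.
statVals <- t(apply(draws, 1, function (draw) {
    sapply(statsFuncList, function (ff) ff(draw))
}))
print(statVals)
```

Each function takes a single argument (one draw) and returns a single number, which is the shape the statsFuncList argument is documented to expect.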
temperLadder
Either supply temperLadder or specify temperLimits, ladderLen, scheme and schemeParam. For details on the latter set of parameters, see below. Note that temperLadder overrides temperLimits, ladderLen, scheme and schemeParam.
temperLimits
temperLimits = c(lowerLimit, upperLimit) is a two-tuple of positive numbers, where lowerLimit is usually 1 and upperLimit is a number in [100, 1000]. If stochastic optimization (via sampling) is the goal, then lowerLimit is taken to be in (0, 1].
ladderLen, scheme and schemeParam
These are used (along with temperLimits) if temperLadder is not provided. We recommend taking ladderLen in [15, 30]. The allowed choices for scheme and schemeParam are:
scheme      | schemeParam |
=========== | =========== |
linear      | NA          |
log         | NA          |
geometric   | NA          |
mult-power  | NA          |
add-power   | >= 0        |
reciprocal  | NA          |
exponential | >= 0        |
tangent     | >= 0        |
We recommend using scheme = 'exponential' and schemeParam in [0.3, 0.5].
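To make the roles of temperLimits and ladderLen concrete, here is a minimal sketch of a decreasing ladder with geometrically spaced temperatures. The helper makeGeometricLadder is hypothetical and is not part of EMCC; the package's own schemes, including the recommended 'exponential' one, may space the temperatures differently.

```r
# Hypothetical helper, not an EMCC function: builds a ladder whose
# consecutive temperatures have a constant ratio.
makeGeometricLadder <- function (temperLimits, ladderLen) {
    stopifnot(length(temperLimits) == 2,
              all(temperLimits > 0),
              ladderLen >= 2)
    lowerLimit <- min(temperLimits)
    upperLimit <- max(temperLimits)
    # Equally spaced on the log scale, hottest level first (decreasing order).
    exp(seq(log(upperLimit), log(lowerLimit), length.out = ladderLen))
}

ladder <- makeGeometricLadder(temperLimits = c(1, 100), ladderLen = 5)
print(round(ladder, 3))
```

A vector built this way has all positive entries in decreasing order, so it could be passed as temperLadder, in which case it overrides temperLimits, ladderLen, scheme and schemeParam.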
cutoffDStats
If more than one statistic is supplied in statsFuncList, which is usually the case, using this cutoff may result in different suggested maximum temperatures (as can be seen by calling the print function on the result of findMaxTemper). A conservative recommendation is to choose the maximum of the suggested temperatures as the final maximum temperature for use in placeTempers and later in parallelTempering or evolMonteCarlo.
cutoffESS
guideMe
If guideMe = TRUE, then the function suggests modifications to the settings for a re-run, in case there are problems with the underlying MCMC run.
doFullAnal
If doFullAnal = TRUE, then the search for the maximum temperature is conducted among all the levels of the temperLadder. If this switch is turned off, the search is done in a greedy (and faster) manner; namely, it stops as soon as every statistic in statsFuncList has found some maximum temperature. Note that the greedy search may result in a much higher (and hence sub-optimal) maximum temperature than needed, so it is not recommended.
levelsSaveSampFor
This is passed to evolMonteCarlo for the underlying MCMC run.
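The conservative recommendation given for cutoffDStats can be illustrated with a small sketch. The vector suggestedMaxTempers is made up for illustration; in practice the per-statistic suggestions would be read off the printed result of findMaxTemper.

```r
# Hypothetical per-statistic suggestions, one for each function in
# statsFuncList (the values are invented for this sketch).
suggestedMaxTempers <- c(stat1 = 5, stat2 = 10, stat3 = 5, stat4 = 1)

# Conservative choice: take the largest of the suggested temperatures.
finalMaxTemper <- max(suggestedMaxTempers)
print(finalMaxTemper)

# Side note: the default cutoffDStats = 1.96 coincides numerically with
# the two-sided 5% critical value of the standard normal distribution.
print(round(qnorm(0.975), 2))
```

Taking the maximum errs on the side of a hotter top level, which is the safer direction when the statistics disagree.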
This function returns a list with the following components:
temperLadder | the temperature ladder used for the underlying MCMC run. |
DStats | the D-statistic (Goswami and Liu, 2007) values used to find the maximum temperature. |
cutoffDStats | the cutoffDStats argument. |
nIters | the post burn-in nIters. |
levelsSaveSampFor | the levelsSaveSampFor argument. |
draws | array of dimension nIters x sampDim x levelsSaveSampForLen, if saveFitness = FALSE. If saveFitness = TRUE, then the returned array is of dimension nIters x (sampDim + 1) x levelsSaveSampForLen; i.e., each of the levelsSaveSampForLen matrices contains the fitness values in its last column. |
startingVals | the startingVals argument. |
intermediate statistics | several intermediate statistics used in the computation of DStats, namely MCEsts, MCVarEsts, MCESS, ISEsts, ISVarEsts and ISESS, each computed for all the statistics provided by the statsFuncList argument. |
time | the time taken by the run. |
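The layout of draws when saveFitness = TRUE can be sketched with a stand-in array. Everything below is made up for illustration (the dimensions and the random values); only the indexing pattern, which mirrors the example at the end of this page, is the point.

```r
# Made-up dimensions standing in for a real run.
nIters <- 4
sampDim <- 3
levelsSaveSampForLen <- 2

# Stand-in for the draws component with saveFitness = TRUE: each
# nIters x (sampDim + 1) matrix holds the fitness in its last column.
set.seed(1)
draws <- array(runif(nIters * (sampDim + 1) * levelsSaveSampForLen),
               dim = c(nIters, sampDim + 1, levelsSaveSampForLen))

fitnessCol <- dim(draws)[2]
level <- 1

# Row with the smallest fitness at this level, and the draw stored there
# (the fitness column itself is dropped from the draw).
bestRow <- which.min(draws[ , fitnessCol, level])
bestDraw <- draws[bestRow, -fitnessCol, level]
print(bestDraw)
```

With saveFitness = FALSE the second dimension would be sampDim only, so there would be no fitness column to drop.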
The effects of leaving the default value NULL for some of the arguments above are as follows:
temperLadder | valid temperLimits, ladderLen, scheme and schemeParam are provided, which are used to construct the temperLadder. |
temperLimits | a valid temperLadder is provided. |
levelsSaveSampFor | temperLadderLen. |
Gopi Goswami goswami@stat.harvard.edu
Gopi Goswami and Jun S. Liu (2007). On learning strategies for evolutionary Monte Carlo. Statistics and Computing, 17(1), 23-38.
Gopi Goswami, Jun S. Liu and Wing H. Wong (2007). Evolutionary Monte Carlo Methods for Clustering. Journal of Computational and Graphical Statistics, 16(4), 855-876.
placeTempers, evolMonteCarloClustering
## Not run:
## The following example is a simple stochastic optimization problem,
## and thus it does not require any "heating up", and hence the
## maximum temperature turns out to be the coldest one, i.e., 0.5.
adjMatSum <- function (xx) {
    xx <- as.integer(xx)
    adjMat <- outer(xx, xx, function (id1, id2) { id1 == id2 })
    sum(adjMat)
}
modeSensitive1 <- function (xx) {
    with(partitionRep(xx), {
        rr <- 1 + seq_along(clusterLabels)
        freq <- sapply(clusters, length)
        oo <- order(freq, decreasing = TRUE)
        sum(sapply(clusters[oo], sum) * log(rr))
    })
}
entropy <- function (xx) {
    yy <- table(as.vector(xx, mode = "numeric"))
    zz <- yy / length(xx)
    -sum(zz * log(zz))
}
maxProp <- function (xx) {
    yy <- table(as.vector(xx, mode = "numeric"))
    oo <- order(yy, decreasing = TRUE)
    yy[oo][1] / length(xx)
}
statsFuncList <- list(adjMatSum, modeSensitive1, entropy, maxProp)
KMeansObj <- KMeansFuncGenerator1(-97531)
## Run findMaxTemper on a short, user-supplied temperature ladder.
maxTemperObj <- with(KMeansObj, {
    temperLadder <- c(20, 10, 5, 1, 0.5)
    nLevels <- length(temperLadder)
    sampDim <- nrow(yy)
    startingVals <- sample(c(0, 1),
                           size = nLevels * sampDim,
                           replace = TRUE)
    startingVals <- matrix(startingVals, nrow = nLevels, ncol = sampDim)
    findMaxTemper(nIters = 5000,
                  statsFuncList = statsFuncList,
                  temperLadder = temperLadder,
                  startingVals = startingVals,
                  logTarDensFunc = logTarDensFunc,
                  levelsSaveSampFor = seq_len(nLevels),
                  doFullAnal = TRUE,
                  saveFitness = TRUE,
                  verboseLevel = 1)
})
print(maxTemperObj)
print(names(maxTemperObj))
## Plot the minimum-fitness (MAP) clustering at each saved level.
with(c(maxTemperObj, KMeansObj), {
    fitnessCol <- ncol(draws[ , , 1])
    sub <- paste('uniform prior on # of clusters: DU[',
                 priorMinClusters, ', ', priorMaxClusters, ']', sep = '')
    for (ii in rev(seq_along(levelsSaveSampFor))) {
        main <- paste('EMCC (MAP) clustering (temper = ',
                      round(temperLadder[levelsSaveSampFor[ii]], 3), ')',
                      sep = '')
        MAPRow <- which.min(draws[ , fitnessCol, ii])
        clusterPlot(clusterInd = draws[MAPRow, -fitnessCol, ii],
                    data = yy,
                    main = main,
                    sub = sub,
                    knownClusterMeans = knownClusterMeans)
    }
})
## End(Not run)