prabclust {prabclus} | R Documentation |
Clusters a presence-absence matrix object by calculating an MDS from
the distances and applying maximum likelihood Gaussian mixture clustering
with "noise" (package mclust) to the MDS points. The solution
is plotted. A standard execution (using the default distance of
prabinit) is

    prabmatrix <- prabinit(file="path/prabmatrixfile",
                           neighborhood="path/neighborhoodfile")
    clust <- prabclust(prabmatrix)
    print(clust)
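The first two stages described above compute distances between species and then an MDS configuration. They can be sketched in base R. This is a minimal illustration and not the prabclus implementation. It makes three assumptions: a small made-up presence-absence matrix with species as rows, the Jaccard distance from dist(method = "binary") (not necessarily the distance prabinit uses by default), and classical MDS via cmdscale, which corresponds to mdsmethod = "classical".

```r
# Made-up presence-absence matrix: rows are species, columns are regions;
# 1 = present, 0 = absent.
prab_toy <- matrix(c(1, 1, 0, 0, 0,
                     1, 1, 1, 0, 0,
                     0, 1, 1, 0, 0,
                     0, 0, 0, 1, 1,
                     0, 0, 1, 1, 1,
                     0, 0, 0, 1, 0),
                   nrow = 6, byrow = TRUE)

# Jaccard distance between species: dist(method = "binary") gives the
# proportion of regions occupied by exactly one of the two species among
# the regions occupied by at least one of them.
d_toy <- dist(prab_toy, method = "binary")

# Classical (metric) MDS. prabclust defaults to mdsdim = 4; two dimensions
# are used here because the toy example has only 6 species.
mds_toy <- cmdscale(d_toy, k = 2)
print(round(mds_toy, 3))
```

In prabclust itself, the Gaussian mixture with noise (via mclustBIC) is then fitted to such MDS points.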
Note: Data formats are described on the prabinit help page. You may
also consider the example datasets kykladspecreg.dat and nb.dat. Pay
attention to the parameter rows.are.species of prabinit.
Note: prabclust calls the function mclustBIC in package mclust. Its use
is protected by a special license, see
http://www.stat.washington.edu/mclust/license.txt, particularly
point 6. An alternative is the use of hprabclust.
prabclust(prabobj, mdsmethod = "classical", mdsdim = 4,
          nnk = ceiling(prabobj$n.species/40), nclus = 0:9,
          modelid = "all", permutations = 0)

## S3 method for class 'prabclust':
print(x, bic = FALSE, ...)
prabobj |
object of class prab as generated by prabinit. Presence-absence data
to be analyzed.
|
mdsmethod |
"classical", "kruskal", or "sammon". The MDS method used to transform
the distances into data points. "classical" indicates metric MDS by
function cmdscale, "kruskal" is non-metric MDS by function isoMDS, and
"sammon" is Sammon's nonlinear mapping by function sammon. |
mdsdim |
integer. Dimension of the MDS points. |
nnk |
integer. Number of nearest neighbors used by NNclean to determine the
initial noise estimation. nnk=0 fits the model without a noise
component. |
nclus |
vector of integers. Numbers of clusters for which the mixture estimation is performed. |
modelid |
string. Model name for mclustBIC (see the corresponding help page;
all models or combinations of models mentioned there are possible).
modelid="all" compares all possible models. Additionally, "noVVV" is
possible, which fits all models except "VVV". |
permutations |
integer. It has occasionally been found that, depending on the order of
the observations, the algorithms isoMDS and mclustBIC converge to
different solutions. This is because these methods require an ordering
of the distances, which may depend on the order of the observations if
equal distance values are involved. prabclust uses a standard ordering,
which should give a reproducible solution in these cases as well.
However, if permutations>0, this number of random permutations of the
observations is generated, the algorithm is carried out for every
permutation, and the best solution (in terms of the BIC, based on the
lowest-stress MDS configuration) is returned. For many datasets this
changes nothing except increasing the computing time. |
x |
object of class prabclust. Output of prabclust. |
bic |
logical. If TRUE, information about the BIC criterion used to choose
the model is displayed. |
... |
further arguments, necessary for compatibility with the generic print method. |
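The default nnk = ceiling(prabobj$n.species/40) ties the neighbourhood size to the number of species. NNclean bases its noise estimate on each point's distance to its nnk-th nearest neighbour. The base-R sketch below computes only these distances for a made-up 2-dimensional configuration, to show the quantity that noise detection works with. It does not reproduce the mixture that NNclean fits to these distances. The helper names kth_nn_dist and default_nnk are hypothetical.

```r
# Hypothetical helper: distance from each row of a numeric matrix
# (e.g. MDS points) to its k-th nearest other row.
kth_nn_dist <- function(points, k) {
  dm <- as.matrix(dist(points))
  diag(dm) <- Inf                  # exclude each point's zero self-distance
  apply(dm, 1, function(row) sort(row)[k])
}

# Hypothetical helper reproducing prabclust's default nnk for n species.
default_nnk <- function(n_species) ceiling(n_species / 40)

set.seed(1)
# Made-up configuration: a tight group of 20 points plus 2 isolated points.
pts <- rbind(matrix(rnorm(40, sd = 0.1), ncol = 2),
             c(3, 3), c(-3, 3))
k <- default_nnk(nrow(pts))        # ceiling(22 / 40) = 1
nn_d <- kth_nn_dist(pts, k)

# The two isolated points have the largest k-th neighbour distances,
# which makes them candidates for the noise component.
print(order(nn_d, decreasing = TRUE)[1:2])
```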
print.prabclust does not return a value.
prabclust generates an object of class prabclust. This is a list with
components
clustering |
vector of integers indicating the cluster memberships of the species.
Noise can be identified via the output component symbols. |
clustsummary |
output object of summary.mclustBIC. A list giving the optimal
(according to BIC) parameters, conditional probabilities z, and
loglikelihood, together with the associated classification and its
uncertainty. Note that the numbering of clusters may differ from
clustering; see csreorder. |
bicsummary |
output object of mclustBIC. Bayesian Information Criterion for the
specified mixture models and numbers of clusters. |
points |
numerical matrix. MDS configuration. |
nnk |
see above. |
mdsdim |
see above. |
mdsmethod |
see above. |
symbols |
vector of characters, similar to clustering, but indicating estimated
noise and points belonging to one-point components (which should be
interpreted as some kind of noise as well) by "N". |
permchange |
logical. If TRUE, permutations>0 has been used and the best solution
differs from the one obtained by the standard ordering. (This is just
for information and has no further operational consequences.) |
csreorder |
integer vector. This gives the numbering of the components in
clustsummary relative to clustering. Usually, clustering and symbols
will be used, but in order to use the information in clustsummary
(parameter values, posterior assignment probabilities etc.), it has to
be taken into account that cluster no. 1 in clustering corresponds to
cluster no. csreorder[1] in clustsummary, and so on. Noise, if present,
is numbered 0 in clustering as well as in clustsummary. |
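The correspondence described for csreorder can be applied mechanically. The minimal base-R sketch below assumes, as stated above, that clustering uses 0 for noise and 1, 2, ... for clusters, and that cluster k in clustering is cluster csreorder[k] in clustsummary. The helper name to_clustsummary_numbers and the example vectors are hypothetical.

```r
# Hypothetical helper: translate cluster numbers from the 'clustering'
# component into the numbering used in 'clustsummary'.
# Noise (0) stays 0; cluster k becomes csreorder[k].
to_clustsummary_numbers <- function(clustering, csreorder) {
  out <- integer(length(clustering))
  is_cluster <- clustering > 0
  out[is_cluster] <- as.integer(csreorder[clustering[is_cluster]])
  out
}

clustering_toy <- c(1, 1, 2, 0, 3, 2)   # made-up memberships, 0 = noise
csreorder_toy  <- c(2, 3, 1)            # made-up reordering
print(to_clustsummary_numbers(clustering_toy, csreorder_toy))
# [1] 2 2 3 0 1 3
```

With a real prabclust result, the same mapping would be applied to its clustering and csreorder components before looking up parameters for a cluster in clustsummary.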
Note that we used mdsmethod="kruskal" in our publications, but
mdsmethod="classical" is now the default because of occasional
numerical instabilities of the isoMDS implementation for Jaccard,
Kulczynski or geco distance matrices.
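One known failure mode of isoMDS is that it stops with an error when two objects have distance zero. With presence-absence data this happens for species with identical ranges. A way to keep non-metric MDS while guarding against such failures is to start isoMDS from the classical solution and fall back to cmdscale if it errors. This sketch assumes the MASS package (which provides isoMDS) is installed. The helper mds_points is hypothetical and is not how prabclust chooses its MDS method.

```r
library(MASS)   # provides isoMDS

# Hypothetical helper: non-metric (Kruskal) MDS started from the classical
# solution, falling back to classical MDS if isoMDS stops with an error.
mds_points <- function(d, k = 2) {
  start <- cmdscale(d, k = k)
  fit <- tryCatch(isoMDS(d, y = start, k = k, trace = FALSE),
                  error = function(e) NULL)
  if (is.null(fit)) {
    list(points = start, method = "classical")
  } else {
    list(points = fit$points, method = "kruskal")
  }
}

# Made-up data: species 1 and 2 have identical ranges, so their Jaccard
# distance is 0 and isoMDS rejects the distance matrix.
prab_dup <- rbind(c(1, 1, 0, 0), c(1, 1, 0, 0), c(0, 1, 1, 0),
                  c(0, 0, 1, 1), c(1, 0, 0, 1))
res <- mds_points(dist(prab_dup, method = "binary"))
print(res$method)
```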
Sometimes prabclust produces an error because mclustBIC cannot handle
all models properly. In this case we recommend changing the modelid
parameter. "noVVV" and "VVV" are reasonable alternative choices (one of
these is expected to reproduce the error, but the other one might work).
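When mclust cannot fit a particular model, the corresponding entry of the BIC table is typically NA. The choice is then made among the remaining combinations. The base-R sketch below picks the best combination from a table laid out the way mclustBIC reports it: numbers of clusters as rows, model names as columns, and larger BIC values better. The BIC values are made up for illustration, and the helper best_bic is hypothetical.

```r
# Hypothetical helper: locate the largest BIC in a table with numbers of
# clusters as rows and model names as columns, ignoring NA entries
# (combinations that could not be fitted).
best_bic <- function(bic) {
  if (all(is.na(bic))) stop("no model could be fitted")
  idx <- which(bic == max(bic, na.rm = TRUE), arr.ind = TRUE)[1, ]
  list(G = as.integer(rownames(bic)[idx[1]]),
       model = colnames(bic)[idx[2]],
       BIC = bic[idx[1], idx[2]])
}

# Made-up BIC values for illustration only.
bic_toy <- matrix(c(-120, -118,  NA,
                     -95,  -97, -99,
                    -101,   NA,  NA),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("1", "2", "3"), c("EII", "EEI", "VVV")))
str(best_bic(bic_toy))
```

Restricting modelid, as recommended above, amounts to dropping the problematic columns before this selection.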
Christian Hennig chrish@stats.ucl.ac.uk http://www.homepages.ucl.ac.uk/~ucakche
Fraley, C. and Raftery, A. E. (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Computer Journal 41, 578-588.
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.
mclustBIC, summary.mclustBIC, NNclean, cmdscale, isoMDS, sammon,
prabinit, hprabclust.
    data(kykladspecreg)
    data(nb)
    set.seed(1234)
    x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
    # If you want to use your own ASCII data files, use
    # x <- prabinit(file="path/prabmatrixfile",
    #               neighborhood="path/neighborhoodfile")
    print(prabclust(x))