prabclust {prabclus}    R Documentation

Clustering of species ranges from presence-absence matrices (mixture method)

Description

Clusters a presence-absence matrix object by computing an MDS configuration from the distances and applying maximum likelihood Gaussian mixture clustering with "noise" (package mclust) to the MDS points. The solution is plotted. A standard execution (using the default distance of prabinit) is
prabmatrix <- prabinit(file="path/prabmatrixfile", neighborhood="path/neighborhoodfile")
clust <- prabclust(prabmatrix)
print(clust)
Note: Data formats are described on the prabinit help page. You may also consider the example datasets kykladspecreg.dat and nb.dat. Pay attention to the rows.are.species parameter of prabinit.
Note: prabclust calls the function mclustBIC in package mclust. Its use is protected by a special license, see http://www.stat.washington.edu/mclust/license.txt, particularly point 6. An alternative is the use of hprabclust.
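
The first step of the procedure described above (distances to MDS points) can be sketched with base R alone. This is a minimal illustration, not the internals of prabclust: the matrix toy is invented data, and the Jaccard distance returned by dist(method = "binary") is only a stand-in for the distance that prabinit would compute.

```r
# Toy presence-absence matrix (hypothetical data): 6 species (rows) x 5 regions.
toy <- rbind(
  sp1 = c(1, 1, 0, 0, 0),
  sp2 = c(1, 1, 1, 0, 0),
  sp3 = c(1, 0, 1, 0, 0),
  sp4 = c(0, 0, 0, 1, 1),
  sp5 = c(0, 0, 1, 1, 1),
  sp6 = c(0, 0, 0, 0, 1)
)

# Jaccard distance between species ranges (assumption: stand-in distance).
d <- dist(toy, method = "binary")

# Classical (metric) MDS. prabclust uses mdsdim = 4 by default;
# k = 2 is chosen here only because the toy data are tiny.
pts <- cmdscale(d, k = 2)
print(round(pts, 2))
```

In prabclust, the Gaussian mixture model is then fitted to points of this kind.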

Usage

prabclust(prabobj, mdsmethod = "classical", mdsdim = 4, nnk =
ceiling(prabobj$n.species/40), nclus = 0:9, modelid = "all", permutations=0)

## S3 method for class 'prabclust':
print(x, bic=FALSE, ...)

Arguments

prabobj object of class prab as generated by prabinit. Presence-absence data to be analyzed.
mdsmethod "classical", "kruskal", or "sammon". The MDS method used to transform the distances into data points. "classical" indicates metric MDS by function cmdscale, "kruskal" non-metric MDS by function isoMDS, and "sammon" Sammon's non-linear mapping by function sammon.
mdsdim integer. Dimension of the MDS points.
nnk integer. Number of nearest neighbors to determine the initial noise estimation by NNclean. nnk=0 fits the model without a noise component.
nclus vector of integers. Numbers of clusters for which the mixture estimation is carried out.
modelid string. Model name for mclustBIC (see the corresponding help page; all models or combinations of models mentioned there are possible). modelid="all" compares all possible models. Additionally, "noVVV" is possible, which fits all models except "VVV".
permutations integer. Occasionally, the algorithms isoMDS and mclustBIC converge to different solutions depending on the order of the observations. This happens because these methods require an ordering of the distances, which may depend on the order of the observations if equal distance values are involved. prabclust uses a standard ordering, which should give a reproducible solution in these cases as well. If permutations>0, it gives the number of random permutations of the observations; the algorithm is carried out for every permutation and the best solution (in terms of the BIC, based on the lowest-stress MDS configuration) is returned. For many datasets this changes nothing except that the computing time increases.
x object of class prabclust. Output of prabclust.
bic logical. If TRUE, information about the BIC criterion to choose the model is displayed.
... necessary for print methods.
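
As a hedged sketch of how these arguments combine, the call below fits the model without a noise component and restricts the numbers of clusters and the models. It assumes that the prabclus package (and mclust) is installed and uses only the example data and arguments named on this page. The particular argument values are illustrative, not recommendations.

```r
# Sketch only; runs if the prabclus package is available.
if (requireNamespace("prabclus", quietly = TRUE)) {
  library(prabclus)
  data(kykladspecreg)
  data(nb)
  set.seed(1234)
  x <- prabinit(prabmatrix = kykladspecreg, neighborhood = nb)

  # nnk = 0: no noise component; 1 to 5 clusters; all models except "VVV".
  fit <- prabclust(x, mdsmethod = "classical", mdsdim = 4, nnk = 0,
                   nclus = 1:5, modelid = "noVVV")

  # bic = TRUE additionally displays the BIC information.
  print(fit, bic = TRUE)
}
```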

Value

print.prabclust does not produce output. prabclust generates an object of class prabclust. This is a list with components

clustering vector of integers indicating the cluster memberships of the species. Noise can be recognized by output component symbols.
clustsummary output object of summary.mclustBIC. A list giving the optimal (according to BIC) parameters, conditional probabilities `z', and loglikelihood, together with the associated classification and its uncertainty. Note that the numbering of clusters may differ from clustering, see csreorder.
bicsummary output object of mclustBIC. Bayesian Information Criterion for the specified mixture models and numbers of clusters.
points numerical matrix. MDS configuration.
nnk see above.
mdsdim see above.
mdsmethod see above.
symbols vector of characters, similar to clustering, but indicating estimated noise and points belonging to one-point-components (which should be interpreted as some kind of noise as well) by "N".
permchange logical. If TRUE, permutations>0 has been used and the best solution is different from the one obtained by the standard ordering. (This is just for information and has no further operational consequences.)
csreorder integer vector. This gives the numbering of the components in clustsummary relative to clustering. Usually, clustering and symbols will be used, but in order to use the information in clustsummary (parameter values, posterior assignment probabilities etc.), it has to be taken into account that cluster no. 1 in clustering corresponds to cluster no. csreorder[1] in clustsummary and so on. Noise, if present, is numbered 0 in clustering as well as clustsummary.
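
The renumbering described for csreorder can be illustrated with a small base R sketch. The three vectors below are invented stand-ins for the corresponding components of a prabclust result. In real use they would be taken from the returned list (for example clust$clustering).

```r
# Hypothetical values standing in for components of a prabclust object.
clustering <- c(1, 1, 2, 0, 2, 3, 0)               # 0 = noise
symbols    <- c("1", "1", "2", "N", "2", "3", "N") # "N" marks noise
csreorder  <- c(2, 3, 1)  # cluster k in clustering <-> component csreorder[k]

# Translate cluster numbers into the component numbering of clustsummary.
# Noise keeps the number 0 in both numberings.
component <- clustering
is_cluster <- clustering > 0
component[is_cluster] <- csreorder[clustering[is_cluster]]
print(component)

# Indices of the species flagged as noise.
print(which(symbols == "N"))
```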

Note

Note that we used mdsmethod="kruskal" in our publications, but mdsmethod="classical" is now the default because of occasional numerical instabilities of the isoMDS implementation for Jaccard, Kulczynski or geco distance matrices.

Sometimes, prabclust produces an error because mclustBIC cannot handle all models properly. In this case we recommend changing the modelid parameter. "noVVV" and "VVV" are reasonable alternative choices (one of these is expected to reproduce the error, but the other one might work).
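
One way to automate this fallback is sketched below. The helper try_modelids is hypothetical (it is not part of prabclus). It simply tries a sequence of modelid values and keeps the first fit that does not raise an error. The demonstration uses a mock fitting function so that it runs without any package installed.

```r
# Hypothetical helper: try several modelid values in turn and return the
# first result that does not raise an error.
try_modelids <- function(fit_fun, ids = c("all", "noVVV", "VVV")) {
  for (id in ids) {
    res <- tryCatch(fit_fun(id), error = function(e) NULL)
    if (!is.null(res)) return(list(modelid = id, fit = res))
  }
  stop("no modelid in 'ids' produced a fit")
}

# Intended use (x being a prab object generated by prabinit):
# out <- try_modelids(function(id) prabclust(x, modelid = id))

# Self-contained demonstration with a mock fitting function
# that fails for modelid = "all".
mock_fit <- function(id) {
  if (id == "all") stop("simulated mclustBIC failure")
  paste("fitted with", id)
}
out <- try_modelids(mock_fit)
print(out$modelid)
```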

Author(s)

Christian Hennig chrish@stats.ucl.ac.uk http://www.homepages.ucl.ac.uk/~ucakche

References

Fraley, C. and Raftery, A. E. (1998) How many clusters? Which clustering method? - Answers via Model-Based Cluster Analysis. Computer Journal 41, 578-588.

Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.

See Also

mclustBIC, summary.mclustBIC, NNclean, cmdscale, isoMDS, sammon, prabinit, hprabclust.

Examples

data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
# If you want to use your own ASCII data files, use
# x <- prabinit(file="path/prabmatrixfile",
# neighborhood="path/neighborhoodfile")
print(prabclust(x))

[Package prabclus version 2.1-2]