bayesclust-package {bayesclust}    R Documentation

Testing and Searching for Clusters in A Hierarchical Bayes Model

Description

This package contains a suite of functions that allow the user to carry out the following hypothesis test on genetic data:
H_0 : No clusters
H_1 : 2, 3 or 4 clusters

Details

The hypothesis test is formulated as a model selection problem, where the aim is to identify the model with the highest posterior probability. A hierarchical Bayes model is assumed for the data. Two points are worth noting. First, the null hypothesis is equivalent to saying that the population consists of a single cluster. Second, each test takes exactly one of 2, 3 or 4 clusters as its alternative hypothesis, so testing more than one alternative requires multiple tests; the package allows the user to test multiple hypotheses while controlling the False Discovery Rate (FDR).

This is a brief summary of the test procedure:

  1. For a given dataset, compute the empirical posterior probability (EPP) of the null hypothesis using cluster.test. EPP will serve as the test statistic in this hypothesis test.
  2. Monitor the convergence of EPP by running plot on the object returned in Step 1.
  3. Generate the distribution of EPP under the null hypothesis using nulldensity. This can be done concurrently with Steps 1 and 2, but be sure to use the same parameters in Steps 1 and 3.
  4. Estimate the p-value of the EPP for this dataset using emp2pval. This function takes the objects returned in Steps 1 and 3 as input.
  5. If multiple hypotheses are being tested, check to see which are significant, whilst controlling for FDR, by applying fdr.test to the objects returned in Step 4.
  6. Run cluster.optimal on significant datasets to pick out optimal clusters.
  7. Run plot on the object returned in Step 6 to view the optimal clustering/partition of the data.
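
The logic behind Steps 4 and 5 can be illustrated with a minimal base-R sketch. It does not call any bayesclust function, and it is not the package's implementation. The helper empirical_pvalue, the simulated null EPP values, the dataset names and the observed EPP values are all hypothetical. The sketch assumes that small EPP values count as evidence against the null hypothesis, and it substitutes the Benjamini-Hochberg adjustment from stats::p.adjust for whatever procedure fdr.test actually uses. For real analyses, use emp2pval and fdr.test as described on their help pages.

```r
# Hypothetical helper (not part of bayesclust): empirical p-value of an
# observed EPP, computed as the fraction of null EPP values that are at
# most as large as the observed one. Assumes small EPP = evidence against H_0.
empirical_pvalue <- function(observed_epp, null_epp) {
  mean(null_epp <= observed_epp)
}

set.seed(1)

# Made-up stand-in for the null distribution of EPP (Step 3).
null_epp <- rbeta(5000, shape1 = 5, shape2 = 2)

# Made-up observed EPP values for three datasets (Step 1).
observed <- c(setA = 0.15, setB = 0.70, setC = 0.30)

# Step 4 (sketch): one empirical p-value per dataset.
pvals <- sapply(observed, empirical_pvalue, null_epp = null_epp)

# Step 5 (sketch): Benjamini-Hochberg adjustment, then flag the datasets
# whose adjusted p-value is at or below the chosen FDR level.
fdr_level <- 0.05
adjusted <- p.adjust(pvals, method = "BH")
significant <- names(adjusted)[adjusted <= fdr_level]

print(pvals)
print(adjusted)
print(significant)
```

Only the datasets flagged as significant here would be passed on to Step 6.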

For full details on the distributional assumptions, please refer to the papers listed in the references section. For further details on the individual functions, please refer to their respective help pages and the examples.

Author(s)

George Casella <casella@stat.ufl.edu>, Claudio Fuentes <cfuentes@stat.ufl.edu> and Vik Gopal <viknesh@stat.ufl.edu>

Maintainer: Vik Gopal <viknesh@stat.ufl.edu>

References

Fuentes, C. and Casella, G. (2008) "Testing for the Existence of Clusters" http://www.stat.ufl.edu/~casella/Papers/paper-v3.pdf

Gopal, V. "BayesClust User Manual" http://www.stat.ufl.edu/~viknesh/bayesclust/clust.html

See Also

cluster.test, cluster.optimal, emp2pval, nulldensity, fdr.test


[Package bayesclust version 2.1 Index]