nulldensity {bayesclust}    R Documentation

Generate Null Distribution of Empirical Posterior Probability

Description

In testing the hypotheses
H_0 : No clusters
H_1 : k clusters

nulldensity simulates values from the distribution of the Empirical Posterior Probability (EPP) under the null hypothesis.

Usage

nulldensity(nsim, n, k, mcs=0.2, a=2.01, b=0.990099,
        tau2=1, prop=0.25, p, file="")

Arguments

nsim The number of values to simulate from the null distribution of the EPP. At least 100,000 is recommended when the results will be used for multiple testing; otherwise 8,000 to 10,000 simulations suffice.
n n is the number of observations in the dataset for testing this hypothesis. See cluster.test for more details.
k k specifies the alternative hypothesis being tested. It must take an integer value strictly greater than 1.
mcs mcs stands for Minimum Cluster Size. It should be a value between 0 and 1, giving the minimum size as a proportion of the n observations. The test procedure then considers only clusters of at least this size.
a a is a hyperparameter for the prior on σ^2. Further details can be found in the references below.
b Like a, b is also a hyperparameter for the prior on σ^2. Further details can be found in the references below.
tau2 tau2 is a hyperparameter for the prior on the mean μ for each cluster.
prop prop specifies what fraction of the space of partitions under the null hypothesis should be sampled. It is recommended to be at least 0.25.
p p is the dimension of each observation. The observations are assumed to come from a p-variate normal distribution.
file This argument is a character string. If specified, the output object is saved to this (binary) file, from which it can be loaded, inspected and altered in later R sessions using load. If left unspecified, the object is not saved to a file and could be lost on quitting the R session.
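One way to motivate the nsim recommendation is the Monte Carlo error of a simulated p-value. If the true tail probability is p, an estimate based on nsim independent null draws has binomial standard error sqrt(p(1 - p)/nsim). Multiple-testing corrections push significance thresholds toward very small p, where that error is large relative to p unless nsim is big. The base-R sketch below assumes independent draws (the sampler's output may not be exactly independent); mc_se is a hypothetical helper, not part of bayesclust.

```r
# Hypothetical helper (not part of bayesclust): Monte Carlo standard error
# of an empirical tail probability p estimated from nsim independent draws.
mc_se <- function(p, nsim) sqrt(p * (1 - p) / nsim)

# Compare the two nsim values suggested above at a conventional threshold
# (0.05) and at a stricter threshold typical of multiple testing (0.001).
grid <- expand.grid(p = c(0.05, 0.001), nsim = c(10000, 100000))
grid$se <- mc_se(grid$p, grid$nsim)
grid$rel.se <- grid$se / grid$p  # error relative to the threshold itself
print(grid)
```

At p = 0.001 the relative error falls from about 32% with 10,000 draws to about 10% with 100,000, while at p = 0.05 it is already below 5% with 10,000 draws.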

Details

The test statistic (EPP) is computed by the function cluster.test. In order to assess the significance of the statistic, it is necessary to obtain the frequentist p-value of the calculated statistic. This package achieves this task by simulating the null distribution of the test statistic with nulldensity and then extracting the sample quantile using emp2pval.
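The general idea behind the last step can be sketched in base R: once draws from the null distribution are available, the p-value of an observed statistic is estimated by the proportion of null draws at least as extreme as it. This is not the source of emp2pval. The helper empirical_pvalue is hypothetical, the draws below are stand-in uniform values rather than output of nulldensity, and whether small or large EPP values count as extreme is left as an argument because this page does not say.

```r
# Hypothetical helper (not bayesclust's emp2pval): estimate a p-value as the
# proportion of simulated null values at least as extreme as the observed one.
empirical_pvalue <- function(observed, null.values, lower.tail = TRUE) {
  if (lower.tail) {
    mean(null.values <= observed)  # small statistics count as extreme
  } else {
    mean(null.values >= observed)  # large statistics count as extreme
  }
}

# Stand-in null draws; in practice these would be gen.values from nulldensity.
set.seed(1)
null.values <- runif(10000)

empirical_pvalue(0.05, null.values)                      # roughly 0.05
empirical_pvalue(0.95, null.values, lower.tail = FALSE)  # roughly 0.05
```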

Only a small portion of the code is written in C. Computation becomes slower as k, the number of clusters under the alternative hypothesis, gets larger.
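One plausible contributor to the growth in running time, assuming the procedure works over partitions of the n observations into k clusters (the prop argument refers to sampling a space of partitions), is that the number of such partitions is the Stirling number of the second kind S(n, k), which grows quickly with k. This is an illustration, not a description of the package's C code. The base-R sketch uses the recurrence S(n, k) = k S(n - 1, k) + S(n - 1, k - 1); stirling2 is a hypothetical helper, and it ignores the restriction imposed by mcs, which lowers the count.

```r
# Hypothetical helper: Stirling number of the second kind S(n, k), the number
# of ways to partition n labelled observations into k non-empty clusters.
# Uses the recurrence S(n, k) = k * S(n - 1, k) + S(n - 1, k - 1).
stirling2 <- function(n, k) {
  S <- matrix(0, nrow = n + 1, ncol = k + 1)  # S[i + 1, j + 1] holds S(i, j)
  S[1, 1] <- 1                                # S(0, 0) = 1
  for (i in seq_len(n)) {
    for (j in seq_len(min(i, k))) {
      S[i + 1, j + 1] <- j * S[i, j + 1] + S[i, j]
    }
  }
  S[n + 1, k + 1]
}

# Partitions of n = 12 observations (as in the Examples) into k = 2, 3, 4 clusters.
sapply(2:4, function(k) stirling2(12, k))  # 2047 86526 611501
```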

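Because nulldensity does not take the data themselves as input, only their dimensions n and p (along with k and the hyperparameters), it can be run in parallel with cluster.test for a particular dataset.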

Value

The object returned is of class "nulldensity". It is a list comprising two components.

param This component exists purely for bookkeeping. emp2pval takes two mandatory arguments, one of class "cluster.test" and the other of class "nulldensity". Both objects have a param component, and the two must match for the p-value conversion to proceed.
gen.values This is a vector of length nsim, consisting of the simulations from the null distribution.
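The param bookkeeping described above can be illustrated with stand-in objects. This is only a sketch: the objects are hand-built with structure() rather than produced by nulldensity or cluster.test, the fields n, k and p inside param are a hypothetical guess at its contents, and params_match is a hypothetical helper, not the check emp2pval actually performs.

```r
# Hypothetical stand-ins for the two objects. The real contents of param are
# not documented on this page; n, k and p are used here only for illustration.
null.obj <- structure(list(param = list(n = 12, k = 2, p = 2),
                           gen.values = runif(100)),
                      class = "nulldensity")
test.obj <- structure(list(param = list(n = 12, k = 2, p = 2)),
                      class = "cluster.test")

# Hypothetical helper: report whether the two param components are identical,
# which is the condition for a p-value conversion to proceed.
params_match <- function(ct, nd) {
  stopifnot(inherits(ct, "cluster.test"), inherits(nd, "nulldensity"))
  identical(ct$param, nd$param)
}

params_match(test.obj, null.obj)  # TRUE
length(null.obj$gen.values)       # 100, playing the role of nsim
```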

Author(s)

Fuentes, C. and Gopal, V.

References

Fuentes, C. and Casella, G. (2008) "Testing for the Existence of Clusters" http://www.stat.ufl.edu/~casella/Papers/paper-v3.pdf

Gopal, V. "BayesClust User Manual" http://www.stat.ufl.edu/~viknesh/bayesclust/clust.html

See Also

cluster.test for further information on objects of class "cluster.test".

hist.nulldensity which allows the user to plot a histogram of simulated values in order to view the shape of the null distribution.

Examples

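library(bayesclust)
# Generate a null density object (nsim = 100 keeps the example fast;
# see nsim under Arguments for recommended values).
null1 <- nulldensity(nsim=100, n=12, p=2, k=2)
# Plot the simulated null distribution (dispatches to hist.nulldensity).
hist(null1)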

[Package bayesclust version 2.1 Index]