entropy.Dirichlet {entropy}    R Documentation

Family of Dirichlet Entropy and Mutual Information Estimators

Description

entropy.Dirichlet estimates the Shannon entropy H of the random variable Y from the corresponding observed counts y by plug-in of Bayesian estimates of the bin frequencies using the Dirichlet-multinomial pseudocount model.

mi.Dirichlet estimates the corresponding mutual information of two random variables.

freqs.Dirichlet computes the Bayesian estimates of the bin frequencies using the Dirichlet-multinomial pseudocount model.

Usage

entropy.Dirichlet(y, a, unit=c("log", "log2", "log10"))
mi.Dirichlet(y, a, unit=c("log", "log2", "log10"))
freqs.Dirichlet(y, a)

Arguments

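y       vector or matrix of counts.
a       pseudocount per bin (a single value, or a vector with one pseudocount per bin).
unit    the unit in which entropy is measured: "log" (natural logarithm, the default), "log2" (bits), or "log10".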

Details

The Dirichlet-multinomial pseudocount entropy estimator is a Bayesian plug-in estimator: in the definition of the Shannon entropy the bin probabilities are replaced by the respective Bayesian estimates of the frequencies, using a model with a Dirichlet prior and a multinomial likelihood.
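
In symbols, the plug-in construction can be written as below. The notation is introduced here for illustration and does not appear in the package documentation: K is the number of bins, y_k the observed count in bin k, a_k the pseudocount added to bin k, and n and A the respective totals. The frequency estimate is the standard posterior mean under a Dirichlet prior with a multinomial likelihood.

```latex
\hat{\theta}_k = \frac{y_k + a_k}{n + A},
\qquad n = \sum_{k=1}^{K} y_k,
\qquad A = \sum_{k=1}^{K} a_k

\hat{H} = -\sum_{k=1}^{K} \hat{\theta}_k \log \hat{\theta}_k
```

Terms with \hat{\theta}_k = 0, which can only occur when both y_k and a_k are zero, are taken to contribute zero to the sum.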

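The parameter a is a parameter of the Dirichlet prior, and in effect specifies the pseudocount per bin. Popular choices of a are:

a = 0: maximum likelihood estimator (see entropy.empirical)
a = 1/2: Jeffreys' prior; Krichevsky-Trofimov (1981) entropy estimator
a = 1: Laplace's prior
a = 1/length(y): Schurmann-Grassberger (1996) entropy estimator
a = sqrt(sum(y))/length(y): minimax prior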

The pseudocount a can also be a vector so that for each bin an individual pseudocount is added.
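
The pseudocount model described above can be sketched in a few lines of base R. This is an illustration only, not the package's source code: the helper names freqs_dirichlet_sketch and entropy_plugin_sketch are hypothetical, and the per-bin pseudocounts in a_vec are arbitrary values chosen to show the vector form of a.

```r
# Hypothetical helpers for illustration; not the entropy package's own code.

# Bayesian frequency estimates: add pseudocounts, then normalize.
# A scalar a is recycled over all bins; a vector a gives per-bin pseudocounts.
freqs_dirichlet_sketch <- function(y, a) {
  ya <- y + a
  ya / sum(ya)
}

# Plug-in Shannon entropy of a frequency vector.
entropy_plugin_sketch <- function(freqs, unit = c("log", "log2", "log10")) {
  unit <- match.arg(unit)
  p <- freqs[freqs > 0]      # bins with zero frequency contribute nothing
  H <- -sum(p * log(p))      # natural logarithm
  switch(unit, log = H, log2 = H / log(2), log10 = H / log(10))
}

# observed counts for each bin
y <- c(4, 2, 3, 0, 2, 4, 0, 0, 2, 1, 1)

# same pseudocount a = 1/2 for every bin
theta <- freqs_dirichlet_sketch(y, a = 1/2)
H_half <- entropy_plugin_sketch(theta)

# individual pseudocount for each bin (arbitrary illustrative values)
a_vec <- rep(c(1, 0.5), length.out = length(y))
theta_vec <- freqs_dirichlet_sketch(y, a_vec)
```

With a = 0 the sketch reduces to the empirical frequencies y/sum(y). Any positive pseudocount moves the estimates toward the uniform distribution, so empty bins receive non-zero frequency.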

Value

entropy.Dirichlet returns an estimate of the Shannon entropy.
mi.Dirichlet returns an estimate of the mutual information.
freqs.Dirichlet returns the Bayesian estimates of the bin frequencies.
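
As a rough guide to what the mutual information estimate represents, the following base R sketch smooths a contingency table with a pseudocount per cell and plugs the resulting joint frequencies into the identity MI = H(X) + H(Y) - H(X,Y). The function name mi_dirichlet_sketch is hypothetical, and the assumption that the package computes the value in exactly this way is not confirmed by this page.

```r
# Hypothetical helper for illustration; not the entropy package's own code.
mi_dirichlet_sketch <- function(y, a, unit = c("log", "log2", "log10")) {
  unit <- match.arg(unit)

  # smoothed joint frequencies: pseudocount a added to every cell
  joint <- (y + a) / sum(y + a)

  # plug-in Shannon entropy (natural logarithm), skipping zero entries
  H <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }

  # marginal entropies minus joint entropy
  mi <- H(rowSums(joint)) + H(colSums(joint)) - H(joint)
  switch(unit, log = mi, log2 = mi / log(2), log10 = mi / log(10))
}

# contingency table with counts for two discrete variables
y2 <- rbind(c(1, 2, 3), c(6, 5, 4))
mi_half <- mi_dirichlet_sketch(y2, a = 1/2)
```

For a table whose cells factor exactly into row and column totals, the sketch with a = 0 returns zero up to floating-point error, as expected for independent variables.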

Author(s)

Korbinian Strimmer (http://strimmerlab.org).

References

Agresti, A., and D. B. Hitchcock. 2005. Bayesian inference for categorical data analysis. Stat. Methods. Appl. 14:297–330.

Krichevsky, R. E., and V. K. Trofimov. 1981. The performance of universal encoding. IEEE Trans. Inf. Theory 27:199–207.

Schurmann, T., and P. Grassberger. 1996. Entropy estimation of symbol sequences. Chaos 6:414–427.

See Also

entropy, entropy.shrink, entropy.NSB, entropy.ChaoShen, entropy.empirical, entropy.plugin, mi.plugin.

Examples

# load entropy library 
library("entropy")

# observed counts for each bin
y = c(4, 2, 3, 0, 2, 4, 0, 0, 2, 1, 1)  

# Dirichlet estimate with a=0
entropy.Dirichlet(y, a=0)

# compare to empirical estimate
entropy.empirical(y)

# Dirichlet estimate with a=1/2
entropy.Dirichlet(y, a=1/2)

# Dirichlet estimate with a=1
entropy.Dirichlet(y, a=1)

# Dirichlet estimate with a=1/length(y)
entropy.Dirichlet(y, a=1/length(y))

# Dirichlet estimate with a=sqrt(sum(y))/length(y)
entropy.Dirichlet(y, a=sqrt(sum(y))/length(y))

# contingency table with counts for two discrete variables
y = rbind( c(1,2,3), c(6,5,4) )

# Dirichlet estimate of mutual information (with a=1/2)
mi.Dirichlet(y, a=1/2)

[Package entropy version 1.1.3 Index]