bnlearn-package {bnlearn} R Documentation

Bayesian network structure learning.

Description

Bayesian network learning via constraint-based and score-based algorithms.

Details

Package: bnlearn
Type: Package
Version: 1.3
Date: 2009-03-28
License: GPLv2 or later

This package implements some constraint-based algorithms for learning the structure of Bayesian networks. Also known as conditional independence learners, they are all optimized derivatives of the Inductive Causation algorithm (Verma and Pearl, 1991).

These algorithms differ in the way they detect the Markov blankets of the variables, which in turn are used to compute the structure of the Bayesian network. Proofs of correctness can be found in the respective papers.
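To give a feel for what a conditional independence learner asks of the data, here is a minimal base R sketch of one such test for discrete variables. It is not the package's implementation: the simulated data and the helper `cond.x2` are hypothetical, and the test simply sums Pearson's X^2 statistic over the strata of the conditioning variable.

```r
set.seed(42)
n <- 2000
z <- sample(c("a", "b"), n, replace = TRUE)
# x and y both depend on z, but not on each other once z is known.
x <- ifelse(runif(n) < ifelse(z == "a", 0.8, 0.2), "yes", "no")
y <- ifelse(runif(n) < ifelse(z == "a", 0.7, 0.3), "yes", "no")

# Hypothetical helper: test x independent of y given z by summing
# Pearson's X^2 (and its degrees of freedom) over the levels of z.
cond.x2 <- function(x, y, z) {
  stat <- 0
  df <- 0
  for (level in unique(z)) {
    tab <- table(x[z == level], y[z == level])
    expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
    stat <- stat + sum((tab - expected)^2 / expected)
    df <- df + (nrow(tab) - 1) * (ncol(tab) - 1)
  }
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Marginal test: x and y look dependent when z is ignored.
marginal <- chisq.test(table(x, y), correct = FALSE)$p.value
# Conditional test: the same pair, this time given z.
conditional <- cond.x2(x, y, z)
marginal
conditional
```

A constraint-based learner runs many tests of this kind, with different conditioning sets, to decide which variables belong to each Markov blanket.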

A score-based learning algorithm (greedy search via hill-climbing) is implemented as well for comparison purposes.

Available constraint-based learning algorithms

This package includes three implementations of each algorithm:

The number of conditional independence tests performed by these algorithms is polynomial in the number of variables N, usually O(N^2) (O(N^4) in the worst case). The execution time scales linearly with the size of the data set.

Available score-based learning algorithms

Available (conditional) independence tests

In practice, the conditional independence tests used in constraint-based algorithms are statistical tests on the data set. Available tests (and the respective labels) are:

Available network scores

Available scores (and the respective labels) are:

Whitelist and blacklist support

All learning algorithms support arc whitelisting and blacklisting:

Any arc whitelisted and blacklisted at the same time is assumed to be whitelisted, and is thus removed from the blacklist.
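The conflict rule above can be sketched in base R. This is only an illustration of the stated behaviour, not the package's internal code; the arc sets and the helper `arc.key` are hypothetical.

```r
# Hypothetical arc sets; E -> F appears in both lists.
whitelist <- data.frame(from = "E", to = "F", stringsAsFactors = FALSE)
blacklist <- data.frame(from = c("B", "F", "E"),
                        to   = c("F", "B", "F"),
                        stringsAsFactors = FALSE)

# Encode each arc as a single string so the two sets can be compared.
arc.key <- function(arcs) paste(arcs$from, arcs$to, sep = "->")

# Arcs present in both lists are treated as whitelisted,
# so they are dropped from the blacklist.
conflicts <- arc.key(blacklist) %in% arc.key(whitelist)
blacklist.resolved <- blacklist[!conflicts, , drop = FALSE]
blacklist.resolved
```

Only the arc E -> F is removed; the two blacklisted directions of B - F are left untouched.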

Error detection and correction: the strict mode

Optimized implementations of the constraint-based algorithms rely heavily on backtracking to reduce the number of tests needed by the learning procedure. This approach may hide errors either in the Markov blanket or the neighbourhood detection phase in some particular cases, such as when hidden variables are present or there are external (logical) constraints on the interactions between the variables.

On the other hand, in the unoptimized implementations the Markov blanket and neighbourhood detection for each node are completely independent of the rest of the learning process. It may therefore happen that the Markov blankets or the neighbourhoods are not symmetric (i.e. A is in the Markov blanket of B but not vice versa), or that some arc directions conflict with each other.
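The kind of asymmetry described above is easy to check for once the Markov blankets are available as a named list. The following base R sketch assumes such a list; the example blankets and the helper `asymmetric.pairs` are hypothetical and do not reflect how the package stores or checks its results.

```r
# Hypothetical Markov blankets learned independently for each node.
mb <- list(A = c("B", "C"), B = "C", C = c("A", "B"))

# Report every pair for which membership holds in one direction only.
asymmetric.pairs <- function(mb) {
  found <- character(0)
  for (node in names(mb)) {
    for (member in mb[[node]]) {
      if (!(node %in% mb[[member]]))
        found <- c(found, paste(member, "is in the Markov blanket of",
                                node, "but not vice versa"))
    }
  }
  found
}

issues <- asymmetric.pairs(mb)
issues
```

An empty result means that all blankets are symmetric; any entry points to a pair of nodes whose detection results disagree.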

The strict parameter enables some measure of error correction, which may help to retrieve a good model even when the learning process would otherwise fail:

Author(s)

Marco Scutari
Department of Statistical Sciences
University of Padova

Maintainer: Marco Scutari <marco.scutari@gmail.com>

References

A. Agresti. Categorical Data Analysis. John Wiley & Sons, Inc., 2002.

K. Korb and A. Nicholson. Bayesian artificial intelligence. Chapman and Hall, 2004.

D. Margaritis. Learning Bayesian Network Model Structure from Data. PhD thesis, School of Computer Science, Carnegie-Mellon University, Pittsburgh, PA, May 2003. Available as Technical Report CMU-CS-03-153.

J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., 1988.

I. Tsamardinos, C. F. Aliferis, and A. Statnikov. Algorithms for large scale Markov blanket discovery. In Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society Conference, pages 376-381. AAAI Press, 2003.

I. Tsamardinos, C. F. Aliferis, and A. Statnikov. Time and sample efficient discovery of Markov blankets and direct causal relations. In Proceedings of the Ninth International Conference on Knowledge Discovery and Data Mining KDD, pages 673-678, 2003.

I. Tsamardinos, L. E. Brown, and C. Aliferis. The max-min hill-climbing Bayesian network learning algorithm. Machine Learning, 65(1), pages 31-78. Kluwer Academic Publishers, 2006.

S. Yaramakala and D. Margaritis. Speculative Markov Blanket Discovery for Optimal Feature Selection. In Proceedings of the Fifth IEEE International Conference on Data Mining, pages 809-812. IEEE Computer Society, 2005.

Examples

library(bnlearn)
data(learning.test)

## Simple learning
# first try the Grow-Shrink algorithm
res = gs(learning.test)
# plot the network structure.
plot(res)
# now try the Incremental Association algorithm.
res2 = iamb(learning.test)
# plot the new network structure.
plot(res2)
# the network structures seem to be identical, don't they?
compare(res, res2)
# [1] TRUE
# how many tests did each of the two algorithms use?
res$learning$ntests
# [1] 41
res2$learning$ntests
# [1] 50
# and the unoptimized implementation of these algorithms?
## Not run: gs(learning.test, optimized = FALSE)$learning$ntests
# [1] 90
## Not run: iamb(learning.test, optimized = FALSE)$learning$ntests
# [1] 116

## Greedy search
res = hc(learning.test)
plot(res)

## Another simple example (Gaussian data)
data(gaussian.test)
# first try the Grow-Shrink algorithm
res = gs(gaussian.test)
plot(res)

## Blacklist and whitelist use
# suppose the arc B - F should not be there: blacklist both directions.
blacklist = data.frame(from = c("B", "F"), to = c("F", "B"))
blacklist
#   from to
# 1    B  F
# 2    F  B
res3 = gs(learning.test, blacklist = blacklist)
plot(res3)
# force E - F direction (E -> F).
whitelist = data.frame(from = c("E"), to = c("F"))
whitelist
#   from to
# 1    E  F
res4 = gs(learning.test, whitelist = whitelist)
plot(res4)
# use both blacklist and whitelist.
res5 = gs(learning.test, whitelist = whitelist, blacklist = blacklist)
plot(res5)

## Debugging
# use the debugging mode to see the learning algorithms
# in action.
res = gs(learning.test, debug = TRUE)
res = hc(learning.test, debug = TRUE)
# log the learning process for future reference.
## Not run: 
sink(file = "learning-log.txt")
res = gs(learning.test, debug = TRUE)
sink()
## End(Not run)
# if something seems wrong, try the unoptimized version
# in strict mode (inconsistencies trigger errors):
## Not run: 
res = gs(learning.test, optimized = FALSE, strict = TRUE, debug = TRUE)
## End(Not run)
# or disable strict mode to let the algorithm fix errors on the fly:
## Not run: 
res = gs(learning.test, optimized = FALSE, strict = FALSE, debug = TRUE)
## End(Not run)

