bnlearn-package {bnlearn}		R Documentation

Bayesian network learning via constraint-based, score-based and hybrid algorithms.

Package: bnlearn
Type: Package
Version: 1.7
Date: 2009-10-25
License: GPLv2 or later
This package implements some algorithms for learning the structure of Bayesian networks.
Constraint-based algorithms, also known as conditional independence learners, are all optimized derivatives of the Inductive Causation algorithm (Verma and Pearl, 1991). These algorithms use conditional independence tests to detect the Markov blankets of the variables, which in turn are used to compute the structure of the Bayesian network.
Score-based learning algorithms are general purpose heuristic optimization algorithms which rank network structures with respect to a goodness-of-fit score.
Hybrid algorithms combine aspects of both constraint-based and score-based algorithms, as they use conditional independence tests (usually to reduce the search space) and network scores (to find the optimal network in the reduced space) at the same time.
Available constraint-based learning algorithms:

* Grow-Shrink (gs): based on the Grow-Shrink Markov Blanket, the first (and simplest) Markov blanket detection algorithm (Margaritis, 2003) used in a structure learning algorithm.

* Incremental Association (iamb): based on the Markov blanket detection algorithm of the same name (Tsamardinos et al., 2003), which is based on a two-phase selection scheme (a forward selection followed by an attempt to remove false positives).

* Fast Incremental Association (fast.iamb): a variant of IAMB which uses speculative stepwise forward selection to reduce the number of conditional independence tests (Yaramakala and Margaritis, 2005).

* Interleaved Incremental Association (inter.iamb): another variant of IAMB which uses forward stepwise selection (Tsamardinos et al., 2003) to avoid false positives in the Markov blanket detection phase.
This package includes three implementations of each algorithm:

* an optimized implementation (used when the optimized parameter is set to TRUE), which uses backtracking to roughly halve the number of independence tests.

* an unoptimized implementation (used when the optimized parameter is set to FALSE), which is better at uncovering possible erratic behaviour of the statistical tests.

* a cluster-aware implementation, which requires a running cluster set up with the makeCluster function from the snow package. See snow integration for a sample usage.
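As a sketch of cluster-aware use (assuming a local snow cluster and a cluster argument in the learning functions; check the package documentation for the exact signature in your version):

```r
library(bnlearn)
library(snow)

data(learning.test)

# set up a local cluster with two worker processes.
cl = makeCluster(2, type = "SOCK")

# run the Grow-Shrink algorithm, distributing the
# conditional independence tests over the cluster.
res = gs(learning.test, cluster = cl)

# shut the workers down when done.
stopCluster(cl)
```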
The computational complexity of these algorithms is polynomial in the number of tests, usually O(N^2) (O(N^4) in the worst case scenario). Execution time scales linearly with the size of the data set.
Available score-based learning algorithms:

* Hill-Climbing (hc): a hill climbing greedy search on the space of the directed graphs. The optimized implementation uses score caching, score decomposability and score equivalence to reduce the number of duplicated tests. Random restart with a configurable number of perturbing operations is implemented.
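The random restarts described above can be sketched as follows (the restart and perturb parameter names are assumed from the hc interface; see ?hc):

```r
library(bnlearn)
data(learning.test)

# greedy hill climbing with random restarts: after reaching
# each local optimum, perturb the network with two random
# operations and restart the search, five times in total.
res = hc(learning.test, restart = 5, perturb = 2)
```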
Available hybrid learning algorithms:

* Max-Min Hill-Climbing (mmhc): a hybrid algorithm which combines the Max-Min Parents and Children algorithm (to restrict the search space) and the Hill-Climbing algorithm (to find the optimal network structure in the restricted space).

* rshc: a more general implementation of the Max-Min Hill-Climbing, which can use any combination of constraint-based and score-based algorithms.
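A minimal sketch of hybrid learning with the bundled learning.test data set:

```r
library(bnlearn)
data(learning.test)

# Max-Min Hill-Climbing: a constraint-based restriction phase
# (Max-Min Parents and Children) followed by a score-based
# hill climbing search in the restricted space.
res = mmhc(learning.test)
plot(res)
```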
Other (constraint-based) local discovery algorithms: these algorithms learn the structure of the undirected graph underlying the Bayesian network, which is known as the skeleton of the network or the correlation graph. Therefore all the arcs are undirected, and no attempt is made to detect their orientation. They are often used in hybrid learning algorithms.

* Max-Min Parents and Children (mmpc): a forward selection technique for neighbourhood detection based on the maximization of the minimum association measure observed with any subset of the nodes selected in the previous iterations (Tsamardinos, Brown and Aliferis, 2006).
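For example, a skeleton can be learned directly (a sketch using the bundled learning.test data set):

```r
library(bnlearn)
data(learning.test)

# learn only the skeleton of the network: every arc returned
# by mmpc() is undirected, no orientation is attempted.
res = mmpc(learning.test)
plot(res)
```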
All these algorithms have three implementations (unoptimized, optimized and cluster-aware), like the other constraint-based algorithms.
The conditional independence tests used in constraint-based algorithms in practice are statistical tests on the data set. Available tests (and the respective labels) are:

discrete data:

* mutual information: both the asymptotic chi-square test (mi) and the Monte Carlo permutation test (mc-mi) are implemented.
* fast mutual information (fmi): a variant of the mutual information which is set to zero when there aren't at least five data points per parameter.
* Pearson's X^2: both the asymptotic chi-square test (x2) and the Monte Carlo permutation test (mc-x2) are implemented.
* Akaike Information Criterion test (aict): an experimental AIC-based independence test, computed by comparing the mutual information and the expected information gain.

continuous (Gaussian) data:

* linear correlation: both the asymptotic test (cor) and the Monte Carlo permutation test (mc-cor) are implemented.
* Fisher's Z: a transformation of the linear correlation (an R implementation of the PC algorithm based on it is present in the pcalg package on CRAN). Both the asymptotic normal test (zf) and the Monte Carlo permutation test (mc-zf) are implemented.
* mutual information: both the asymptotic chi-square test (mi-g) and the Monte Carlo permutation test (mc-mi-g) are implemented.
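The test used by a constraint-based algorithm is selected with its label (the test parameter name is assumed from the learning functions' interface; see their help pages):

```r
library(bnlearn)
data(learning.test)

# use the Monte Carlo permutation variant of the mutual
# information test instead of the default asymptotic one.
res = gs(learning.test, test = "mc-mi")
plot(res)
```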
Available scores (and the respective labels) are:

discrete data:

* the multinomial log-likelihood (loglik) score, which is equivalent to the entropy measure used in Weka.
* the Akaike Information Criterion score (aic).
* the Bayesian Information Criterion score (bic), which is equivalent to the Minimum Description Length (MDL) and is also known as the Schwarz Information Criterion.
* the Bayesian Dirichlet equivalent score (bde), a score equivalent Dirichlet posterior density.
* the K2 score (k2), a Dirichlet posterior density (not score equivalent).

continuous (Gaussian) data:

* the multivariate Gaussian log-likelihood (loglik-g) score.
* the corresponding Akaike Information Criterion score (aic-g).
* the corresponding Bayesian Information Criterion score (bic-g).
* the Bayesian Gaussian equivalent score (bge).
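A sketch of score selection and comparison (assuming hc takes a score parameter and that a score function with a type parameter is available, as in the package interface; see the respective help pages):

```r
library(bnlearn)
data(learning.test)

# learn a structure with hill climbing and the BIC score...
res = hc(learning.test, score = "bic")

# ...then compute alternative scores for the learned network.
score(res, learning.test, type = "bic")
score(res, learning.test, type = "bde")
```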
All learning algorithms support arc whitelisting and blacklisting. Any arc whitelisted and blacklisted at the same time is assumed to be whitelisted, and is thus removed from the blacklist.
Optimized implementations of constraint-based algorithms rely heavily on backtracking to reduce the number of tests needed by the learning procedure. This approach may hide errors either in the Markov blanket or the neighbourhood detection phase in some particular cases, such as when hidden variables are present or there are external (logical) constraints on the interactions between the variables.
On the other hand, in the unoptimized implementations the Markov blanket and neighbourhood detection of each node is completely independent from the rest of the learning process. Thus it may happen that the Markov blankets or the neighbourhoods are not symmetric (i.e. A is in the Markov blanket of B but not vice versa), or that some arc directions conflict with each other.
The strict parameter enables some measure of error correction, which may help to retrieve a good model even when the learning process would otherwise fail. If strict is set to TRUE, every error stops the learning process and results in an error message; if strict is set to FALSE, the learning process issues a warning and tries to fix the problem on the fly.
Marco Scutari
Department of Statistical Sciences
University of Padova
Maintainer: Marco Scutari <marco.scutari@gmail.com>
(a BibTeX file with all the references cited throughout this manual is present in the ‘bibtex’ directory of this package)
Agresti A (2002). Categorical Data Analysis. Wiley Series in Probability and Statistics. Wiley-Interscience, 2nd edition.
Korb K, Nicholson AE (2003). Bayesian Artificial Intelligence. Chapman & Hall/CRC.
Margaritis D (2003). Learning Bayesian Network Model Structure from Data. Ph.D. thesis, School of Computer Science, Carnegie-Mellon University, Pittsburgh, PA. Available as Technical Report CMU-CS-03-153.
Pearl J (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
Tsamardinos I, Aliferis CF, Statnikov A (2003). "Algorithms for Large Scale Markov Blanket Discovery". In "Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society Conference", pp. 376-381. AAAI Press.
Tsamardinos I, Brown LE, Aliferis CF (2006). "The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm". Machine Learning, 65(1), 31-78.
Yaramakala S, Margaritis D (2005). "Speculative Markov Blanket Discovery for Optimal Feature Selection". In "ICDM '05: Proceedings of the Fifth IEEE International Conference on Data Mining", pp. 809-812. IEEE Computer Society.
library(bnlearn)
data(learning.test)

## Simple learning
# first try the Grow-Shrink algorithm.
res = gs(learning.test)
# plot the network structure.
plot(res)
# now try the Incremental Association algorithm.
res2 = iamb(learning.test)
# plot the new network structure.
plot(res2)
# the network structures seem to be identical, don't they?
compare(res, res2)
# [1] TRUE
# how many tests did each of the two algorithms use?
res$learning$ntests
# [1] 41
res2$learning$ntests
# [1] 50
# and the unoptimized implementations of these algorithms?
## Not run: gs(learning.test, optimized = FALSE)$learning$ntests
# [1] 90
## Not run: iamb(learning.test, optimized = FALSE)$learning$ntests
# [1] 116

## Greedy search
res = hc(learning.test)
plot(res)

## Another simple example (Gaussian data)
data(gaussian.test)
# first try the Grow-Shrink algorithm.
res = gs(gaussian.test)
plot(res)

## Blacklist and whitelist use
# the arc B - F should not be there?
blacklist = data.frame(from = c("B", "F"), to = c("F", "B"))
blacklist
#   from to
# 1    B  F
# 2    F  B
res3 = gs(learning.test, blacklist = blacklist)
plot(res3)
# force E - F direction (E -> F).
whitelist = data.frame(from = c("E"), to = c("F"))
whitelist
#   from to
# 1    E  F
res4 = gs(learning.test, whitelist = whitelist)
plot(res4)
# use both blacklist and whitelist.
res5 = gs(learning.test, whitelist = whitelist, blacklist = blacklist)
plot(res5)

## Debugging
# use the debugging mode to see the learning algorithms in action.
res = gs(learning.test, debug = TRUE)
res = hc(learning.test, debug = TRUE)
# log the learning process for future reference.
## Not run:
sink(file = "learning-log.txt")
res = gs(learning.test, debug = TRUE)
sink()
## End(Not run)
# if something seems wrong, try the unoptimized version
# in strict mode (inconsistencies trigger errors):
## Not run:
res = gs(learning.test, optimized = FALSE, strict = TRUE, debug = TRUE)
## End(Not run)
# or disable strict mode to let the algorithm fix errors on the fly:
## Not run:
res = gs(learning.test, optimized = FALSE, strict = FALSE, debug = TRUE)
## End(Not run)