asia {bnlearn} | R Documentation |
Small synthetic data set from Lauritzen and Spiegelhalter (1988) about lung diseases (tuberculosis, lung cancer or bronchitis) and visits to Asia.
data(asia)
The asia data set contains the following variables:
D (dyspnoea), a two-level factor with levels yes and no.
T (tuberculosis), a two-level factor with levels yes and no.
L (lung cancer), a two-level factor with levels yes and no.
B (bronchitis), a two-level factor with levels yes and no.
A (visit to Asia), a two-level factor with levels yes and no.
S (smoking), a two-level factor with levels yes and no.
X (chest X-ray), a two-level factor with levels yes and no.
E (either tuberculosis or lung cancer), a two-level factor with levels yes and no.
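As a quick check of the format described above, the following sketch loads the data and confirms that every column is a two-level factor. It assumes the bnlearn package is installed; the name n_levels is introduced here only for illustration.

```r
library(bnlearn)

# Load the data set shipped with bnlearn.
data(asia)

# Compact overview: one factor column per variable.
str(asia)

# Number of levels of each column (n_levels is an illustrative name).
n_levels <- sapply(asia, nlevels)
print(n_levels)

# Relative frequency of each level, one table per variable.
print(lapply(asia, function(v) prop.table(table(v))))
```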
Standard structure learning algorithms are not able to recover the true
structure of the network because one node (E) is a deterministic function
of its parents: each of its conditional probabilities is equal to either 0 or 1.
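To illustrate what conditional probabilities of exactly 0 and 1 look like, here is a minimal base R sketch. It simulates two hypothetical parent variables (tb and lc, with arbitrary probabilities that are not taken from the data set), defines a child as their logical OR, and tabulates the child's distribution within each parent configuration.

```r
set.seed(42)  # arbitrary seed, only so the run is reproducible
n <- 5000

# Hypothetical parents, sampled independently with illustrative probabilities.
tb <- sample(c("yes", "no"), n, prob = c(0.05, 0.95), replace = TRUE)
lc <- sample(c("yes", "no"), n, prob = c(0.10, 0.90), replace = TRUE)

# The child is a deterministic function (logical OR) of its parents.
e <- ifelse(tb == "yes" | lc == "yes", "yes", "no")

# Distribution of E within each (T, L) configuration:
# every column of every slice sums to 1.
cpt <- prop.table(table(E = e, T = tb, L = lc), margin = c(2, 3))
print(cpt)
```

Because the child is fully determined by its parents, every entry of the table is either 0 or 1, with no intermediate values.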
S. Lauritzen and D. Spiegelhalter (1988). Local computation with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 50(2), pages 157–192.
## The modelstring() of this data set is:
# [A][S][T|A][L|S][B|S][D|B:E][E|T:L][X|E]
# These are the R commands used to generate this data set.
## Not run: 
# root nodes: visit to Asia and smoking.
a = sample(c("yes", "no"), 5000, prob = c(0.01, 0.99), replace = TRUE)
s = sample(c("yes", "no"), 5000, prob = c(0.50, 0.50), replace = TRUE)

# tuberculosis, conditional on the visit to Asia.
t = a
t[t == "yes"] = sample(c("yes", "no"), length(which(t == "yes")),
                  prob = c(0.05, 0.95), replace = TRUE)
t[t == "no"] = sample(c("yes", "no"), length(which(t == "no")),
                 prob = c(0.01, 0.99), replace = TRUE)

# lung cancer, conditional on smoking.
l = s
l[l == "yes"] = sample(c("yes", "no"), length(which(l == "yes")),
                  prob = c(0.10, 0.90), replace = TRUE)
l[l == "no"] = sample(c("yes", "no"), length(which(l == "no")),
                 prob = c(0.01, 0.99), replace = TRUE)

# bronchitis, conditional on smoking.
b = s
b[b == "yes"] = sample(c("yes", "no"), length(which(b == "yes")),
                  prob = c(0.60, 0.40), replace = TRUE)
b[b == "no"] = sample(c("yes", "no"), length(which(b == "no")),
                 prob = c(0.30, 0.70), replace = TRUE)

# E is "yes" when either lung cancer or tuberculosis is "yes" (deterministic).
e = apply(cbind(l, t), 1, paste, collapse = ":")
e[e == "yes:yes"] = "yes"
e[e == "yes:no"] = "yes"
e[e == "no:yes"] = "yes"
e[e == "no:no"] = "no"

# chest X-ray, conditional on E.
x = e
x[x == "yes"] = sample(c("yes", "no"), length(which(x == "yes")),
                  prob = c(0.98, 0.02), replace = TRUE)
x[x == "no"] = sample(c("yes", "no"), length(which(x == "no")),
                 prob = c(0.05, 0.95), replace = TRUE)

# dyspnoea, conditional on E and bronchitis.
d = apply(cbind(e, b), 1, paste, collapse = ":")
d[d == "yes:yes"] = sample(c("yes", "no"), length(which(d == "yes:yes")),
                      prob = c(0.90, 0.10), replace = TRUE)
d[d == "yes:no"] = sample(c("yes", "no"), length(which(d == "yes:no")),
                     prob = c(0.70, 0.30), replace = TRUE)
d[d == "no:yes"] = sample(c("yes", "no"), length(which(d == "no:yes")),
                     prob = c(0.80, 0.20), replace = TRUE)
d[d == "no:no"] = sample(c("yes", "no"), length(which(d == "no:no")),
                    prob = c(0.10, 0.90), replace = TRUE)

data.frame(A = a, S = s, T = t, L = l, B = b, E = e, X = x, D = d)
## End(Not run)
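The generating commands above make D depend on both B and E. The following sketch encodes that structure with model2network() and compares it with a network learned by hill-climbing. It assumes the bnlearn package is installed; true_dag and learned are illustrative names, and hc() with default arguments is just one possible choice of learning algorithm.

```r
library(bnlearn)

# True structure implied by the generating commands (D depends on B and E).
true_dag <- model2network("[A][S][T|A][L|S][B|S][D|B:E][E|T:L][X|E]")

# Learn a structure from the data with hill-climbing (default score).
data(asia)
learned <- hc(asia)

# Print both structures as model strings for a side-by-side look.
print(modelstring(true_dag))
print(modelstring(learned))

# Structural Hamming distance between the learned and the true network;
# 0 would mean both have the same equivalence class.
print(shd(learned, true_dag))

# Counts of true positive, false positive and false negative arcs.
print(unlist(compare(target = true_dag, current = learned)))
```

The exact result depends on the algorithm and score used, so no specific output is shown here.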