pcAlgo {pcalg} | R Documentation |
Estimate the underlying graph (ugraph
or
“skeleton” (structure) of a DAG (Directed Acyclic
Graph) from data using the PC-algorithm. Alternatively, one can
directly infer the equivalence class of the underlying structure,
which is uniquely represented by its CPDAG (Completed
Partially Directed Acyclic Graph). For
continuous or discrete data.
pcAlgo(dm = NA, C = NA, n=NA, alpha, corMethod = "standard", verbose = FALSE, directed=FALSE, G=NULL, datatype='continuous',NAdelete=TRUE, m.max=Inf,u2pd="rand")
dm |
data matrix; rows correspond to samples, cols correspond to nodes. |
C |
correlation matrix; this is an alternative for specifying the data matrix. |
n |
sample size; this is only needed if the data matrix is not provided. |
alpha |
significance level for the individual partial correlation tests. |
corMethod |
a character string speciyfing the method for (partial) correlation estimation. "standard", "QnStable", "Qn" or "ogkQn" for standard and robust (based on the Qn scale estimator without and with OGK) correlation estimation. For robust estimation, we recommend QnStable. |
verbose |
0-no output, 1-small output, 2-details;using 1 and 2 makes the function very much slower |
directed |
If FALSE , the underlying skeleton is computed;
if TRUE , the underlying CPDAG is computed |
G |
the adjacency matrix of the graph from which the algorithm should start (logical) |
datatype |
distinguish between discrete and continuous data |
NAdelete |
delete edge if pval=NA (for discrete data) |
m.max |
maximal size of conditioning set |
u2pd |
Function used for converting skeleton to cpdag. "rand" (use udag2pdag); "relaxed" (use udag2pdagRelaxed); "retry" (use udag2pdagSpecial) |
The algorithm starts with a complete undirected graph or with G (if
given). The following text explains the continuous case. For the
discrete case, replace (partial) correlation tests by (conditional)
independence tests. In a first sweep, an edge ij is kept only if
H_0: Cor(X_i,X_j) = 0 can be
rejected on significance level alpha
. All ordered pairs ij of
nodes of the
resulting graph are then swept again. An edge ij is kept only if
H_0: Cor(X_i,X_j|X_k)=0 can be rejected for all neighbours k of
i in the current graph. Again, the remaining egdes are swept. This
time, an ordered pair (edge) ij is
kept only if H_0: Cor(X_i,X_j|X_a,X_b)=0 can be rejected for all
subsets of size two (a,b) of the neighbours of i in the
remaining graph. In the next
step, the remaining edges are tested using all subsets of size three,
then of size four and so on. The algorithm stops when the largest
neighbourhood is smaller than the size of the conditioning sets.
The partial correlations are
computed recursively or via matrix inversion from the correlation matrix,
which are computed by the
specified method (corMethod
). The partial correlation tests
are based on Fisher's z-transformation. For more details on the
methods for computing the correlations see mcor
.
An object of class
"pcAlgo"
(see
pcAlog-class
) containing an undirected graph
(object of class
"graph"
, see
graph-class
from the package graph)
(without weigths) as estimate of the skeleton or the CPDAG of the
underlying DAG.
Markus Kalisch (kalisch@stat.math.ethz.ch) and Martin Maechler.
P. Spirtes, C. Glymour and R. Scheines (2000) Causation, Prediction, and Search, 2nd edition, The MIT Press.
Kalisch M. and P. B"uhlmann (2007) Estimating high-dimensional directed acyclic graphs with the PC-algorithm; JMLR, Vol. 8, 613-636, 2007.
randomDAG
for generating a random DAG;
rmvDAG
for generating data according to a DAG;
compareGraphs
for comparing undirected graphs in terms of
TPR, FPR and TDR. Further, randomGraph
(in
package graph) for other random graph
models. udag2pdag
for converting the skeleton to a CPDAG.
p <- 10 ## generate and draw random DAG : set.seed(101) class(myDAG <- randomDAG(p, prob = 0.2)) plot(myDAG, main = "randomDAG(10, prob = 0.2)") ## generate 1000 samples of DAG using standard normal error distribution n <- 1000 d.mat <- rmvDAG(n, myDAG, errDist = "normal") ## estimate skeleton and CPDAG of given data resU <- pcAlgo(d.mat, alpha = 0.05, corMethod = "standard") resU plot(resU,zvalue.lwd=TRUE)# << using the plot() method for 'pcAlgo' objects! str(resU, max = 2) (c.g <- compareGraphs(myDAG, resU@graph)) ## CPDAG resD <- pcAlgo(d.mat, alpha = 0.05, corMethod = "standard",directed=TRUE) plot(resD,zvalue.lwd=TRUE) ## plot the original DAG, the estimated skeleton and the estimated CPDAG: op <- par(mfrow=c(3,1)) plot(myDAG, main = "original (random)DAG") plot(resU@graph, main = "estimated skeleton from pcAlgo(<simulated, n = 1000>)") plot(resD@graph,main="estimated CPDAG from pcAlgo(<simulated, n = 1000>)") par(op) ## generate data containing severe outliers d.mixmat <- rmvDAG(n, myDAG, errDist = "mix", mix=0.3) ## Compute "classical" and robust estimate of skeleton : pcC <- pcAlgo(d.mixmat, alpha = 0.01, corMeth = "standard") pcR <- pcAlgo(d.mixmat, alpha = 0.01, corMeth = "Qn") str(pcR, max = 2) (c.Cg <- compareGraphs(myDAG, pcC@graph)) (c.Rg <- compareGraphs(myDAG, pcR@graph))#-> (.201 0 1) much better op <- par(mfrow=c(3,1)) plot(myDAG, main = "original (random)DAG") plot(pcC) plot(pcR,zvalue.lwd=TRUE,lwd.max=7) par(op)