pcSelect.presel {pcalg}    R Documentation

PC-Select preselection: Estimate subgraph around a response variable using preselection

Description

This function uses pcSelect to preselect some covariates and then runs pcSelect again on the reduced data set.

Usage

pcSelect.presel(y, dm, alpha, alphapre, corMethod = "standard", verbose = 0, directed = FALSE)

Arguments

y Response Vector (length(y)=nrow(dm))
dm Data matrix (rows: samples, cols: nodes)
alpha Significance level of individual partial correlation tests
alphapre Significance level for pcSelect in preselection
corMethod "standard" or "Qn" for standard or robust correlation estimation
verbose 0: no output, 1: brief output, 2: detailed output (using 1 or 2 makes the function much slower)
directed Boolean; should the output graph be directed?

Details

This function essentially applies pcAlgo to the data matrix obtained by joining y and dm. Since the output is not concerned with the edges found among the columns of dm, the algorithm is adapted accordingly. As a result, the runtime is typically much shorter and considerably larger data sets can be handled.

First, pcSelect is run using alphapre. Then, only the variables selected in this first step are kept, and pcSelect is run on them again using alpha.
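
The two steps described above can be sketched by calling pcSelect twice by hand. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes that pcSelect returns a list whose component G flags the selected columns of dm; the helper name twoStepSelect and the simulated data are hypothetical and serve only to show the idea.

```r
library(pcalg)

## Hypothetical helper illustrating the two steps; not part of pcalg.
twoStepSelect <- function(y, dm, alpha, alphapre) {
  ## Step 1: liberal preselection at level alphapre
  pre  <- pcSelect(y, dm, alpha = alphapre)
  keep <- which(as.logical(pre$G))  # assumes component 'G' flags selected columns
  sel  <- logical(ncol(dm))
  if (length(keep) == 0L)           # nothing survived the preselection
    return(list(preselected = keep, selected = sel))
  ## Step 2: rerun on the preselected columns only, at level alpha
  fin <- pcSelect(y, dm[, keep, drop = FALSE], alpha = alpha)
  ## Map the second-step selection back to the original columns of dm
  sel[keep[as.logical(fin$G)]] <- TRUE
  list(preselected = keep, selected = sel)
}

## Simulated data (assumption for illustration): y depends on columns 1 and 2
set.seed(1)
n  <- 500
dm <- matrix(rnorm(n * 6), n, 6)
y  <- 2 * dm[, 1] - 1.5 * dm[, 2] + rnorm(n)

out <- twoStepSelect(y, dm, alpha = 0.05, alphapre = 0.6)
which(out$selected)   # columns of dm retained after both steps
```

A large alphapre makes the first step permissive, so that few relevant variables are discarded before the stricter second step at level alpha.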

Value

pcs A boolean vector indicating which columns of dm are associated with y
zMin The minimal z-values when testing partial correlations between y and each column of dm. The larger the value, the more consistent the edge is with the data.
Xnew Preselected Variables.

Author(s)

Philipp Ruetimann.

References

P. Spirtes, C. Glymour and R. Scheines (2000) Causation, Prediction, and Search, 2nd edition, The MIT Press.

See Also

pcAlgo, which is the more general version of this function; pcSelect, which is applied in both steps.

Examples

p <- 10
## generate and draw random DAG :
set.seed(101)
myDAG <- randomDAG(p, prob = 0.2)
plot(myDAG, main = "randomDAG(10, prob = 0.2)")

## generate 1000 samples from the DAG using a standard normal error distribution
n <- 1000
d.mat <- rmvDAG(n, myDAG, errDist = "normal")

## let's pretend that the 10th column is the response and the first 9
## columns are explanatory variables. Which of the first 9 variables
## "cause" the tenth variable?
y <- d.mat[, 10]
dm <- d.mat[, -10]
res <- pcSelect.presel(y, dm, alpha = 0.05, alphapre = 0.6)

[Package pcalg version 0.1-8 Index]