step.plr {stepPlr} | R Documentation |
This function fits a series of L2-penalized logistic regression models, selecting variables through a forward stepwise selection procedure.
step.plr(x, y, weights = rep(1,length(y)), fix.subset = NULL, level = NULL, lambda = 1e-4, cp = "bic", max.terms = 5, type = c("both", "forward"), trace = FALSE)
x |
matrix of features |
y |
binary response |
weights |
an optional vector of weights for observations |
fix.subset |
a vector of indices for the variables that are forced to be in the model |
level |
a list of length ncol(x). The j-th element corresponds to
the j-th column of x. If the j-th column of x is
discrete, level[[j]] is the set of levels for the
categorical factor. If the j-th column of x is continuous,
level[[j]] = NULL. level is automatically
generated in the function; however, if any levels of the
categorical factors are not observed, but still need to be included
in the model, then the user must provide the complete sets of the
levels through level. If a numeric column needs to be
considered discrete, it can be done by manually providing
level as well.
|
lambda |
regularization parameter for the L2 norm of the
coefficients. The criterion minimized in plr is
-log-likelihood+λ*||β||^2. Default is
lambda=1e-4.
|
cp |
complexity parameter to be used when computing the
score. score=deviance+cp*df. If cp="aic" or
cp="bic", these are converted to cp=2 and
cp=log(sample size), respectively. Default is
cp="bic".
|
max.terms |
the maximum number of terms to be added in the forward selection
procedure. Default is max.terms=5.
|
type |
If type="both", the forward selection is followed by a
backward deletion. If type="forward", only a forward
selection is done. Default is "both".
|
trace |
If TRUE, the variable selection procedure prints out its
progress.
|
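As a small illustration of the level argument described above, the following sketch builds level by hand for a matrix whose first column is continuous and whose second column is a categorical factor with one unobserved level. It uses base R only; the data and the choice of three levels are made up for illustration.

```r
set.seed(1)
n <- 20

# Column 1 is continuous; column 2 is categorical with levels 1, 2, 3,
# but level 3 happens never to be observed in this (made-up) sample.
x <- cbind(rnorm(n), sample(1:2, n, replace = TRUE))

# Start with a list of NULLs: NULL marks a column as continuous.
level <- vector("list", length = ncol(x))

# Declare the complete set of levels for the categorical column,
# including the unobserved level 3.
level[[2]] <- 1:3

str(level)
```

A list built this way would then be passed as step.plr(x, y, level = level), in the same way as in the examples at the end of this page.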
This function implements L2-penalized logistic regression together with a stepwise variable selection procedure, as described in "Penalized Logistic Regression for Detecting Gene Interactions" by Park and Hastie (2006).
If type="forward", max.terms terms are sequentially added to the model, and the model that minimizes score is selected as the optimal fit. If type="both", a backward deletion is done in addition, which provides a second series of models with different combinations of the selected terms. The optimal model minimizing score is then chosen from this second list.
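The forward pass and the score = deviance + cp*df rule can be sketched in base R as follows. This is a simplified illustration and not the package's implementation: ridge_logistic and forward_select are hypothetical helpers, only continuous main effects are considered, observation weights are ignored, the intercept is assumed to be unpenalized, and df is assumed to be the effective degrees of freedom given by a trace formula.

```r
# Hypothetical helper: L2-penalized logistic fit by Newton-Raphson,
# minimizing -log-likelihood + lambda * ||beta||^2 (intercept unpenalized).
ridge_logistic <- function(X, y, lambda = 1e-4, iters = 50) {
  X <- cbind(1, X)                              # add intercept column
  p <- ncol(X)
  P <- diag(c(0, rep(2 * lambda, p - 1)), p)    # second derivative of the penalty
  beta <- rep(0, p)
  for (i in seq_len(iters)) {
    mu <- 1 / (1 + exp(-drop(X %*% beta)))
    H <- crossprod(X, X * (mu * (1 - mu))) + P  # penalized Hessian
    step <- drop(solve(H, crossprod(X, y - mu) - P %*% beta))
    beta <- beta + step
    if (max(abs(step)) < 1e-8) break
  }
  mu <- 1 / (1 + exp(-drop(X %*% beta)))
  XtWX <- crossprod(X, X * (mu * (1 - mu)))
  list(beta = beta,
       deviance = -2 * sum(y * log(mu) + (1 - y) * log(1 - mu)),
       df = sum(diag(solve(XtWX + P, XtWX))))   # effective df (assumption)
}

# Hypothetical helper: add one variable at a time, then return the model
# along the path with the smallest score = deviance + cp * df.
forward_select <- function(X, y, lambda = 1e-4, cp = log(length(y)),
                           max.terms = 3) {
  chosen <- integer(0)
  path <- list()
  for (k in seq_len(min(max.terms, ncol(X)))) {
    candidates <- setdiff(seq_len(ncol(X)), chosen)
    scores <- sapply(candidates, function(j) {
      f <- ridge_logistic(X[, c(chosen, j), drop = FALSE], y, lambda)
      f$deviance + cp * f$df
    })
    chosen <- c(chosen, candidates[which.min(scores)])
    path[[k]] <- list(terms = chosen, score = min(scores))
  }
  path[[which.min(sapply(path, function(m) m$score))]]
}

# Simulated data: only the first two of four features carry signal.
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 4), n, 4)
y <- rbinom(n, 1, 1 / (1 + exp(-(1.5 * X[, 1] - X[, 2]))))

best <- forward_select(X, y)   # cp = log(n) mirrors cp = "bic"
best$terms
best$score
```

Passing cp = 2 to the hypothetical forward_select would mirror cp="aic". A backward deletion pass, as performed when type="both", is not shown here.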
We thank Michael Saunders of SOL, Stanford University for providing the solver used for the convex optimization in this function.
A stepplr object is returned. The anova, predict, print, and summary functions can be applied to it.
fit |
a plr object for the optimal model selected
|
action |
a list that stores the selection order of the terms in the optimal model. |
action.name |
a list of the names of the sequentially added terms - in the same
order as in action
|
deviance |
deviance of the fitted model |
df |
residual degrees of freedom of the fitted model |
score |
deviance + cp*df, where df is the model degrees of freedom |
group |
a vector of the counts for the dummy variables, to be used in
predict.stepplr
|
y |
response variable used |
weight |
weights used |
fix.subset |
fix.subset used |
level |
level used |
lambda |
lambda used |
cp |
complexity parameter used when computing the score |
type |
type used |
xnames |
column names of x
|
Mee Young Park and Trevor Hastie
Mee Young Park and Trevor Hastie (2006) Penalized Logistic Regression for Detecting Gene Interactions - available at the authors' websites, http://stat.stanford.edu/~mypark or http://stat.stanford.edu/~hastie/pub.htm.
cv.step.plr, plr, predict.stepplr
n <- 100
p <- 3
z <- matrix(sample(seq(3), n*p, replace=TRUE), nrow=n)
x <- data.frame(x1=factor(z[ ,1]), x2=factor(z[ ,2]), x3=factor(z[ ,3]))
y <- sample(c(0,1), n, replace=TRUE)
fit <- step.plr(x, y)
# 'level' is automatically generated. Check 'fit$level'.

p <- 5
x <- matrix(sample(seq(3), n*p, replace=TRUE), nrow=n)
x <- cbind(rnorm(n), x)
y <- sample(c(0,1), n, replace=TRUE)
level <- vector("list", length=6)
for (i in 2:6) level[[i]] <- seq(3)
fit1 <- step.plr(x, y, level=level, cp="aic")
fit2 <- step.plr(x, y, level=level, cp=4)
fit3 <- step.plr(x, y, level=level, type="forward")
fit4 <- step.plr(x, y, level=level, max.terms=10)
# This is an example in which 'level' was input manually.
# level[[1]] should be either 'NULL' or 'NA' since the first factor is continuous.