grplasso {grplasso} | R Documentation |
Fits the solution of a group lasso problem for a model of type grpl.model.
grplasso(x, ...)

## S3 method for class 'formula':
grplasso(formula, nonpen = ~ 1, data, weights = rep(1, length(y)), subset,
         na.action, lambda, coef.init, penscale = sqrt, model = LogReg(),
         center = TRUE, standardize = TRUE, control = grpl.control(),
         contrasts = NULL, ...)

## Default S3 method:
grplasso(x, y, index, weights = rep(1, length(y)), offset = rep(0, length(y)),
         lambda, coef.init = rep(0, ncol(x)), penscale = sqrt, model = LogReg(),
         center = TRUE, standardize = TRUE, control = grpl.control(), ...)
x |
design matrix (including intercept) |
y |
response vector |
formula |
formula of the penalized variables. The response has to be on the left-hand side of ~. |
nonpen |
formula of the non-penalized variables. It is added to the formula argument above and does not need the response on the left-hand side. |
data |
data.frame containing the variables in the model. |
index |
vector that defines the grouping of the variables. Components sharing the same number form a group. Non-penalized coefficients are marked with NA. |
weights |
vector of observation weights. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when the data contain NAs. |
offset |
vector of offset values; needs to have the same length as the response vector. |
lambda |
vector of penalty parameters. Optimization starts with the first component. See details below. |
coef.init |
initial vector of parameter estimates corresponding to the first component of the vector lambda. |
penscale |
rescaling function to adjust the value of the penalty parameter to the degrees of freedom of the parameter group. See the reference below. |
model |
an object of class grpl.model implementing the negative log-likelihood, gradient, Hessian, etc. See the documentation of grpl.model for more details. |
center |
logical. If true, the columns of the design matrix will be centered (except a possible intercept column). |
standardize |
logical. If true, the design matrix will be blockwise orthonormalized such that $X_g^\top X_g = n I$ for each block $g$ (*after* possible centering), where $I$ is the identity matrix. |
control |
options for the fitting algorithm, see grpl.control. |
contrasts |
an optional list. See the contrasts.arg argument of model.matrix.default. |
... |
additional arguments to be passed to the functions defined in model. |
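The index and penscale arguments together determine the penalty. As a hedged illustration, assuming the group lasso penalty has the form lambda * sum_g penscale(df_g) * ||beta_g||_2 used in the reference below (df_g being the number of columns in group g, NA entries unpenalized), this base R sketch evaluates it. group_penalty is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: group lasso penalty, assuming the form
#   lambda * sum_g penscale(df_g) * ||beta_g||_2
group_penalty <- function(beta, index, lambda, penscale = sqrt) {
  # split() drops NA entries, so unpenalized coefficients never enter the sum
  groups <- split(seq_along(index), index)   # here: "2" -> 2:3, "3" -> 4:5
  lambda * sum(vapply(groups, function(j) {
    penscale(length(j)) * sqrt(sum(beta[j]^2))
  }, numeric(1)))
}

index <- c(NA, 2, 2, 3, 3)     # intercept unpenalized, two groups of size 2
beta  <- c(0.5, 3, 4, 0, 0)    # intercept, group 2, group 3
group_penalty(beta, index, lambda = 2)   # 2 * sqrt(2) * ||(3, 4)|| = 10 * sqrt(2)
```

Changing the intercept (the NA entry) leaves the value unchanged, and a group whose coefficients are all zero contributes nothing.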
When using grplasso.formula, the grouping of the variables is derived from the type of the variables: the dummy variables of a factor are automatically treated as a group.
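One way to see how dummy variables of a factor can be collected into a group is the "assign" attribute of model.matrix, which records the model term each column comes from. This base R sketch assumes grplasso.formula follows a comparable rule, with the intercept left unpenalized as in the default nonpen = ~ 1; the data and variable names are hypothetical.

```r
# Hypothetical data: a 3-level factor and one numeric covariate
dat <- data.frame(
  y  = c(0, 1, 1, 0, 1, 0),
  f  = factor(c("a", "b", "c", "a", "b", "c")),  # 3 levels -> 2 dummy columns
  x1 = c(0.2, -1.1, 0.7, 1.5, -0.3, 0.9)
)

mm   <- model.matrix(y ~ f + x1, data = dat)
asgn <- attr(mm, "assign")               # term of each column: 0 = intercept
index <- ifelse(asgn == 0, NA, asgn)     # intercept unpenalized (NA)

colnames(mm)   # "(Intercept)" "fb" "fc" "x1"
index          # NA 1 1 2: both dummies of f share group 1
```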
The optimization starts with the first component of lambda as the penalty parameter λ and with the starting values given in coef.init for the parameter vector. Once this fit is obtained, the next component of lambda is used as the penalty parameter, with the fitted coefficient vector from the previous component of lambda as starting values.
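The warm-start strategy can be sketched as follows. The loss here is least squares minimized by proximal gradient descent, a stand-in chosen for brevity and not the package's model-specific algorithm; group_lasso_ls and fit_path are hypothetical helpers. Only the loop over lambda, in which each solution seeds the next fit, mirrors the description above.

```r
# Hypothetical solver: least-squares group lasso via proximal gradient descent,
#   minimize 0.5 * ||y - x beta||^2 + lambda * sum_g penscale(df_g) * ||beta_g||_2
group_lasso_ls <- function(x, y, index, lambda, beta, penscale = sqrt,
                           maxit = 5000, tol = 1e-10) {
  groups <- split(seq_along(index), index)          # NA columns stay unpenalized
  step <- 1 / max(eigen(crossprod(x), symmetric = TRUE,
                        only.values = TRUE)$values) # 1 / Lipschitz constant
  for (it in seq_len(maxit)) {
    v <- beta + step * drop(crossprod(x, y - x %*% beta))  # gradient step
    beta_new <- v
    for (j in groups) {                             # group soft-thresholding
      nrm <- sqrt(sum(v[j]^2))
      thr <- step * lambda * penscale(length(j))
      beta_new[j] <- if (nrm > thr) (1 - thr / nrm) * v[j] else 0
    }
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

# Path over the lambda grid: each fit starts from the previous solution
fit_path <- function(x, y, index, lambda, coef.init = rep(0, ncol(x)), ...) {
  coefs <- matrix(NA_real_, nrow = ncol(x), ncol = length(lambda))
  start <- coef.init
  for (k in seq_along(lambda)) {
    coefs[, k] <- group_lasso_ls(x, y, index, lambda[k], beta = start, ...)
    start <- coefs[, k]                             # warm start
  }
  coefs
}

set.seed(79)
n <- 50
x <- cbind(1, matrix(rnorm(4 * n), nrow = n))
index <- c(NA, 2, 2, 3, 3)
y <- drop(x %*% c(1, 2.1, -1.8, 0, 0)) + rnorm(n)
lambda <- c(1e4, 50, 10, 1)                         # decreasing grid
coefs <- fit_path(x, y, index, lambda)
round(coefs, 3)
```

With a very large first lambda all penalized groups are exactly zero and only the intercept is fitted; as lambda decreases, groups enter the model, and each fit starts close to its solution.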
A grplasso object is returned, for which coef, print, plot and predict methods exist.
coefficients |
coefficients with respect to the original input variables (even if standardize = TRUE is used for fitting). |
lambda |
vector of lambda values where coefficients were calculated. |
index |
grouping index vector. |
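The following sketch illustrates what center = TRUE and standardize = TRUE do to the design matrix, and why coefficients fitted on the transformed scale can still be reported for the original variables. It assumes the intercept is the first column and uses a Cholesky factor for the blockwise transformation; the package may use a different decomposition internally, and gamma is a hypothetical coefficient vector.

```r
set.seed(3)
n <- 40
x <- cbind(1, matrix(rnorm(n * 4), nrow = n))
index <- c(NA, 2, 2, 3, 3)
groups <- split(seq_along(index), index)   # NA (intercept) is dropped

# 1. Centering: subtract column means, intercept column left as is
means <- c(0, colMeans(x[, -1]))
xc <- sweep(x, 2, means)

# 2. Blockwise orthonormalization: Z_g = Xc_g %*% T_g with
#    T_g = sqrt(n) * R_g^{-1}, R_g the Cholesky factor of t(Xc_g) %*% Xc_g,
#    so that t(Z_g) %*% Z_g = n * I
z <- xc
trafo <- list()
for (g in names(groups)) {
  j <- groups[[g]]
  R <- chol(crossprod(xc[, j, drop = FALSE]))
  trafo[[g]] <- sqrt(n) * solve(R)
  z[, j] <- xc[, j, drop = FALSE] %*% trafo[[g]]
}
crossprod(z[, groups[["2"]]])               # n * I up to rounding

# 3. Back-transformation of coefficients fitted on z
gamma <- c(0.3, 1, -2, 0.5, 0)              # hypothetical fit on the z scale
beta <- gamma
for (g in names(groups)) {
  j <- groups[[g]]
  beta[j] <- trafo[[g]] %*% gamma[j]        # Z_g gamma_g = X_g (T_g gamma_g)
}
beta[1] <- gamma[1] - sum(means * beta)     # undo centering via the intercept

max(abs(z %*% gamma - x %*% beta))          # numerically zero
```

Both parameterizations give the same linear predictor, so reporting beta instead of gamma changes nothing about the fitted model.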
Lukas Meier, meier@stat.math.ethz.ch
Lukas Meier, Sara van de Geer and Peter Bühlmann (2008), The Group Lasso for Logistic Regression, Journal of the Royal Statistical Society, Series B, 70 (1), 53-71, see also http://stat.ethz.ch/~meier/logistic-grouplasso.php
library(grplasso)

## Use the Logistic Group Lasso on the splice data set
data(splice)

## Define a list with the contrasts of the factors
contr <- rep(list("contr.sum"), ncol(splice) - 1)
names(contr) <- names(splice)[-1]

## Fit a logistic model
fit.splice <- grplasso(y ~ ., data = splice, model = LogReg(), lambda = 20,
                       contrasts = contr, center = TRUE, standardize = TRUE)

## Perform the Logistic Group Lasso on a random dataset
set.seed(79)

n <- 50 ## observations
p <- 4  ## variables

## First variable (intercept) not penalized, two groups having 2 degrees
## of freedom each
index <- c(NA, 2, 2, 3, 3)

## Create a random design matrix, including the intercept (first column)
x <- cbind(1, matrix(rnorm(p * n), nrow = n))
colnames(x) <- c("Intercept", paste("X", 1:4, sep = ""))

par <- c(0, 2.1, -1.8, 0, 0)
prob <- 1 / (1 + exp(-x %*% par))
mean(pmin(prob, 1 - prob)) ## Bayes risk
y <- rbinom(n, size = 1, prob = prob) ## binary response vector

## Use a multiplicative grid for the penalty parameter lambda, starting
## at the maximal lambda value
lambda <- lambdamax(x, y = y, index = index, penscale = sqrt,
                    model = LogReg()) * 0.5^(0:5)

## Fit the solution path on the lambda grid
fit <- grplasso(x, y = y, index = index, lambda = lambda, model = LogReg(),
                penscale = sqrt,
                control = grpl.control(update.hess = "lambda", trace = 0))

## Plot coefficient paths
plot(fit)
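The example starts its lambda grid at the value returned by lambdamax(). As a hedged sketch of the idea behind such a value, and not of lambdamax() itself (which may also weight, center or standardize the design), assume the objective is the summed negative binomial log-likelihood plus lambda * sum_g penscale(df_g) * ||beta_g||_2. The optimality conditions then say that all penalized groups are zero exactly when ||t(X_g) %*% (y - mu0)||_2 <= lambda * penscale(df_g) for every group g, where mu0 are the fitted probabilities of the model with only the unpenalized columns. lambda_max_logistic is a hypothetical helper.

```r
# Hypothetical helper: smallest lambda at which all penalized groups are zero,
# under the objective stated above (unit weights, no offset)
lambda_max_logistic <- function(x, y, index, penscale = sqrt) {
  nonpen <- which(is.na(index))
  mu0 <- if (length(nonpen) == 0) {
    rep(0.5, length(y))                    # no unpenalized columns: eta = 0
  } else {
    glm.fit(x[, nonpen, drop = FALSE], y, family = binomial())$fitted.values
  }
  groups <- split(seq_along(index), index)
  max(vapply(groups, function(j) {
    sqrt(sum(crossprod(x[, j, drop = FALSE], y - mu0)^2)) / penscale(length(j))
  }, numeric(1)))
}

set.seed(79)
n <- 50
x <- cbind(1, matrix(rnorm(4 * n), nrow = n))
index <- c(NA, 2, 2, 3, 3)
prob <- 1 / (1 + exp(-x %*% c(0, 2.1, -1.8, 0, 0)))
y <- rbinom(n, size = 1, prob = prob)

lmax <- lambda_max_logistic(x, y, index)
lambda <- lmax * 0.5^(0:5)                 # multiplicative grid, as in the example
```

With only an intercept unpenalized, mu0 is simply mean(y), so the value can be checked by hand.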