blasso {monomvn} | R Documentation |
Inference for ordinary least squares, lasso and ridge regression models by (Gibbs) sampling from the Bayesian posterior distribution, augmented with Reversible Jump for model selection
bridge(X, y, T = 1000, thin = NULL, RJ = TRUE, M = NULL, beta = NULL,
       lambda2 = 1, s2 = 1, mprior = 0, rd = NULL, ab = NULL,
       rao.s2 = TRUE, normalize = TRUE, verb = 1)

blasso(X, y, T = 1000, thin = NULL, RJ = TRUE, M = NULL, beta = NULL,
       lambda2 = 1, s2 = 1, ridge = FALSE, mprior = 0, rd = NULL,
       ab = NULL, rao.s2 = TRUE, normalize = TRUE, verb = 1)
X |
data.frame , matrix , or vector of inputs X |
y |
vector of output responses y of length equal to the
leading dimension (rows) of X , i.e., length(y) == nrow(X) |
T |
total number of MCMC samples to be collected |
thin |
number of MCMC samples to skip before a sample is
collected (via thinning). If NULL (default), then
thin is determined based on the regression model implied
by RJ , lambda2 , and ncol(X) |
RJ |
if TRUE then model selection on the columns of the
design matrix (and thus the parameter beta in the model) is
performed by Reversible Jump (RJ) MCMC. The initial model is
specified by the beta input, described below, and the maximal
number of covariates in the model is specified by M |
M |
the maximal number of allowed covariates (columns of
X ) in the model. If input lambda2 > 0 then
M <= ncol(X) is allowed. Otherwise
M <= min(ncol(X), length(y)-1), which is the default value
when a NULL argument is given |
beta |
initial setting of the regression coefficients. Any
zero-components will imply that the corresponding covariate (column
of X ) is not in the initial model. When input RJ =
FALSE (no RJ) and lambda2 > 0 (use lasso) then no
components are allowed to be exactly zero. The default setting is
therefore contextual; see below for details |
lambda2 |
square of the initial lasso penalty parameter. If zero, then least squares regressions are used |
s2 |
initial variance parameter |
ridge |
specifies if ridge regression should be done instead
of the lasso; only meaningful when lambda2 > 0 |
mprior |
prior on the number of non-zero regression coefficients
(and therefore covariates) m in the model. The default
(mprior = 0) encodes the uniform prior on 0 <= m <= M.
A scalar value 0 < mprior < 1 implies a Binomial prior
Bin(m|n=M, p=mprior). A 2-vector mprior = c(g, h)
of positive values g and h gives a Bin(m|n=M, p) prior
where p ~ Beta(g,h); see the sketch after this argument list |
rd |
=c(r, delta) , the alpha (shape) parameter and
beta (rate) parameter to the gamma distribution prior
G(r,delta) for the lambda2 parameter under
the lasso model; or, the alpha (shape) parameter and
beta (scale) parameter to the
inverse-gamma distribution IG(r/2, delta/2) prior for
the lambda2
parameter under the ridge regression model. A default of NULL
generates appropriate non-informative values depending on the
nature of the regression. See the details below for information
on the special settings used for ridge regression |
ab |
=c(a, b) , the alpha (shape)
parameter and the beta (scale) parameter for the
inverse-gamma distribution prior IG(a,b) for the variance parameter
s2 . A default of NULL generates appropriate
non-informative values depending on the nature of the regression |
rao.s2 |
indicates whether Rao-Blackwellized samples for
s^2 should be used (default TRUE ); see
below for more details |
normalize |
if TRUE , each variable is standardized to have unit
L2-norm, otherwise it is left alone; default is TRUE |
verb |
verbosity level; currently only verb = 0 and
verb = 1 are supported |
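For concreteness, here is a minimal sketch (simulated data; not one of the package's own examples) of the three forms of the mprior argument referred to above:

## a minimal sketch of the three mprior forms (simulated data;
## not part of the package examples)
X <- matrix(rnorm(40*8), ncol=8)
y <- drop(X[,1:2] %*% c(1.5, -2)) + rnorm(40)
f.unif <- blasso(X, y, mprior=0, verb=0)        ## uniform prior on 0 <= m <= M
f.bin  <- blasso(X, y, mprior=0.25, verb=0)     ## Bin(m | n=M, p=0.25)
f.bb   <- blasso(X, y, mprior=c(2,8), verb=0)   ## Bin(m | n=M, p) with p ~ Beta(2,8)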
The Bayesian lasso model and Gibbs sampling algorithm are described
in detail in Park & Casella (2008). The algorithm implemented
by this function is identical to the one described therein, with
the exception of an added "option" to use a Rao-Blackwellized sample
of s^2 (with beta integrated out) for improved mixing, and the
model selection by RJ described below. When the input argument
lambda2 = 0 is supplied, the model is a simple hierarchical linear
model where (beta, s2) is given a Jeffreys prior.
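As a small sketch of the two options just described (simulated data; not taken from the package examples), the lasso can be switched off to recover the hierarchical linear model with the Jeffreys prior, and the Rao-Blackwellized s^2 samples can be disabled for comparison:

## sketch: hierarchical linear model vs. lasso without Rao-Blackwellized s^2
## (simulated data; not part of the package examples)
X <- matrix(rnorm(60*4), ncol=4)
y <- drop(X %*% c(1, 0, -1, 2)) + rnorm(60)
f.hlm <- blasso(X, y, lambda2=0, RJ=FALSE, verb=0)  ## Jeffreys prior on (beta, s2)
f.std <- blasso(X, y, rao.s2=FALSE, verb=0)         ## lasso, plain s^2 draws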
Specifying RJ = TRUE causes Bayesian model selection and averaging to
commence for choosing which of the columns of the design matrix X
(and thus parameters beta) should be included in the model. The zero
components of the beta input indicate which columns of X are excluded
from the initial model, and M specifies the maximal number of columns
that may be included.
The RJ mechanism implemented here for Bayesian lasso model selection differs from the one described by Hans (2008), which is based on an idea from Geweke (1996). Those methods require departing from the Park & Casella (2008) latent-variable model and sampling from each conditional beta[i] | beta[-i], ... for all i, since a mixture prior with a point-mass at zero is placed on each beta[i]. Our implementation requires no such special prior and retains the joint sampling of the full vector of non-zero beta entries, which we believe yields better mixing in the Markov chain. RJ proposals to increase or decrease the number of non-zero entries do proceed component-wise, but the acceptance rates are high due to marginalized between-model moves (Troughton & Godsill, 1997).
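The following sketch (simulated data; not one of the package examples) shows one way the RJ machinery described above might be exercised: zeros in beta exclude covariates from the initial model, M caps the number of active covariates, and a Beta-Binomial prior is placed on the model size.

## sketch of RJ model selection (simulated data; not part of the package examples)
X <- matrix(rnorm(50*10), ncol=10)
y <- drop(X[,1:3] %*% c(2, -1, 1)) + rnorm(50)
binit <- rep(0, ncol(X))       ## zeros exclude covariates from the initial model
binit[1:3] <- 0.1              ## start with the first three columns included
fit <- blasso(X, y, T=1000, RJ=TRUE, M=5, beta=binit, mprior=c(2,10), verb=0)
plot(fit, burnin=100, which="m")  ## posterior over the number of active covariates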
Bayesian ridge regression is implemented as a special case via the
bridge function. This essentially calls blasso with ridge = TRUE and
RJ = FALSE (by default).
A default setting of rd = c(0,0) is implied by rd = NULL, giving the
Jeffreys prior for the penalty parameter lambda^2, unless
ncol(X) >= length(y), in which case the proper specification of
rd = c(5,10) is used instead.
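As a sketch of the relationship just described (simulated data; not from the package examples), the following two calls are intended to specify the same Bayesian ridge regression:

## sketch: bridge vs. blasso with ridge = TRUE (simulated data;
## not part of the package examples)
X <- matrix(rnorm(50*6), ncol=6)
y <- drop(X %*% rnorm(6)) + rnorm(50)
f1 <- bridge(X, y, RJ=FALSE, verb=0)              ## Bayesian ridge regression
f2 <- blasso(X, y, ridge=TRUE, RJ=FALSE, verb=0)  ## equivalent call via blasso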
blasso returns an object of class "blasso", which is a list containing
a copy of all of the input arguments as well as the components listed
below.
call |
a copy of the function call as used |
mu |
a vector of T samples of the (un-penalized)
“intercept” parameter |
beta |
a T*ncol(X) matrix of T samples from
the (penalized) regression coefficients |
m |
the number of non-zero entries in each vector of T
samples of beta |
s2 |
a vector of T samples of the variance parameter |
lambda2 |
a vector of T samples of the penalty
parameter |
tau2i |
a T*ncol(X) matrix of T samples from
the (latent) inverse diagonal of the prior covariance matrix for
beta , obtained for Lasso regressions |
pi |
a vector of T samples of the Binomial proportion
p that was given a Beta prior, as described above for the
2-vector version of the mprior input |
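To make the components above concrete, a brief sketch (simulated data; not from the package examples) of how they might be summarized after discarding some burn-in samples:

## sketch: summarizing blasso output (simulated data; not part of the
## package examples)
X <- matrix(rnorm(50*5), ncol=5)
y <- drop(X %*% c(2, 0, 0, -1, 0)) + rnorm(50)
fit <- blasso(X, y, T=1000, verb=0)
colMeans(fit$beta[-(1:100),])  ## posterior means of the regression coefficients
mean(fit$mu[-(1:100)])         ## posterior mean of the intercept
table(fit$m[-(1:100)])         ## distribution of visited model sizes
mean(fit$s2[-(1:100)])         ## posterior mean of the variance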
Whenever ncol(X) >= nrow(X) it must be that either RJ = TRUE with
M <= nrow(X)-1 (the default), or that the lasso is turned on with
lambda2 > 0. Otherwise the regression problem is ill-posed.
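A sketch (simulated data; not from the package examples) of the two legal configurations noted above when ncol(X) >= nrow(X):

## sketch: big-p, small-n configurations (simulated data; not part of the
## package examples)
n <- 20; p <- 30
X <- matrix(rnorm(n*p), ncol=p)
y <- drop(X[,1:3] %*% c(1, 2, -1)) + rnorm(n)
f.rj  <- blasso(X, y, lambda2=0, RJ=TRUE, verb=0)   ## RJ with default M <= nrow(X)-1
f.las <- blasso(X, y, lambda2=1, RJ=FALSE, verb=0)  ## or keep the lasso penalty on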
Since the starting values are considered to be the first sample (of T),
the total number of (new) samples obtained by Gibbs sampling will be T-1.
Robert B. Gramacy bobby@statslab.cam.ac.uk
Park, T. and Casella, G. (2008). The Bayesian Lasso.
Journal of the American Statistical Association, 103(482), pp. 681-686.
http://www.stat.ufl.edu/~casella/Papers/Lasso.pdf
Hans, C. (2008). Bayesian Lasso Regression.
Technical Report No. 810, Department of Statistics,
The Ohio State University, Columbus, OH 43210.
http://www.stat.osu.edu/~hans/Papers/blasso.pdf
Geweke, J. (1996). Variable selection and model comparison in regression. In Bayesian Statistics 5. Editors: J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, 609-620. Oxford Press.
Troughton, P. T. and Godsill, S. J. (1997). A reversible jump sampler for autoregressive time series, employing full conditionals to achieve efficient model space moves. Technical Report CUED/F-INFENG/TR.304, Cambridge University Engineering Department.
http://www.statslab.cam.ac.uk/~bobby/monomvn.html
lm, lars in the lars package, regress, lm.ridge in the MASS package
## following the lars diabetes example
data(diabetes)
attach(diabetes)

## Ordinary Least Squares regression
reg.ols <- regress(x, y)

## Lasso regression
reg.las <- regress(x, y, method="lasso")

## Bayesian Lasso regression
reg.blas <- blasso(x, y)

## summarize the beta (regression coefficients) estimates
plot(reg.blas, burnin=200)
points(drop(reg.las$b), col=2, pch=20)
points(drop(reg.ols$b), col=3, pch=18)
legend("topleft", c("blasso-map", "lasso", "lsr"),
       col=c(2,2,3), pch=c(21,20,18))

## plot the size of different models visited
plot(reg.blas, burnin=200, which="m")

## get the summary
s <- summary(reg.blas, burnin=200)

## calculate the probability that each beta coef != zero
s$bn0

## summarize s2
plot(reg.blas, burnin=200, which="s2")
s$s2

## summarize lambda2
plot(reg.blas, burnin=200, which="lambda2")
s$lambda2

## clean up
detach(diabetes)

##
## a big-p small-n example
##
n <- 25; m <- 51
xmuS <- randmvn(n, m)
X <- xmuS$x[,1:(m-1)]
Y <- drop(xmuS$x[,m])
obl <- blasso(X, Y, verb=0)

## plot summary of the model order
plot(obl, burnin=10, which="m")

## fit a standard lasso model
oml <- regress(X, Y, method="lasso")

## compare via RMSE, most often blasso will win
beta <- xmuS$S[m,-m]
sqrt(mean((apply(obl$beta, 2, mean) - beta)^2))
sqrt(mean((oml$b[-1] - beta)^2))

## now try both Bayesian & ML ridge regression
obr <- bridge(X, Y, verb=0)
omr <- regress(X, Y, method="ridge")
sqrt(mean((apply(obr$beta[-c(1:200),], 2, mean) - beta)^2))
sqrt(mean((omr$b[-1] - beta)^2))