blasso {monomvn}                R Documentation
Ordinary least squares and Lasso regression by sampling from the Bayesian posterior distribution via Gibbs sampling, with Reversible Jump (RJ) MCMC for model selection
Usage:

blasso(X, y, T = 100, thin = 10, RJ = TRUE, M = NULL, beta = NULL,
       lambda2 = 1, s2 = 1, tau2i = NULL, rd = c(2, 0.1), ab = NULL,
       rao.s2 = TRUE, normalize = TRUE, verb = 1)
Arguments:

X: data.frame, matrix, or vector of inputs X

y: vector of output responses y of length equal to the leading dimension (rows) of X, i.e., length(y) == nrow(X)

T: total number of MCMC samples to be collected

thin: number of MCMC samples to skip before a sample is collected (via thinning)

RJ: if TRUE then model selection on the columns of the design matrix (and thus the parameter beta in the model) is performed by reversible jump (RJ) MCMC. The initial model is specified by the beta input, described below, and the maximal number of covariates in the model is specified by M

M: the maximal number of allowed covariates (columns of X) in the model. If input lambda2 > 0 then M as large as ncol(X) is allowed. Otherwise M must be <= min(ncol(X), length(y)-1), its default value when argument NULL is given

beta: initial setting of the regression coefficients. Any zero components imply that the corresponding covariate (column of X) is not in the initial model. When input RJ = FALSE (no RJ) and lambda2 > 0 (use lasso), no components are allowed to be exactly zero. The default setting is therefore contextual; see the details below

lambda2: square of the initial lasso penalty parameter. If zero, least squares regression is used

s2: initial variance parameter

tau2i: initial vector of lasso latent variables along the diagonal of the covariance matrix in the prior for beta. The default setting (when NULL is given) is contextual; see the details below

rd: =c(r, delta), the alpha (shape) and beta (rate) parameters of the gamma distribution prior for the lasso penalty parameter lambda2

ab: =c(a, b), the alpha (shape) and beta (scale) parameters of the inverse-gamma distribution prior for the variance parameter s2. A default of NULL generates appropriate non-informative values depending on the nature of the regression

rao.s2: whether Rao-Blackwellized samples of s^2 should be used (default TRUE); see the details below

normalize: if TRUE, each variable is standardized to have unit L2 norm; otherwise it is left alone. Default is TRUE

verb: verbosity level; currently only verb = 0 and verb = 1 are supported
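For orientation, a minimal call with mostly-default settings might look as follows. This is a sketch on synthetic data; the variable names X, y, and out are hypothetical, not from the package's own examples.

## sketch: basic call, assuming the monomvn package is installed
library(monomvn)
X <- matrix(rnorm(100*5), ncol=5)                  ## 100 observations, 5 covariates
y <- drop(X %*% c(2, 0, 0, -1.5, 0)) + rnorm(100)  ## sparse true coefficients
out <- blasso(X, y, T=1000, thin=10, verb=0)       ## collect 1000 (thinned) samples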
Details:

The Bayesian lasso model and Gibbs sampling algorithm are described in detail in Park & Casella (2008). The algorithm implemented by this function is identical to the one described therein, with the exception of the "option" to use a Rao-Blackwellized sample of s^2 (with beta integrated out) for improved mixing, and the model selection by RJ described below.
When the input argument lambda2 = 0 is supplied, the model is a simple hierarchical linear model where the prior for beta has mean zero and a diagonal covariance matrix with diagonal 1/tau2i.
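Turning the lasso off in this way might look like the following sketch, continuing with the hypothetical X and y from above.

## sketch: hierarchical linear model without the lasso penalty;
## with lambda2=0 and the default tau2i=NULL the prior on beta is flat
ols.b <- blasso(X, y, lambda2=0, RJ=FALSE, T=1000, verb=0)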
Specifying RJ = TRUE causes Bayesian model selection and averaging to commence for choosing which of the columns of the design matrix X (and thus parameters beta) are included in the model. The zero components of the initial beta vector specify which of the columns are in the initial model, and M specifies the maximal number of columns.
The RJ mechanism implemented here is distinct from the model selection method described by Hans (2008), which is based on Geweke (1996). Those methods require departing from the Park & Casella (2008) scheme in order to sample from each conditional beta[i] | beta[-i], ... for all i, and require that a mixture prior with a point-mass at zero be placed on each beta[i]. The method implemented here requires no such special prior and retains the joint sampling of the full vector of non-zero beta entries, which we believe yields better mixing in the Markov chain. RJ proposals to increase or decrease the number of non-zero entries do proceed component-wise, but the acceptance rates are high due to an optimal second-order proposal (Ehlers & Brooks, 2005).
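Concretely, an RJ run might be set up as follows. This is a sketch under the same hypothetical data; the initial beta and the choice of M are purely illustrative.

## sketch: RJ model selection allowing at most M=3 active covariates;
## the zero entries of beta start those covariates outside the model
rj.b <- blasso(X, y, RJ=TRUE, M=3, beta=c(1, 0, 0, -1, 0),
               T=1000, verb=0)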
Value:

blasso returns an object of class "blasso", which is a list containing a copy of all of the input arguments as well as the components listed below.
call: a copy of the function call as used

mu: a vector of T samples from the (un-penalized) "intercept" parameter

beta: a T*ncol(X) matrix of T samples from the (penalized) regression coefficients

m: the number of non-zero entries in each of the T sampled beta vectors

s2: a vector of T samples from the variance parameter

lambda2: a vector of T samples from the penalty parameter

tau2i: a T*ncol(X) matrix of T samples from the (latent) inverse diagonal of the prior covariance matrix for beta
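A brief sketch of working with these components, using the hypothetical rj.b fit from above (the burn-in length is arbitrary):

## sketch: posterior summaries from the returned samples
burnin <- 100
colMeans(rj.b$beta[-(1:burnin),])             ## posterior mean coefficients
mean(rj.b$m[-(1:burnin)])                     ## average model size
quantile(rj.b$s2[-(1:burnin)], c(0.05, 0.95)) ## 90% interval for the variance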
Note:

Whenever ncol(X) >= nrow(X) it must be that either RJ = TRUE with M <= nrow(X)-1 (the default) or that the lasso is turned on with lambda2 > 0; otherwise the regression problem is ill-posed.
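For instance, with more columns than rows both safeguards can be made explicit, as in this sketch (Xbig and ybig are hypothetical):

## sketch: p >= n, so keep RJ=TRUE (default M) and/or set lambda2 > 0
Xbig <- matrix(rnorm(20*50), ncol=50)   ## 20 observations, 50 covariates
ybig <- rnorm(20)
big.b <- blasso(Xbig, ybig, RJ=TRUE, lambda2=1, T=500, verb=0)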
When lambda2 = 0 is given, the initial tau2i vector is taken to be zero, indicating that it has been removed from the model and implying a flat, improper prior for beta. Other settings will be accepted but are ignored in the current version. Future versions will allow the specification of informative (non-lasso) priors.
Since the starting values are considered to be the first sample (of T), the total number of (new) samples obtained by Gibbs sampling will be T-1.
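So, for example, with the hypothetical fit from the sketch above:

## sketch: the starting values occupy the first of the T rows
nrow(out$beta)   ## T, i.e., 1000; only T-1 are newly sampled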
Author(s):

Robert B. Gramacy bobby@statslab.cam.ac.uk
References:

Park, T. and Casella, G. (2008). The Bayesian Lasso. (unpublished)
http://www.stat.ufl.edu/~casella/Papers/bayeslasso.pdf

Hans, C. (2008). Bayesian Lasso Regression. Technical Report No. 810, Department of Statistics, The Ohio State University, Columbus, OH 43210.
http://www.stat.osu.edu/~hans/Papers/blasso.pdf

Geweke, J. (1996). Variable selection and model comparison in regression. In Bayesian Statistics 5, J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith (eds.), 609-620. Oxford Press.

Ehlers, R.S. and Brooks, S.P. (2005). Efficient Construction of Reversible Jump MCMC Proposals for Autoregressive Time Series Models. (unpublished)
http://www.statslab.cam.ac.uk/~steve/mypapers/ehlb02.ps

http://www.statslab.cam.ac.uk/~bobby/monomvn.html
See Also:

lm, lars in the lars package, regress
Examples:

## following the lars diabetes example
data(diabetes)
attach(diabetes)

## Ordinary Least Squares regression
reg.ols <- regress(x, y)

## Lasso regression
reg.las <- regress(x, y, method="lasso")

## Bayesian Lasso regression
reg.blas <- blasso(x, y, T=1000)

## summarize the beta (regression coefficients) estimates
plot(reg.blas, burnin=200)
points(drop(reg.las$b), col=2, pch=20)
points(drop(reg.ols$b), col=3, pch=18)
legend("topleft", c("lasso", "lsr"), col=2:3, pch=c(20,18))

## plot the size of different models visited
plot(reg.blas, burnin=200, which="m")
## (TODO: allow Bayes method with fixed lambda)

## get the summary
s <- summary(reg.blas)

## calculate the probability that each beta coef != zero
s$bn0

## summarize s2
plot(reg.blas, burnin=200, which="s2")
s$s2

## summarize lambda2
plot(reg.blas, burnin=200, which="lambda2")
s$lambda2

## clean up
detach(diabetes)

##
## a big-p small-n example
##
xmuS <- randmvn(50, 101)
X <- xmuS$x[,1:100]
y <- drop(xmuS$x[,101])
out.b <- blasso(X, y, T=1000, thin=20, verb=0)

## plot summary of the model order
plot(out.b, burnin=100, which="m")

## fit a standard lasso model
out.las <- regress(X, y, method="lasso")

## compare via RMSE
beta <- xmuS$S[101,-101]
sqrt(mean((apply(out.b$beta, 2, mean) - beta)^2))
sqrt(mean((out.las$b[-1] - beta)^2))