blasso {monomvn}R Documentation

Bayesian Lasso Regression

Description

Ordinary least squares and Lasso regression by sampling from the Bayesian posterior distribution via Gibbs Sampling and Reversible Jump for model selection

Usage

blasso(X, y, T = 100, thin = 10, RJ = TRUE, M = NULL,
       beta = NULL, lambda2 = 1, s2 = 1, tau2i = NULL,
       rd = c(2,0.1), ab = NULL, rao.s2 = TRUE,
       normalize = TRUE, verb = 1)

Arguments

X data.frame, matrix, or vector of inputs X
y vector of output responses y of length equal to the leading dimension (rows) of X, i.e., length(y) == nrow(X)
T Total number of MCMC samples to be collected
thin number of MCMC samples to skip before a sample is collected (via thinning)
RJ if TRUE then model selection on the columns of the design matrix (and thus the parameter beta in the model) is performed by reversible jump (RJ) MCMC. The initial model is specified by the beta input, described below, and the maximal number of covariates in the model is specified by M
M the maximal number of allowed covariates (columns of X) in the model. If input lambda2 > 0 then M as large as ncol(X) is allowed. Otherwise M must be <= min(ncol(X), length(y)-1), its default value when argument NULL is given
beta Initial setting of the regression coefficients. Any zero-components will imply the corresponding covariate (column of X) is not in the initial model. When input RJ = FALSE (no RJ) and lambda2 > 0 (use lasso) then no components are allowed to be exactly zero. The default setting is therefore contextual. See below for details
lambda2 square of the initial lasso penalty parameter. If zero, the least squares regressions are used
s2 initial variance parameter
tau2i initial vector of lasso latent-variables along the diagonal of the covariance matrix in the prior for beta. The default setting (when NULL is given) is contextual. See below for details
rd =c(r, delta), the alpha (shape) parameter and beta (rate) parameter to the gamma distribution prior for the lasso parameter lambda
ab =c(a, b), the alpha (shape) parameter and the beta (scale) parameter for the inverse-gamma distribution prior for the variance parameter s2. A default of NULL generates appropriate non-informative values depending on the nature of the regression
rao.s2 indicates whether to use Rao-Blackwellized samples for s^2 should be used (default TRUE), see the details section, below
normalize if TRUE, each variable is standardized to have unit L2 norm, otherwise it is left alone. Default is TRUE
verb verbosity level; currently only verb = 0 and verb = 1 are supported

Details

The Bayesian lasso model and Gibbs Sampling algorithm is described in detail in Park & Casella (2008). The algorithm implemented by this function is identical to that described therein., with the exception of the “option” to use a Rao-Blackwellized sample of s^2 (with beta integrated out) for improved mixing, and the model selections by RJ described below. When input argument lambda2 = 0 is supplied, the model is a simple hierarchical linear model where the prior for beta has mean zero and diagonal covariance matrix with diagonal 1/tau2i

Specifying RJ = TRUE causes Bayesian model selection and averaging to commence for choosing which of the columns of the design matrix X (and thus parameters beta) are included in the model. The zero-components of the initial beta vector specify which of the columns are in the initial model, and M specifies the maximal number of columns.

The RJ mechanism implemented here is distinct from the model selection method described by Hans (2008), which is based on Geweke (1996). Those methods require departing from the Park & Casella (2008) to sample sampling from each conditional beta[i] | beta[-i], ... for all i, and that a mixture prior with a point-mass at zero be placed on each beta[i]. The method implemented here requires no such special prior and retains the joint sampling from the full beta vector of non-zero entries, which we believe yields better mixing in the Markov chain. RJ proposals to increase/decrease the number of non-zero entries does proceed component-wise, but the acceptance rates are high due to an optimal second–order proposal (Ehlers & Brooks, 2005)

Value

blasso returns an object of class "blasso", which is a list containing a copy of all of the input arguments as well as of the components listed below.

call a copy of the function call as used
mu a vector of T samples from the (un-penalized) “intercept” parameter
beta a T*ncol(X) matrix of T samples from the (penalized) regression coefficients
m the number of non-zero entries in each vector of T samples of beta
s2 a vector of T samples from the variance parameter
lambda2 a vector of T samples from the penalty parameter
tau2i a T*ncol(X) matrix of T samples from the (latent) inverse diagonal of the prior covariance matrix for beta

Note

Whenever ncol(X) >= nrow(X) it must be that either RJ = TRUE with M <= nrow(X)-1 (the default) or that the lasso is turned on with lambda2 > 0 or the regression problem is ill-posed.

When lambda2 = 0 is given the initial tau2i vector is taken to be zero to indicate that it has been removed from the model, implying a flat, improper, prior for beta. Other settings will be taken, but are ignored in the current version. Future versions will allow the specification of informative (non lasso) priors.

Since the starting values are considered to be first sample (of T), the total number of (new) samples obtained by Gibbs Sampling will be T-1

Author(s)

Robert B. Gramacy bobby@statslab.cam.ac.uk

References

Park, T., Casella, G. (2008). The Bayesian Lasso, (unpublished)
http://www.stat.ufl.edu/~casella/Papers/bayeslasso.pdf

Chris Hans. (2008). Bayesian Lasso regression. Technical Report No. 810, Department of Statistics, The Ohio State University, Columbus, OH 43210.
http://www.stat.osu.edu/~hans/Papers/blasso.pdf

Geweke, J. (1996). Variable selection and model comparison in regression. In Bayesian Statistics 5. Editors: J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, 609-620. Oxford Press.

Ehlers, R.S. and Brooks, S.P. (2005). Efficient Construction of Reversible Jump MCMC Proposals for Autoregressive Time Series Models. (unpublished)
http://www.statslab.cam.ac.uk/~steve/mypapers/ehlb02.ps

http://www.statslab.cam.ac.uk/~bobby/monomvn.html

See Also

lm , lars in the lars package, regress

Examples

## following the lars diabetes example
data(diabetes)
attach(diabetes)

## Ordinary Least Squares regression
reg.ols <- regress(x, y)

## Lasso regression
reg.las <- regress(x, y, method="lasso")

## Bayesian Lasso regression
reg.blas <- blasso(x, y, T=1000)

## summarize the beta (regression coefficients) estimates
plot(reg.blas, burnin=200)
points(drop(reg.las$b), col=2, pch=20)
points(drop(reg.ols$b), col=3, pch=18)
legend("topleft", c("lasso", "lsr"), col=2:3, pch=c(20,18))

## plot the size of different models visited
plot(reg.blas, burnin=200, which="m")

## (TODO: allow Bayes method with fixed lambda)

## get the summary
s <- summary(reg.blas)

## calculate the probability that each beta coef != zero
s$bn0

## summarize s2
plot(reg.blas, burnin=200, which="s2")
s$s2

## summarize lambda2
plot(reg.blas, burnin=200, which="lambda2")
s$lambda2

## clean up
detach(diabetes)

##
## a big-p small-n example
##

xmuS <- randmvn(50, 101)
X <- xmuS$x[,1:100]
y <- drop(xmuS$x[,101])
out.b <- blasso(X, y, T=1000, thin=20, verb=0)

## plot summary of the model order
plot(out.b, burnin=100, which="m")

## fit a standard lasso model
out.las <- regress(X, y, method="lasso")

## compare via RMSE
beta <- xmuS$S[101,-101] 
sqrt(mean((apply(out.b$beta, 2, mean) - beta)^2))
sqrt(mean((out.las$b[-1] - beta)^2))


[Package monomvn version 1.3-1 Index]