bark-package {bark}R Documentation

Bayesian Additive Regression Kernels

Description

Implementation of BARK: Bayesian Additive Regression Kernels with Feature Selection, Zhi Ouyang, Ph.D. thesis, Duke University

Details

Package: bark
Type: Package
Version: 0.1-0
Date: 2008-07-16
License: GPL version 2 or newer
LazyLoad: yes

Overview:
BARK is a Bayesian sum-of-kernels model.
For numeric response y, we have y = f(x) + e, where e ~ N(0,sigma^2).
For a binary response y, P(Y=1 | x) = F(f(x)), where F denotes the standard normal cdf (probit link).

In both cases, f is the sum of many Gaussian kernel functions. The goal is to have very flexible inference for the unknown function f. It uses an approximated Cauchy process as the prior distribution for the unknown function f.

Feature selection can be achieved through the inference on the scale parameters in the Gaussian kernels. BARK accepts four different types of prior distributions, e, d, se, sd, enabling either soft shrinkage or hard shrinkage for the scale parameters.

Functions:
bark()
sim.Friedman1()
sim.Friedman2()
sim.Friedman3()
sim.Circle()

Author(s)

Zhi Ouyang <zo2@stat.duke.edu>, Merlise Clyde <clyde@stat.duke.edu>, Robert Wolpert <wolpert@stat.duke.edu>

Maintainer: Zhi Ouyang <zo2@stat.duke.edu>

References

Ouyang, Zhi (2008) Bayesian Additive Regression Kernels. Duke University. Ph.D. dissertation, Chapter 3.
at: http://stat.duke.edu/people/theses/OuyangZ.html

Examples

##Simulate regression example
# Friedman 2 data set, 200 noisy training, 1000 noise free testing
# Out of sample MSE in SVM (default RBF): 6500 (sd. 1600)
# Out of sample MSE in BART (default):    5300 (sd. 1000)
traindata <- sim.Friedman2(200, sd=125)
testdata <- sim.Friedman2(1000, sd=0)
fit.bark.d <- bark(traindata$x, traindata$y, testdata$x, classification=FALSE, type="d")
boxplot(as.data.frame(fit.bark.d$theta.lambda))
mean((fit.bark.d$yhat.test.mean-testdata$y)^2)

##Simulate classification example
# Circle 5 with 2 signals and three noisy dimensions
# Out of sample erorr rate in SVM (default RBF): 0.110 (sd. 0.02)
# Out of sample error rate in BART (default):    0.065 (sd. 0.02)
traindata <- sim.Circle(200, dim=5)
testdata <- sim.Circle(1000, dim=5)
fit.bark.se <- bark(traindata$x, traindata$y, testdata$x, classification=TRUE, type="se")
boxplot(as.data.frame(fit.bark.se$theta.lambda))
mean((fit.bark.se$yhat.test.mean>0)!=testdata$y)

[Package bark version 0.1-0 Index]