gbev {gbev}    R Documentation

Boosted regression trees with errors-in-variables

Description

Fits boosted regression trees with errors-in-variables.

Usage


gbev(formula = formula(data),
     data = list(),
     weights=NULL,
     measErrorModel=NULL,
     method="L2", 
     indepFitUpdate=1,
     nboost=100,
     lambda=100,
     maxDepth=2,
     minSplit=10, 
     minBucket=0,
     sPoints=10,
     mc=2,
     intermPred=10,
     maxSplitAttempts=10)

Arguments

formula A symbolic description of the model to be fit. NOTE: the covariates measured with error must be placed first in the formula.
data A data frame containing variables in the model.
weights Weights applied to observations. Defaults to 1.
measErrorModel This is a list specifying the distribution of the latent covariates and the measurement error. Here it is assumed that the latent covariates are a mixture of normals (possibly multivariate), and that the measurement error is normally distributed. See examples below for details. Note that at least one covariate must be specified as measured with error.
method Either "L2" for squared error loss, or "logLike" for binary regression with negative log-likelihood loss.
indepFitUpdate If indepFitUpdate=1 then the model fit is updated using an independent MC sample; otherwise the fit is updated using the same MC sample as used in tree fitting.
nboost Number of boosting iterations to perform.
lambda Regularization parameter.
maxDepth Determines the maximum interaction depth of the fitted trees; maxDepth=2 fits stumps.
minSplit Minimum expected number of observations in a node for it to be split.
minBucket Minimum expected number of observations in a node.
sPoints Number of points sampled at random from which to choose split.
mc Number of Monte-Carlo samples drawn to compute node probabilities.
intermPred Increment of iterations at which intermediate predictions are saved; required by the cvLoss function.
maxSplitAttempts Maximum number of attempts at finding a valid split point. When splitting a node, sPoints candidate splits are sought for each covariate; however, a randomly sampled split point does not necessarily give a valid split (i.e. one satisfying minBucket and minSplit), and maxSplitAttempts is the maximum number of attempts at finding such a point.

Details

This function performs non-parametric regression when some or all covariates are measured with error. It is assumed that the latent (error-measured) covariate vector, X, is observed through a random variable W, the relation between the two being

W=X+U

where U is the measurement error. This function assumes that the density of X is known and representable as a finite mixture of multivariate normal densities, while the measurement error U is assumed to be multivariate normal. These two densities are specified in the measErrorModel argument of the gbev function; see the examples below for further details on measErrorModel. In practice the densities of X and U will not be known and must be estimated in some manner, typically using replicate measurements or validation data; see Carroll et al. (1995) for further details.
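For orientation, a minimal measErrorModel for a single error-prone covariate whose latent distribution has a single normal component could be written as below. The structure mirrors the construction in the Examples section; the variance values are placeholders.

varX<-1                                         ## variance of latent X (placeholder)
varME<-0.25                                     ## measurement error variance (placeholder)
meModel<-list(SigmaX=array(varX,dim=c(1,1,1)),  ## covariance of the single mixture component
              mu=array(0,dim=c(1,1)),           ## mean vector of the single mixture component
              SigmaME=diag(varME,1),            ## measurement error covariance matrix
              pComp=array(1,dim=c(1,1)),        ## mixture weight of the single component
              numComp=1)                        ## number of mixture components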

It is assumed that in the formula argument the error-measured covariates occur first. For example, if one has covariates W1, W2, Z1 and Z2, where the first two are error-contaminated and the last two are error-free, then the formula must be y~w1+w2+z1+z2.

The function gbev estimates E(Y|X) where Y is the response, using boosted regression trees. That is, a model F(x) for E(Y|x) is estimated such that F(x)=sum_{k=1}^{K}T_k(x) where each T_k(x) is a regression tree, and each regression tree is estimated by fitting it to the gradient of the previous model estimate using least squares, as described in Friedman (2001).

The model F(x) is built up iteratively, such that at each boosting iteration a tree model is added to F(x) to minimize the squared error between the observed data model, given by E(F(X)|w), and the current model residuals. More specifically, on the k-th iteration, the observed data residuals are tilde{y_i}=(y_i-E(F_k(X)|w_i)) for i=1,...,n, and a tree T_k(x)=sum_{j=1}^{J} c_{kj} I(x in R_{kj}) is fit by fitting E(T_k(X)|w), using least squares and recursive partitioning, to the current residuals tilde{y_i}. The model estimate is then updated by setting F_{k+1}(x)=F_k(x)+T_k(x), and the procedure is iterated.
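To make the fitting loop concrete, the following self-contained sketch shows ordinary L2 boosting with stumps on error-free data; the exhaustive stump search and the ridge-style shrinkage by lambda are simplified stand-ins used for illustration only, not gbev's internal tree fitting (which additionally evaluates E(T_k(X)|w) under the measurement error model). The function name l2_boost_stumps is hypothetical.

## Schematic L2 boosting with stumps on error-free data (illustration only)
l2_boost_stumps<-function(x,y,nboost=100,lambda=10)
{
  Fhat<-rep(mean(y),length(y))                 ## initial constant fit
  for(k in 1:nboost)
  {
    r<-y-Fhat                                  ## current residuals (negative gradient of L2 loss)
    cand<-sort(unique(x))
    cand<-cand[-length(cand)]                  ## candidate split points (drop the maximum)
    sse<-sapply(cand,function(s)
      sum((r-ifelse(x<=s,mean(r[x<=s]),mean(r[x>s])))^2))
    s<-cand[which.min(sse)]                    ## best single split by least squares
    cl<-sum(r[x<=s])/(sum(x<=s)+lambda)        ## shrunken node means; lambda acts as a
    cr<-sum(r[x> s])/(sum(x> s)+lambda)        ## ridge-style regularizer in this sketch
    Fhat<-Fhat+ifelse(x<=s,cl,cr)              ## update the model fit
  }
  Fhat
}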

The tree fitting requires evaluating probabilities of the form Pr(X in R|w)=E(I(X in R)|w), i.e. the probability that the covariate X is in a rectangular region R, given that W=w. These probabilities are estimated using Monte Carlo sampling, and the argument mc in gbev regulates how many samples are drawn, at each iteration, for each observation, the default being mc=2. Boosting is known to perform best when the amount of fitting at each iteration is small; here this is regulated by the size of the regression trees (the maxDepth argument) and the regularization parameter lambda. The larger the value of lambda, the less fitting is done per boosting iteration, and the more iterations are typically required to achieve adequate fitting. The lambda parameter also controls the Monte Carlo error in the function estimate caused by the sampling in the tree fitting: for the same value of mc, the larger the value of lambda, the smaller the Monte Carlo error in the regression function estimate. Some experimentation is required to find a lambda value such that the Monte Carlo error is acceptable.
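As an illustration of these probabilities, in the univariate case with a single normal component for X and normal measurement error, the conditional distribution of X given W=w is itself normal (standard normal-normal conjugacy), so Pr(X in R|w) can be estimated by drawing from it and counting the draws that fall in R. The sketch below conveys the idea only; the function name probInRegion and the variance values are placeholders, and gbev performs this sampling internally.

## Monte Carlo estimate of Pr(a < X <= b | W=w) when X~N(0,varX) and U~N(0,varME)
probInRegion<-function(w,a,b,varX=1,varME=0.25,mc=1000)
{
  postMean<-(varX/(varX+varME))*w              ## E(X|W=w) for a zero prior mean
  postVar<-varX*varME/(varX+varME)             ## Var(X|W=w)
  xs<-rnorm(mc,mean=postMean,sd=sqrt(postVar)) ## mc draws from X|W=w
  mean(xs>a & xs<=b)                           ## fraction of draws falling in (a,b]
}
probInRegion(w=0.5,a=0,b=Inf)                  ## e.g. an estimate of Pr(X>0 | W=0.5)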

Two loss functions are available for estimation, specified through the method argument. With method="L2", squared error loss is used, while with method="logLike", negative log-likelihood loss is used for binary (Y is 0 or 1) regression.

The arguments maxDepth, minSplit, minBucket, and sPoints control the tree fitting. maxDepth controls the depth of the regression tree to be fit: maxDepth=2 fits a tree containing a single split (two terminal nodes), while maxDepth=3 fits a tree with four terminal nodes, obtained by splitting the nodes of a single-split tree. Interestingly, maxDepth=2 fits an additive model, maxDepth=3 a model with all second order interactions, maxDepth=4 a model with all third order interactions, and so on. The argument sPoints is the number of candidate split points sampled for each covariate when splitting a node. In the absence of measurement error, node splitting is typically done by examining all values of the observed covariates falling in the node being split; with measurement error, however, the realized values of X are not observed, and sampling is instead used to generate candidate split points, with sPoints governing the number of such points sampled for each covariate. Smaller values of sPoints often work best. The argument minSplit determines the smallest (expected) number of observations in a node for a split to be attempted, while minBucket determines the smallest (expected) number of observations allowed in a terminal node. Note that with non-zero values of lambda, minBucket=0 will run without error.
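The interplay between sPoints, minBucket and maxSplitAttempts can be sketched as follows; here plain observation counts stand in for the expected node sizes that gbev actually uses under Pr(X in R|w), and the function name sampleValidSplits is hypothetical, so this illustrates the control flow rather than the package's internal code.

## Sketch of sampling candidate split points with a validity check
sampleValidSplits<-function(w,sPoints=10,minBucket=5,maxSplitAttempts=10)
{
  splits<-numeric(0)
  for(j in 1:sPoints)
  {
    for(attempt in 1:maxSplitAttempts)
    {
      s<-runif(1,min(w),max(w))                ## randomly sampled candidate split point
      if(sum(w<=s)>=minBucket && sum(w>s)>=minBucket)
      {
        splits<-c(splits,s)                    ## valid split found for this candidate
        break
      }
    }
  }
  splits                                       ## may contain fewer than sPoints values
}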

Value

gbev returns an object of class gbev.object.

Author(s)

Joe Sexton j.a.sexton@medisin.uio.no

References

J.H. Friedman (2001). "Greedy Function Approximation: A Gradient Boosting Machine," Annals of Statistics 29(5):1189-1232.

T. Hastie, R. Tibshirani and J.H. Friedman (2001). "The Elements of Statistical Learning" Springer.

R. Carroll, D. Ruppert and L. Stefanski (1995). "Measurement Error in Nonlinear Models," Chapman and Hall.

Examples


### Univariate regression example
n<-500
varX<-1
varME<-0.25
varNoise<-0.3^2

### Data 
x<-rnorm(n,sd=sqrt(varX))                              ### Error free covariate
w<-x+rnorm(n,sd=sqrt(varME))                           ### Error contaminated version
fx<-sin(pi*x/2)/(1+2*(x^2)*((2*as.numeric(x>=0)-1)+1)) ### True regression function  
y<-fx+rnorm(n,sd=sqrt(varNoise))                       ### Response                           
dat<-data.frame(y=y,w=w)

### Measurement error model ####
###  
### The measurement error model is a list of the following components:
###
### SigmaX:    the covariance matrices of the mixture model for the error free covariates 
###            SigmaX[i,,] is the covariance matrix of the i-th mixture density
### mu:        the means of the mixture model for the error free covariates 
###            mu[i,] is the mean-vector of the i-th mixture density
### SigmaME:   the covariance matrix of the measurement error
### pComp:     the weights of the mixture distribution, pComp[i] is the weight of the 
###            i-th mixture density
### numComp:   the number of components in the mixture 
##
p<-1
pME<-1

numComp<-3                                    ## number of components in gaussian mixture for X-distribution
SigmaME<-diag(varME,pME)
SigmaJ<-array(dim=c(numComp,pME,pME))         
mu<-array(dim=c(numComp,pME))
pComp<-array(1/numComp,dim=c(numComp,1))
for(i in 1:numComp)
{
  SigmaJ[i,,]<-diag(varX,pME)
  mu[i,]<-rep(0,pME)
}
### list required by "gbev" for measurement error model
meModel<-list(SigmaX=SigmaJ,mu=mu,SigmaME=SigmaME,pComp=pComp,numComp=numComp)

fit<-gbev(y~w,data=dat,
          measErrorModel=meModel,     
          method="L2",              ## Squared error loss
          nboost=1000,              ## 1000 boosting iterations
          lambda=5,                 ## regularization of regression tree
          maxDepth=2,               ## maximum tree depth, 2 corresponds to stumps
          mc=2,                     ## number of monte-carlo samples per tree build 
          minSplit=3,               ## minimum number of obs in node to split
          minBucket=0,              ## minimum number of obs in nodes
          sPoints=10,               ## number of sampled candidate split points
          intermPred=5)             ## increments of iterations to store predictions 

### 5-fold cross-validation
hcv<-cvLoss(object=fit,k=5,random=FALSE,loss="L2")
plot(hcv$iters,hcv$cvLoss,type="l")

hp<-part.dep(object=fit,varIndx=1,firstTree=1,lastTree=hcv$estIter)

x<-seq(-2,2,by=.02)
fx<-sin(pi*x/2)/(1+2*(x^2)*((2*as.numeric(x>=0)-1)+1)) 
points(x,fx,type="l",lty=5)



## Simulated binary regression example, 
## with: Y=I( X1*X2+X2*X3+X1*X3>0), with measurement error on X's
n<-1000
p<-3
varX<-1     ## variance of latent covariates X
varME<-0.5  ## measurement error variance

x<-rnorm(p*n)
x<-matrix(x,ncol=p,nrow=n)
## add measurement error
w<-x+matrix(rnorm(p*n,sd=sqrt(varME)),ncol=p,nrow=n)   

x<-x[,c(1:p)]*x[,c(2:p,1)]  ## pairwise products X1*X2, X2*X3, X3*X1
x<-apply(x,1,sum)           ## sum of the pairwise products
threshold<-0
y<-as.numeric(x>threshold)
dat<-data.frame(y=y,w1=w[,1],w2=w[,2],w3=w[,3])  ##  must be modified if(p!=3)

#### Measurement error model ######
numComp<-1                               ##  Number of components in mixture 
SigmaME<-diag(varME,p)                   ##  Covariance matrix of measurement error
SigmaJ<-array(dim=c(numComp,p,p))        ##  Covariance matrices for mixture
mu<-array(dim=c(numComp,p))              ##  Mean vectors for mixture components
pComp<-array(1/numComp,dim=c(numComp,1)) ##  Mixture probabilities
for(i in 1:numComp)
{                                        ## filling in mixture model for X-distribution
  SigmaJ[i,,]<-diag(varX,p)
  mu[i,]<-rep(0,p)
}
## The list for measurement error model 
meModel<-list(SigmaX=SigmaJ,mu=mu,SigmaME=SigmaME,pComp=pComp,numComp=numComp)

fit<-gbev(y~w1+w2+w3,data=dat,
         measErrorModel=meModel,   
         method="logLike",       ## loss function
         nboost=1000,            ## number of boosting iterations
         lambda=40,              ## regularization parameter used in regression tree
         maxDepth=3,             ## maximum depth of regression tree 
         minSplit=10,            ## minimum number of obs in node to split
         minBucket=0,            ## minimum number of obs in terminal nodes
         sPoints=2,              ## number of sampled candidate split points
         mc=2,                   ## monte-carlo sample size used in each regression tree
         intermPred=10)          ## Increments of iterations to store loss function
        

## plot loss function as function of iterations
hp<-plotLoss(fit,loss="logLike",startIter=10)

## bivariate partial dependence plot
hdp<-part.dep(object=fit,varIndx=c(1,2),firstTree=1,
              lastTree=1000,ngrid=50)
dpp<-data.frame(x1=hdp$dat$x,x2=hdp$dat$y,prob=hdp$dat$z)
library(lattice)
wireframe(prob~x1*x2,dpp,aspect=c(1,0.5),drape=TRUE,screen=list(z=50,x=-60),
          scales=list(arrows=FALSE),xlim=c(-2.5,2.5),ylim=c(-2.5,2.5))

