lts.birch {birch}    R Documentation

Finding the Least Trimmed Squares (LTS) regression estimate using BIRCH

Description

This algorithm searches for the Least Trimmed Squares (LTS) solution using a BIRCH object.

Usage

lts.birch(birchObject, alpha=0.5, intercept = FALSE, nsamp=100)
ltsBirch.refinement(ltsOut, x, y, alpha=0.5, intercept = FALSE)

Arguments

birchObject an object created by the function birch. See Details for information on how to specify the explanatory and response variables.
alpha numeric parameter controlling the size of the subsets over which the trimmed residuals are minimized, i.e., alpha*n observations are used when computing the trimmed residual sum of squares. Allowed values are between 0.5 and 1, and the default is 0.5.
intercept a logical value: should the model include an intercept?
nsamp number of subsets used for initial estimates
ltsOut the output from lts.birch.
x, y a data set of explanatory and response variables on which to perform a set of concentration steps.
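
To make the role of alpha concrete, the following base R sketch computes a trimmed residual sum of squares for a fixed coefficient vector. The function name trimmed_rss and the rounding rule floor(alpha * n) are assumptions made for illustration; the package may choose the subset size differently.

```r
## Trimmed residual sum of squares for a given coefficient vector.
## Assumption: the subset size is floor(alpha * n).
trimmed_rss <- function(beta, x, y, alpha = 0.5) {
  x <- as.matrix(x)
  r2 <- as.vector(y - x %*% beta)^2   # squared residuals, all observations
  h <- floor(alpha * length(y))       # number of observations kept
  sum(sort(r2)[seq_len(h)])           # sum of the h smallest squared residuals
}

set.seed(1)
x <- cbind(1, rnorm(100))             # column of ones plus one explanatory variable
y <- as.vector(x %*% c(2, 3) + rnorm(100, sd = 0.1))
y[1:10] <- y[1:10] + 50               # contaminate ten responses

beta <- c(2, 3)
trimmed_rss(beta, x, y, alpha = 0.5)  # the ten contaminated points are trimmed away
trimmed_rss(beta, x, y, alpha = 1)    # ordinary residual sum of squares
```

With alpha = 1 nothing is trimmed, so the criterion reduces to the usual residual sum of squares and the contaminated responses dominate it.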

Details

The algorithm is very similar to the ltsReg function in the robustbase package, which implements the method of Rousseeuw and Van Driessen (2006), except that it operates on a BIRCH object instead of the raw data. A complete description is given in Harrington and Salibian-Barrera (2007) and Harrington and Salibian-Barrera (2008).
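
For orientation, here is a minimal base R sketch of the concentration step described by Rousseeuw and Van Driessen (2006), applied to raw observations. It is not the package's implementation, which works on subcluster summaries; the function name lts_concentrate and its arguments are hypothetical.

```r
## Repeated concentration steps on raw data (not on a birch object).
## `subset` holds the indices of the starting h-subset.
lts_concentrate <- function(x, y, subset, max_iter = 50) {
  subset <- sort(as.integer(subset))
  h <- length(subset)
  for (i in seq_len(max_iter)) {
    ## least squares fit on the current subset
    fit <- lm.fit(x[subset, , drop = FALSE], y[subset])
    ## squared residuals of all observations under that fit
    r2 <- as.vector(y - x %*% fit$coefficients)^2
    ## the new subset is the h observations with the smallest residuals
    new_subset <- sort(order(r2)[seq_len(h)])
    converged <- identical(new_subset, subset)
    subset <- new_subset
    if (converged) break
  }
  list(coefficients = fit$coefficients, subset = subset)
}

set.seed(1)
x <- cbind(1, rnorm(100))
y <- as.vector(x %*% c(2, 3) + rnorm(100, sd = 0.1))
y[1:10] <- y[1:10] + 50               # contaminate ten responses

start <- sample(100, 50)              # random starting subset, alpha = 0.5
res <- lts_concentrate(x, y, start)
res$coefficients
```

Each step cannot increase the trimmed sum of squares, so the iteration stops once the subset no longer changes. In practice many random starts are tried, which is the role nsamp plays in lts.birch.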

The algorithm assumes that the last column of the birch object contains the response variable and that all other columns are explanatory variables. While it is possible to select columns using the usual [,j] syntax, it is recommended that the birch object be rebuilt from the underlying data set with only the explanatory and response variables selected.
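
As an illustration of the expected column layout, the sketch below prepares a matrix with the response in the last column before any birch object is built. The data frame dat and its column names are hypothetical.

```r
## Hypothetical data frame with more columns than the model needs.
set.seed(1)
dat <- data.frame(id = 1:200, x1 = rnorm(200), x2 = rnorm(200),
                  group = rep(1:2, 100))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(200, sd = 0.1)

## Keep only the explanatory variables and the response, response last.
xy <- as.matrix(dat[, c("x1", "x2", "y")])
colnames(xy)

## `xy` is the matrix from which the birch object would then be built.
```

Selecting the columns before building the object avoids carrying summaries for variables, such as id and group here, that play no part in the regression.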

If an intercept is required in the model, either set the intercept argument to TRUE, or append a column of ones to the data before building the birch object.
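
The two ways of specifying an intercept describe the same model. The base R sketch below shows this for ordinary least squares rather than LTS, using simulated data. Placing the column of ones first is an assumption; the only requirement stated above is that the response remains the last column.

```r
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
y <- as.vector(1 + x %*% c(2, -1) + rnorm(100, sd = 0.1))

## Option 1: let the fitting routine add the intercept.
fit_a <- lm(y ~ x)

## Option 2: append a column of ones and fit without an intercept term.
x1 <- cbind(1, x)
fit_b <- lm(y ~ 0 + x1)

## Both fits give the same coefficients.
all.equal(unname(coef(fit_a)), unname(coef(fit_b)))

## Layout for the second option: ones, explanatory variables, response last.
xy <- cbind(1, x, y)
```

With the second option the intercept argument should be left at FALSE, since the column of ones already plays that role.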

A summary method is available for the output of this command.

Value

Returns a list containing:

best A list containing a vector of which subclusters make up the clustering (sub) and a vector with the underlying observations that make up the clusters (obs).
raw.coefficients the coefficients of the fitted LTS regression.
Resids A list containing the sum of squared residuals for the best subset, as well as the sum of squared residuals for the whole data set (based on the LTS regression equation).

Note

For this algorithm to produce meaningful results, the birch object should contain at least several hundred subclusters, and preferably thousands.

Author(s)

Justin Harrington harringt@stat.ubc.ca and Matias Salibian-Barrera matias@stat.ubc.ca

References

Harrington, J and Salibian-Barrera, M (2007) “Finding Approximate Solutions to Combinatorial Problems with Very Large Datasets using BIRCH”, submitted to Statistical Algorithms and Software, 2nd Special Issue Computational Statistics and Data Analysis. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch.pdf.

Harrington, J and Salibian-Barrera, M (2008) “birch: Working with very large data sets”, submitted to Journal of Statistical Software. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch-jss.pdf.

Rousseeuw, P.J. and Van Driessen, K. (2006) “Computing LTS Regression for Large Data Sets”, Data Mining and Knowledge Discovery 12, 29–45.

See Also

birch, and the original algorithm ltsReg

Examples

data(birchObj)
ltsOut <- lts.birch(birchObj, 0.5)
ltsOut2 <- lts.birch(birchObj, 0.5, intercept=TRUE)
summary(ltsOut2)

## If the original data set was available
## Not run: refOut <- ltsBirch.refinement(ltsOut2, x, y, 0.5, intercept=TRUE)

[Package birch version 1.1-3 Index]