lts.birch {birch}    R Documentation
This algorithm searches for the Least Trimmed Squares (LTS) solution using a BIRCH object.
lts.birch(birchObject, alpha=0.5, intercept = FALSE, nsamp=100)
ltsBirch.refinement(ltsOut, x, y, alpha=0.5, intercept = FALSE)
birchObject: an object created by the function birch. See the details below for how to specify the explanatory and response variables.
alpha: numeric parameter controlling the size of the subsets over which the trimmed residuals are minimized, i.e., alpha*n observations are used when computing the trimmed residual sum of squares. Allowed values are between 0.5 and 1; the default is 0.5.
intercept: logical; whether an intercept should be included in the model.
nsamp: number of subsets used for the initial estimates.
ltsOut: the output from lts.birch.
x, y: a data set of explanatory and response variables on which to perform a set of concentration steps.
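To make the role of alpha concrete, the following base-R sketch computes the trimmed residual sum of squares for a given coefficient vector: only the h smallest squared residuals are summed. It assumes h = floor(alpha * n), following the description of alpha above; the package's own rounding of h may differ. The helper name trimmed_rss is hypothetical and is not part of the birch package.

```r
# Trimmed residual sum of squares for a coefficient vector `beta`.
# Hypothetical helper, not part of the birch package.
# Assumption: h = floor(alpha * n).
trimmed_rss <- function(beta, x, y, alpha = 0.5) {
  stopifnot(alpha >= 0.5, alpha <= 1)     # allowed range stated above
  x <- as.matrix(x)
  h <- floor(alpha * nrow(x))
  r2 <- as.vector(y - x %*% beta)^2       # squared residuals, all n observations
  sum(sort(r2)[seq_len(h)])               # sum of the h smallest only
}

# Usage: an exact linear fit except for one gross outlier in y.
x <- cbind(1, 1:10)
y <- 2 + 3 * (1:10)
y[10] <- 100
trimmed_rss(c(2, 3), x, y, alpha = 0.5)   # 0: the outlier is trimmed away
trimmed_rss(c(2, 3), x, y, alpha = 1)     # 4624: (100 - 32)^2, nothing trimmed
```

With alpha = 0.5 the single large residual falls outside the h = 5 smallest and contributes nothing; with alpha = 1 no trimming takes place and the objective reduces to the ordinary residual sum of squares.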
The algorithm is very similar to the ltsReg
function from the robustbase
package, which implements the algorithm of Rousseeuw and Van Driessen (2006), except that it
operates on a BIRCH object. A complete description is given in Harrington and
Salibian-Barrera (2007) and Harrington and Salibian-Barrera (2008).
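The concentration step (C-step) that this family of algorithms relies on can be illustrated on an ordinary in-memory data set, which is the setting in which ltsBirch.refinement works with x and y. The sketch below is a generic base-R illustration of C-steps as described by Rousseeuw and Van Driessen (2006), not the code used by the birch package: fit least squares on the current subset of h observations, compute squared residuals for all observations, and keep the h smallest as the next subset. The name csteps, its arguments, and h = floor(alpha * n) are assumptions made for illustration.

```r
# Generic C-step iteration for LTS on an in-memory data set (base R only).
# Hypothetical helper, not part of the birch package.
# Assumption: h = floor(alpha * n).
csteps <- function(x, y, subset, alpha = 0.5, max_iter = 100) {
  x <- as.matrix(x)
  h <- floor(alpha * nrow(x))
  obj_old <- Inf
  for (iter in seq_len(max_iter)) {
    # Least squares fit using only the observations in the current subset
    beta <- lm.fit(x[subset, , drop = FALSE], y[subset])$coefficients
    # Squared residuals for all observations under that fit
    r2 <- as.vector(y - x %*% beta)^2
    # New subset: the h observations with the smallest squared residuals
    subset <- sort(order(r2)[seq_len(h)])
    obj <- sum(r2[subset])              # trimmed residual sum of squares
    if (obj >= obj_old) break           # no further decrease: converged
    obj_old <- obj
  }
  list(coefficients = beta, best = subset, trimmed.rss = obj)
}

# Usage: 20% of the responses are shifted by +20.
set.seed(1)
n <- 200
x <- cbind(1, rnorm(n))
y <- as.vector(x %*% c(1, 2)) + rnorm(n, sd = 0.1)
y[1:40] <- y[1:40] + 20
out <- csteps(x, y, subset = sample(n, 100), alpha = 0.5)
out$coefficients   # roughly c(1, 2)
```

Each C-step cannot increase the trimmed sum of squares, so the loop stops as soon as the objective fails to decrease. Sorting the subset indices keeps the fitted design matrix identical between iterations once the subset has stabilized, so the stopping test is not disturbed by floating-point reordering.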
The algorithm assumes that the last column of the birch object contains the response variable and that all other columns are explanatory variables. Although columns can be selected with the usual
[, j] indexing, it is recommended that the birch object be rebuilt from the underlying data set with only the explanatory and response variables selected.
If an intercept is required in the model, either set the intercept
argument to TRUE, or append a column of ones
to the data before building the
birch object.
A summary method is available for the output of this command.
Returns a list containing:
best: a list containing a vector of the subclusters that make up the best subset (sub) and a vector of the underlying observations in those subclusters (obs).
raw.coefficients: the coefficients of the fitted LTS regression line.
Resids: a list containing the sum of squared residuals for the best subset, as well as the sum of squared residuals for the whole data set (based on the LTS regression equation).
For this algorithm to produce meaningful results, the number of subclusters in the birch object should be in the hundreds, or preferably in the thousands.
Justin Harrington harringt@stat.ubc.ca and Matias Salibian-Barrera matias@stat.ubc.ca
Harrington, J and Salibian-Barrera, M (2007) “Finding Approximate Solutions to Combinatorial Problems with Very Large Datasets using BIRCH”, submitted to Statistical Algorithms and Software, 2nd Special Issue Computational Statistics and Data Analysis. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch.pdf.
Harrington, J and Salibian-Barrera, M (2008) “birch: Working with very large data sets”, submitted to Journal of Statistical Software. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch-jss.pdf.
Rousseeuw, P.J. and Van Driessen, K. (2006) “Computing LTS Regression for Large Data Sets”, Data Mining and Knowledge Discovery 12, 29–45.
See also: birch, and the original algorithm ltsReg (package robustbase).
library(birch)
data(birchObj)
ltsOut <- lts.birch(birchObj, 0.5)
ltsOut2 <- lts.birch(birchObj, 0.5, intercept=TRUE)
summary(ltsOut2)

## If the original data set was available
## (x: explanatory variables, y: response)
## Not run:
refOut <- ltsBirch.refinement(ltsOut2, x, y, 0.5, intercept=TRUE)