BACON {robustX} | R Documentation |
BACON, short for ‘Blocked Adaptive Computationally-Efficient Outlier Nominators’, is a somewhat robust algorithm (set), with an implementation for regression or multivariate covariance estimation.
BACON() applies the multivariate (covariance estimation) algorithm, using mvBACON(x) in any case, and when y is not NULL adds a regression iteration phase, using the auxiliary .lmBACON() function.
BACON(x, y = NULL, intercept = TRUE,
      m = min(collect * p, n * 0.5),
      init.sel = c("Mahalanobis", "dUniMedian", "random", "manual"),
      man.sel, init.fraction = 0, collect = 4,
      alpha = 0.95, maxsteps = 100, verbose = TRUE)

## *Auxiliary* function:
.lmBACON(x, y, intercept = TRUE,
         init.dis, init.fraction = 0, collect = 4,
         alpha = 0.95, maxsteps = 100, verbose = TRUE)
x |
a multivariate matrix of dimension [n x p] considered as containing no missing values. |
y |
the response (n vector) in the case of regression, or
NULL for the multivariate case. |
intercept |
logical indicating if an intercept has to be used for the regression. |
m |
integer in 1:n specifying the size of the initial basic subset; used only when init.sel is not "manual"; see mvBACON. |
init.sel |
character string, specifying the initial selection mode; see mvBACON. |
man.sel |
used only when init.sel == "manual": the indices of the observations determining the initial basic subset (and m <- length(man.sel)). |
init.dis |
the distances of the x matrix used for the initial subset determined by mvBACON. |
init.fraction |
if this parameter is > 0, the tedious steps of selecting the initial subset are skipped and an initial subset of size n * init.fraction is chosen (the observations with the smallest distances dis). |
collect |
numeric factor chosen by the user to define the size of the initial subset (p * collect) |
alpha |
significance level. |
maxsteps |
the maximal number of iteration steps (to prevent infinite loops) |
verbose |
logical indicating if messages are printed which trace progress of the algorithm. |
init.sel: the initial selection mode. The implemented modes, with the corresponding init.sel value from the usage in parentheses, are:
"Mah" ("Mahalanobis") -> based on Mahalanobis distance (default)
"dis" ("dUniMedian") -> based on the distances from the medians
"ran" ("random") -> based on a random selection
"man" ("manual") -> based on manual selection; in this case the vector man.sel, which contains the indices of the selected observations, must be given.
"Mah" and "dis" were proposed by Hadi, while "ran" and "man" were implemented in order to study the behaviour of BACON.
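The two distance-based selection modes can be illustrated with a small base-R sketch. It is not the robustX code: init_subset() is a hypothetical helper, the "distances from the medians" are assumed to be Euclidean distances from the coordinate-wise medians, and rounding m down with floor() is likewise an assumption.

```r
# Hypothetical helper (not part of robustX): pick the m observations
# closest to the centre of the data as the initial basic subset.
init_subset <- function(x, collect = 4,
                        method = c("Mahalanobis", "dUniMedian")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  # Default size from the usage, m = min(collect * p, n * 0.5);
  # floor() is an assumption to obtain an integer size.
  m <- floor(min(collect * p, n * 0.5))
  dis <- switch(method,
    # classical Mahalanobis distances from the overall mean
    Mahalanobis = sqrt(mahalanobis(x, colMeans(x), cov(x))),
    # Euclidean distances from the coordinate-wise medians (assumed)
    dUniMedian  = sqrt(rowSums(sweep(x, 2, apply(x, 2, median))^2)))
  sort(order(dis)[seq_len(m)])
}

set.seed(1)
x <- rbind(matrix(rnorm(200), ncol = 2),             # 100 clean points
           matrix(rnorm(20, mean = 10), ncol = 2))   # 10 planted outliers
sel_mah <- init_subset(x, method = "Mahalanobis")
sel_med <- init_subset(x, method = "dUniMedian")
```

With collect = 4 and p = 2 both calls return eight indices; in this simulated data they all fall among the clean rows 1 to 100.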
basically a list with components
subset |
the observation indices (in 1:n) denoting the subset of "good" observations. |
tis |
............ |
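How a subset of "good" observations can be grown from a small initial subset, with maxsteps as a guard against endless iteration, is sketched below in base R. This is a simplified illustration and not the robustX implementation: grow_subset() and its level argument are hypothetical, and the cutoff is a plain chi-square quantile without the correction factors described by Billor et al. (2000).

```r
# Hypothetical helper (not part of robustX): iterate until the subset
# of observations within the distance cutoff stops changing.
grow_subset <- function(x, init, level = 0.999, maxsteps = 100) {
  x <- as.matrix(x)
  p <- ncol(x)
  cutoff <- qchisq(level, df = p)   # threshold on squared distances (assumed)
  subset <- sort(as.integer(init))
  steps <- 0L
  converged <- FALSE
  while (!converged && steps < maxsteps) {
    steps <- steps + 1L
    xs <- x[subset, , drop = FALSE]
    # distances of ALL observations, using centre and scatter of the subset
    d2 <- mahalanobis(x, colMeans(xs), cov(xs))
    new_subset <- which(d2 < cutoff)
    if (length(new_subset) <= p)
      stop("subset too small to estimate a covariance matrix")
    converged <- identical(new_subset, subset)
    subset <- new_subset
  }
  list(subset = subset, steps = steps, converged = converged)
}

set.seed(1)
x <- rbind(matrix(rnorm(200), ncol = 2),             # 100 clean points
           matrix(rnorm(20, mean = 10), ncol = 2))   # 10 planted outliers
# start from the 8 observations with the smallest classical distances
init <- order(mahalanobis(x, colMeans(x), cov(x)))[1:8]
res <- grow_subset(x, init)
```

In this simulated example the planted outliers (rows 101 to 110) stay outside res$subset, because their distances from the clean cluster remain far above the cutoff.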
Ueli Oetliker, Swiss Federal Statistical Office, for S-plus 5.1; 25.05.2001; modified six times till 17.6.2001.
Port to R, testing etc, by Martin Maechler.
Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators; Computational Statistics and Data Analysis 34, 279–298.
mvBACON, the multivariate version of the BACON algorithm.
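library(robustX)     # BACON()
library(robustbase)  # lmrob() and the starsCYG data
data(starsCYG, package = "robustbase")

## Plot simple data and fitted lines
plot(starsCYG)
lmST <- lm(log.light ~ log.Te, data = starsCYG)   # least squares fit
(B.ST <- with(starsCYG, BACON(x = log.Te, y = log.light)))
(RlmST <- lmrob(log.light ~ log.Te, data = starsCYG))  # robust MM-type fit
abline(lmST, col = "red")    # least squares line
abline(RlmST, col = "blue")  # robust regression line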
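One possible use of the subset component is to refit an ordinary least squares model on the "good" observations only. The sketch below uses simulated data and base R, so that it runs without robustX; the index vector good is a stand-in, set by hand, for the subset component of a BACON() result.

```r
# Simulated stand-in data: a linear trend plus a few gross outliers.
set.seed(42)
d <- data.frame(x = seq(1, 10, length.out = 50))
d$y <- 2 + 3 * d$x + rnorm(50, sd = 0.5)
d$y[46:50] <- d$y[46:50] - 40          # contaminate the last five rows

# 'good' plays the role of the 'subset' component of a BACON() result;
# it is set by hand here because the contaminated rows are known.
good <- 1:45

fit_all  <- lm(y ~ x, data = d)                  # pulled by the outliers
fit_good <- lm(y ~ x, data = d, subset = good)   # uses the subset only
coef(fit_good)
```

The slope of fit_good stays close to the simulated value of 3, while fit_all is dragged away from it by the contaminated rows.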