mvBACON {robustX} | R Documentation |
This function applies an outlier identification algorithm to the data in the x matrix [n x p], following the BACON outlier procedure described by Hadi et al. (see the reference).
mvBACON(x, collect = 4, m = min(collect * p, n * 0.5), alpha = 0.95, init.sel = c("Mahalanobis", "dUniMedian", "random", "manual"), man.sel, maxsteps = 100, allowSingular = FALSE, verbose = TRUE)
x |
numeric matrix (of dimension [n x p]), not supposed to contain missing values. |
collect |
a multiplication factor used, when init.sel is not "manual", to define m,
the size of the initial basic subset, as m <- min(p * collect, n/2). |
m |
integer in 1:n specifying the size of the initial basic
subset; used only when init.sel is not "manual". |
alpha |
significance level for the chi-squared cutoff used to define the next iteration's basic subset. |
init.sel |
character string specifying the initial selection mode. Implemented modes are
"Mahalanobis" and "dUniMedian", which were proposed by Hadi
and the other authors in the reference as versions ‘V_1’
and ‘V_2’, as well as "manual";
"random" is provided in order to study the behaviour of
BACON.
|
man.sel |
only when init.sel == "manual": the indices of
observations determining the initial basic subset (and
m <- length(man.sel)). |
maxsteps |
maximal number of iteration steps. |
allowSingular |
logical indicating whether a solution should also be sought when no matrix of rank p is found. |
verbose |
logical indicating if messages are printed which trace progress of the algorithm. |
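The following is a minimal base-R sketch of how an initial basic subset could be chosen from the arguments above. It is not the robustX source: `init_subset` is a hypothetical helper, the rounding of m with `floor()` is an assumption, and the two distance definitions follow the usual description of versions V_1 (Mahalanobis distances from the classical mean and covariance) and V_2 (Euclidean distances from the coordinate-wise median).

```r
## Hypothetical helper, NOT part of robustX: choose an initial basic subset.
init_subset <- function(x, collect = 4,
                        method = c("Mahalanobis", "dUniMedian")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  ## default size as in the usage line; floor() is an assumption of this sketch
  m <- floor(min(collect * p, n * 0.5))
  d <- switch(method,
    ## V_1: Mahalanobis distances from the classical mean and covariance
    Mahalanobis = sqrt(mahalanobis(x, colMeans(x), cov(x))),
    ## V_2: Euclidean distances from the coordinate-wise median
    dUniMedian = {
      med <- apply(x, 2, median)
      sqrt(rowSums(sweep(x, 2, med)^2))
    })
  ## indices of the m observations closest to the centre
  order(d)[seq_len(m)]
}

## toy data: 40 regular points plus 5 clearly shifted ones (rows 41 to 45)
set.seed(1)
x <- rbind(matrix(rnorm(80), ncol = 2),
           matrix(rnorm(10, mean = 8), ncol = 2))
idx <- init_subset(x, method = "dUniMedian")
```

Indices obtained this way could then be passed as man.sel together with init.sel = "manual", which is one way to experiment with alternative starting subsets.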
a list with components
subset |
logical vector of length n where the i-th
entry is true iff the i-th observation is part of the final selection. |
dis |
numeric vector of length n with the (Mahalanobis)
distances. |
cov |
p x p matrix, the corresponding robust estimate of covariance. |
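To make the three components concrete, here is a simplified BACON-style iteration in base R that returns a list of the same shape. It is a sketch under stated assumptions, not the package's implementation: `bacon_sketch` is a hypothetical name, the cutoff is a plain chi-squared quantile without the finite-sample correction factor of the published procedure, and the initial subset is simply taken to be the points nearest the coordinate-wise median.

```r
## Hypothetical sketch, NOT the robustX source.
bacon_sketch <- function(x, init, alpha = 0.95, maxsteps = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  subset <- seq_len(n) %in% init
  ## ASSUMPTION: plain chi-squared quantile, no finite-sample correction
  cutoff <- sqrt(qchisq(alpha, df = p))
  for (step in seq_len(maxsteps)) {
    ## location and scatter estimated from the current basic subset only
    center <- colMeans(x[subset, , drop = FALSE])
    S <- cov(x[subset, , drop = FALSE])
    ## distances of ALL observations relative to that estimate
    dis <- sqrt(mahalanobis(x, center, S))
    new_subset <- dis < cutoff
    if (sum(new_subset) <= p) break          # too few points for a covariance
    if (all(new_subset == subset)) break     # converged
    subset <- new_subset
  }
  list(subset = subset, dis = dis, cov = S)
}

## toy data: 40 regular points plus 5 clearly shifted ones (rows 41 to 45)
set.seed(1)
x <- rbind(matrix(rnorm(80), ncol = 2),
           matrix(rnorm(10, mean = 8), ncol = 2))
med <- apply(x, 2, median)
init <- order(rowSums(sweep(x, 2, med)^2))[1:8]
res <- bacon_sketch(x, init)
which(!res$subset)   # indices nominated as outliers by this sketch
```

Because the correction factor is omitted, this sketch will typically flag more regular points than mvBACON would; it is meant only to show how subset, dis and cov relate to one another.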
Ueli Oetliker, Swiss Federal Statistical Office, for S-plus 5.1. Port to R, testing etc. by Martin Maechler.
Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators. Computational Statistics and Data Analysis 34, 279–298.
covMcd
for a high-breakdown (but more computer
intensive) method;
BACON
for a “generalization”, notably to
regression.
## simple 2D example :
plot(starsCYG, main = "starsCYG data (n=47)")
B.st <- mvBACON(starsCYG)
points(starsCYG[ ! B.st$subset,], pch = 4, col = 2, cex = 1.5)
## finds the clear outliers (and 3 "borderline")

## 'coleman' from pkg 'robustbase'
coleman.x <- data.matrix(coleman[, 1:6])
Cc <- covMcd (coleman.x) # truly robust
Cb1 <- mvBACON(coleman.x) ##-> subset is all TRUE hmm??
Cb2 <- mvBACON(coleman.x, init.sel = "dUniMedian")
## --> BACON "breaks down" here