mvBACON {robustX} R Documentation

BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators

Description

This function applies an outlier identification algorithm to the data in the x matrix [n x p], following the BACON outlier procedure described by Billor, Hadi and Velleman (see References).

Usage

mvBACON(x, collect = 4, m = min(collect * p, n * 0.5), alpha = 0.95,
        init.sel = c("Mahalanobis", "dUniMedian", "random", "manual"),
        man.sel, maxsteps = 100, allowSingular = FALSE, verbose = TRUE)
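
The default for m in the signature above can be worked out by hand. The following is a small illustration only: default_m is a hypothetical helper, not part of robustX, and the second pair of n and p values is made up to show the other branch of min().

```r
## Default size of the initial basic subset, copied from the Usage expression
## m = min(collect * p, n * 0.5).  'default_m' is a hypothetical helper name.
default_m <- function(n, p, collect = 4) min(collect * p, n * 0.5)

default_m(n = 47, p = 2)   # collect * p is the smaller term: 8
default_m(n = 20, p = 6)   # n * 0.5 is the smaller term: 10
```

Note that the expression can yield a non-integer value when n is odd and n * 0.5 is the smaller term; how that case is handled is not stated on this page.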

Arguments

x numeric matrix (of dimension [n x p]); it must not contain missing values.
collect a multiplication factor used, when init.sel is not "manual", to define m, the size of the initial basic subset, as m <- min(p * collect, n/2).
m integer in 1:n specifying the size of the initial basic subset; used only when init.sel is not "manual".
alpha significance level for the chi-squared cutoff, used to define the next iteration's basic subset.
init.sel character string, specifying the initial selection mode; implemented modes are:
"Mahalanobis"
based on Mahalanobis distances (default)
"dUniMedian"
based on the distances from the univariate medians
"random"
based on a random selection
"manual"
based on manual selection; in this case, a vector man.sel containing the indices of the selected observations must be specified.
"Mahalanobis" and "dUniMedian" were proposed by Hadi and the other authors in the reference as versions ‘V_1’ and ‘V_2’, as was "manual"; "random" is provided in order to study the behaviour of BACON.
man.sel only when init.sel == "manual", the indices of observations determining the initial basic subset (and m <- length(man.sel)).
maxsteps maximal number of iteration steps.
allowSingular logical indicating whether a solution should be sought even when no matrix of rank p is found.
verbose logical indicating if messages are printed which trace progress of the algorithm.
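
The arguments above describe an iteration: start from a basic subset of size m, then repeatedly redefine the subset as the observations whose distance falls below a chi-squared cutoff, for at most maxsteps steps. The following is a minimal base-R sketch of that loop under loudly stated assumptions: bacon_sketch and its steps component are hypothetical names, the initial subset is simply the m points closest to the coordinatewise medians, alpha is treated as the probability level passed to qchisq(), and the small-sample correction of the cutoff used in the paper is omitted. It is not the code of mvBACON and should not be expected to reproduce its results.

```r
## Simplified BACON-style iteration (illustration only, NOT mvBACON itself).
## Assumptions: no small-sample correction of the cutoff; initial subset is
## the m points nearest (in Euclidean distance) to the coordinatewise medians.
bacon_sketch <- function(x, m, alpha = 0.95, maxsteps = 100) {
  x <- as.matrix(x)
  p <- ncol(x)
  ## initial basic subset
  d0 <- sqrt(rowSums(sweep(x, 2, apply(x, 2, median))^2))
  subset <- unname(rank(d0, ties.method = "first") <= m)
  ## cutoff on the distance scale (assumed use of alpha)
  cutoff <- sqrt(qchisq(alpha, df = p))
  for (step in seq_len(maxsteps)) {
    center <- colMeans(x[subset, , drop = FALSE])
    S <- cov(x[subset, , drop = FALSE])
    ## mahalanobis() returns squared distances, hence the sqrt()
    dis <- sqrt(mahalanobis(x, center, S))
    new_subset <- unname(dis < cutoff)
    if (identical(new_subset, subset)) break   # subset no longer changes
    subset <- new_subset
  }
  list(subset = subset, dis = dis, cov = S, steps = step)
}

## Simulated data: 100 clean bivariate points plus 5 planted outliers
set.seed(1)
x <- rbind(matrix(rnorm(200), 100, 2),
           matrix(rnorm(10, mean = 10, sd = 0.1), 5, 2))
res <- bacon_sketch(x, m = 8)
which(!res$subset)   # indices of the nominated outliers
```

Because the correction factor is left out, this sketch tends to nominate some clean observations as well; it is meant only to show how m, alpha and maxsteps interact.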

Value

a list with components

subset logical vector of length n where the i-th entry is true iff the i-th observation is part of the final selection.
dis numeric vector of length n with the (Mahalanobis) distances.
cov p x p matrix, the corresponding robust estimate of covariance.

Author(s)

Ueli Oetliker, Swiss Federal Statistical Office, for S-plus 5.1. Port to R, testing etc., by Martin Maechler.

References

Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators. Computational Statistics and Data Analysis 34, 279–298.

See Also

covMcd for a high-breakdown (but more computer intensive) method; BACON for a “generalization”, notably to regression.

Examples

 library(robustX)
 library(robustbase) # provides the 'starsCYG' and 'coleman' data and covMcd()

## simple 2D example :
 plot(starsCYG, main = "starsCYG  data  (n=47)")
 B.st <- mvBACON(starsCYG)
 points(starsCYG[ ! B.st$subset,], pch = 4, col = 2, cex = 1.5)
 ## finds the clear outliers (and 3 "borderline")

 ## 'coleman' from pkg 'robustbase'
 coleman.x <- data.matrix(coleman[, 1:6])
 Cc <- covMcd(coleman.x) # truly robust
 Cb1 <- mvBACON(coleman.x) ##-> subset is all TRUE hmm??
 Cb2 <- mvBACON(coleman.x, init.sel = "dUniMedian")
 ## --> BACON "breaks down" here

[Package robustX version 1.1-2 Index]