getOutliers {extremevalues} | R Documentation |
Detects outliers in one-dimensional data, based on the assumption that the bulk of (the right side of) the observed data distribution can be adequately described by a model distribution.
getOutliers(y, rho=0.1, pval=c(0.5,0.9), method="lognormal")
y |
Vector of one-dimensional nonnegative data |
rho |
A value y_i is an outlier if it lies above the limit beyond which fewer than rho observations are expected. Must be >=0. |
pval |
c(pmin,pmax) quantile limits indicating which data should be used to fit the model distribution. Must obey 0 < pmin < pmax < 1. |
method |
Model distribution used to estimate the limit. Choose from "lognormal" (default), "exponential", "pareto", "weibull" or "normal". |
The function sorts the values of y and uses (log)linear regression to fit the values between the pmin and pmax quantiles to the cdf of a model distribution. Given a model cdf F, the outlier limit l is the value above which fewer than rho values are expected, conditional on the total number of observations N in y: l = F^{-1}(1 - \rho/N \mid \hat{\theta}). Here, \hat{\theta} denotes the estimated parameters of the cdf.
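The fitting procedure described above can be illustrated for the lognormal case with a minimal base R sketch. This is not the package's implementation: the helper name sketch_lognormal_limit, the plotting positions (i - 0.5)/N and the use of lm() are assumptions made for illustration, and sigma here is the standard deviation of ln(y). It relies on the identity ln(y) = mu + sigma * qnorm(F(y)) for a lognormal cdf F.

```r
# Hypothetical helper for illustration only; not part of extremevalues.
sketch_lognormal_limit <- function(y, rho = 0.1, pval = c(0.5, 0.9)) {
  stopifnot(all(y > 0), pval[1] > 0, pval[1] < pval[2], pval[2] < 1)
  N  <- length(y)
  ys <- sort(y)
  # Empirical cdf values for the sorted data (assumed plotting positions).
  p   <- (seq_len(N) - 0.5) / N
  # Keep only the observations between the pmin and pmax quantiles.
  use <- p >= pval[1] & p <= pval[2]
  ly  <- log(ys[use])
  z   <- qnorm(p[use])
  # Linear regression of ln(y) on the standard normal quantiles:
  # the intercept estimates mu, the slope estimates sigma.
  fit   <- lm(ly ~ z)
  mu    <- unname(coef(fit)[1])
  sigma <- unname(coef(fit)[2])
  # Outlier limit l = F^{-1}(1 - rho/N) under the fitted lognormal model.
  limit <- exp(mu + sigma * qnorm(1 - rho / N))
  list(mu = mu, sigma = sigma, limit = limit,
       Nfit = sum(use), R2 = summary(fit)$r.squared,
       iOut = which(y > limit))
}

set.seed(1)
y   <- c(10^rnorm(50), 500)
res <- sketch_lognormal_limit(y, rho = 0.5)
res$limit     # estimated outlier limit
y[res$iOut]   # values above the limit, if any
```

Because only the central part of the sorted data enters the regression, extreme values in the tail do not influence the estimated parameters, which is the point of restricting the fit to the range between pmin and pmax.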
iOut |
Index vector indicating where y > limit |
nOut |
Number of outliers. The largest nOut values of y are outliers |
limit |
Outlier limit. Elements of y larger than or equal to limit are considered outliers |
Npop |
Length of y |
method |
The model distribution used (the method argument) |
rho |
The rho-value |
pmin |
pval[1] |
pmax |
pval[2] |
Nfit |
Number of values used in the fit |
R2 |
R-squared value for the fit |
lambda |
(exponential distribution) Estimated location (and spread) parameter for f(y)=λexp(-λ y) |
mu |
(lognormal distribution) Estimated E(ln(y)) for lognormal distribution |
sigma |
(lognormal distribution) Estimated Var(ln(y)) for lognormal distribution |
ym |
(pareto distribution) Estimated location parameter (mode) for pareto distribution |
alpha |
(pareto distribution) Estimated spread parameter for pareto distribution |
k |
(weibull distribution) Estimated shape parameter k for weibull distribution |
lambda |
(weibull distribution) Estimated scale parameter λ for weibull distribution |
mu |
(normal distribution) Estimated E(y) for normal distribution |
sigma |
(normal distribution) Estimated Var(y) for normal distribution |
Mark van der Loo, see www.markvanderloo.eu
An outlier detection method for economic data, M.P.J. van der Loo, Submitted to The Journal of Official Statistics (November 2009)
The file <your R directory>/R-<version>/library/extremevalues/extremevalues.pdf contains a worked example. It can also be downloaded from my website.
y <- c(10^rnorm(50), 500)
L <- getOutliers(y, rho = 0.5)
outlierPlot(y, L)