getOutliers {extremevalues}    R Documentation

Detect outliers

Description

Detects outliers in one-dimensional data, assuming that the bulk of (the right side of) the observed data distribution can be adequately described by a model distribution.

Usage

getOutliers(y, rho=0.1, pval=c(0.5,0.9), method="lognormal")

Arguments

y Vector of one-dimensional nonnegative data
rho A value y_i is an outlier if it is above the limit where fewer than rho observations are expected. Must be >= 0.
pval c(pmin,pmax) quantile limits indicating which data should be used to fit the model distribution. Must obey 0 < pmin < pmax < 1.
method Model distribution used to estimate the limit. Choose from "lognormal" (default), "exponential", "pareto", "weibull" or "normal".

Details

The function sorts the values of y and uses (log)linear regression to fit the values between the pmin and pmax quantiles to the cdf of a model distribution. Given a model cdf F, the outlier limit l is the value above which fewer than rho values are expected, conditional on the total number of observations in y: $l = F^{-1}(1 - \rho/N \mid \hat{\theta})$. Here, $\rho$ is rho, N is the number of observations in y, and $\hat{\theta}$ denotes the estimated parameters of the cdf.
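The procedure described above can be illustrated for the lognormal case with a minimal base R sketch. This is not the package's code: the helper name sketch_limit, the plotting positions i/(N+1), and the use of lm() for the regression are all assumptions made for illustration, and the package's actual fit may differ in detail.

```r
# Hypothetical helper (not part of extremevalues): lognormal version of the
# "fit the bulk, extrapolate the limit" idea described in Details.
sketch_limit <- function(y, rho = 0.1, pval = c(0.5, 0.9)) {
  N  <- length(y)
  ys <- sort(y)
  # Assumed plotting positions for the empirical cdf of the sorted values
  p  <- seq_len(N) / (N + 1)
  # Keep only the bulk between pmin and pmax; drop zeros so log() is finite
  keep <- p >= pval[1] & p <= pval[2] & ys > 0
  z <- log(ys[keep])      # log of the observed values
  x <- qnorm(p[keep])     # matching standard normal quantiles
  # For a lognormal model, log(y) = mu + sigma * qnorm(p) is linear in x
  fit   <- lm(z ~ x)
  mu    <- unname(coef(fit)[1])
  sigma <- unname(coef(fit)[2])
  # Limit above which fewer than rho of the N observations are expected
  limit <- exp(mu + sigma * qnorm(1 - rho / N))
  list(limit = limit, mu = mu, sigma = sigma,
       iOut = which(y > limit), R2 = summary(fit)$r.squared)
}

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
y   <- c(10^rnorm(50), 500)
res <- sketch_limit(y, rho = 0.5)
res$limit
```

With N = 51 and rho = 0.5 as in this sketch, the limit is the fitted model's quantile at 1 - 0.5/51, or about 0.9902.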

Value

iOut Index vector indicating where y > limit
nOut Number of outliers. The largest nOut values of y are outliers
limit Outlier limit. Elements of y larger than or equal to limit are considered outliers
Npop Length of y
method The model distribution used (the method argument)
rho The rho-value
pmin pval[1]
pmax pval[2]
Nfit Number of values used in the fit
R2 R-squared value for the fit
lambda (exponential distribution) Estimated location (and spread) parameter for f(y)=λexp(-λ y)
mu (lognormal distribution) Estimated E(ln(y)) for lognormal distribution
sigma (lognormal distribution) Estimated Var(ln(y)) for lognormal distribution
ym (pareto distribution) Estimated location parameter (mode) for pareto distribution
alpha (pareto distribution) Estimated spread parameter for pareto distribution
k (weibull distribution) Estimated shape parameter k for weibull distribution
lambda (weibull distribution) Estimated scale parameter λ for weibull distribution
mu (normal distribution) Estimated E(y) for normal distribution
sigma (normal distribution) Estimated Var(y) for normal distribution

Author(s)

Mark van der Loo, see www.markvanderloo.eu

References

M.P.J. van der Loo, An outlier detection method for economic data. Submitted to The Journal of Official Statistics (November 2009).

The file <your R directory>/R-<version>/library/extremevalues/extremevalues.pdf contains a worked example. It can also be downloaded from my website.

Examples

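library(extremevalues)
# 50 lognormally distributed values plus one obvious outlier
y <- c(10^rnorm(50), 500)
# Flag values above the limit where fewer than rho = 0.5 observations are expected
L <- getOutliers(y, rho = 0.5)
outlierPlot(y, L)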

[Package extremevalues version 1.0 Index]