AsymTotalVarDist {distrEx}    R Documentation

Generic function for the computation of asymmetric total variation distance of two distributions

Description

Generic function for the computation of the asymmetric total variation distance d_v(rho) of two distributions P and Q, where the distributions may be defined on an arbitrary sample space (Omega, A). For a given ratio rho of inlier and outlier probability, this distance is defined as

d_v(rho)(P,Q)=int max(dQ-c dP,0)

for c defined by

rho int max(dQ-c dP,0) = int max(c dP-dQ,0)

It coincides with total variation distance for rho=1.
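
For two distributions on a common finite support, the definition above can be sketched in base R. This is only an illustration under stated assumptions, not the code used in distrEx: the helper name asym_tv_discrete is hypothetical, p and q are assumed to be probability vectors on the same support, and c is found with uniroot on a bracket chosen for this sketch.

```r
# Hypothetical helper (not part of distrEx): asymmetric total variation
# distance of two probability vectors p and q on a common finite support.
asym_tv_discrete <- function(p, q, rho = 1) {
  stopifnot(length(p) == length(q), rho > 0)
  # f(c) = rho * sum(max(q - c p, 0)) - sum(max(c p - q, 0)) is decreasing in c
  f <- function(c.val) {
    rho * sum(pmax(q - c.val * p, 0)) - sum(pmax(c.val * p - q, 0))
  }
  # f(0) = rho > 0 and f(2 + rho) <= -1, so the root is bracketed
  c0 <- uniroot(f, interval = c(0, 2 + rho), tol = 1e-12)$root
  sum(pmax(q - c0 * p, 0))
}

# Illustrative inputs: Binom(20, 0.5) and a Poisson(10) truncated to 0:20
p <- dbinom(0:20, size = 20, prob = 0.5)
q <- dpois(0:20, lambda = 10)
q <- q / sum(q)  # renormalise after truncation

asym_tv_discrete(p, q, rho = 0.3)
asym_tv_discrete(p, q, rho = 1)    # agrees with 0.5 * sum(abs(p - q))
```

For rho = 1 the defining equation forces c = 1, which is why the second call reproduces the ordinary total variation distance.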

Usage

AsymTotalVarDist(e1, e2, ...)
## S4 method for signature 'AbscontDistribution,
##   AbscontDistribution':
AsymTotalVarDist(e1,e2, rho = 1,
             rel.tol = .Machine$double.eps^0.3, maxiter=1000, Ngrid = 10000,
             TruncQuantile = getdistrOption("TruncQuantile"),
             IQR.fac = 15)
## S4 method for signature 'AbscontDistribution,
##   DiscreteDistribution':
AsymTotalVarDist(e1,e2, rho = 1, ...)
## S4 method for signature 'DiscreteDistribution,
##   AbscontDistribution':
AsymTotalVarDist(e1,e2, rho = 1, ...)
## S4 method for signature 'DiscreteDistribution,
##   DiscreteDistribution':
AsymTotalVarDist(e1,e2, rho = 1, ...)
## S4 method for signature 'numeric, DiscreteDistribution':
AsymTotalVarDist(e1, e2, rho = 1, ...)
## S4 method for signature 'DiscreteDistribution, numeric':
AsymTotalVarDist(e1, e2, rho  = 1, ...)
## S4 method for signature 'numeric, AbscontDistribution':
AsymTotalVarDist(e1, e2, rho = 1, asis.smooth.discretize = "discretize", 
            n.discr = getdistrExOption("nDiscretize"), low.discr = getLow(e2),
            up.discr = getUp(e2), h.smooth = getdistrExOption("hSmooth"),
             rel.tol = .Machine$double.eps^0.3, maxiter=1000, Ngrid = 10000,
             TruncQuantile = getdistrOption("TruncQuantile"),
             IQR.fac = 15)
## S4 method for signature 'AbscontDistribution, numeric':
AsymTotalVarDist(e1, e2,  rho = 1,
            asis.smooth.discretize = "discretize", 
            n.discr = getdistrExOption("nDiscretize"), low.discr = getLow(e1),
            up.discr = getUp(e1), h.smooth = getdistrExOption("hSmooth"),
             rel.tol = .Machine$double.eps^0.3, maxiter=1000, Ngrid = 10000,
             TruncQuantile = getdistrOption("TruncQuantile"),
             IQR.fac = 15)
## S4 method for signature 'AcDcLcDistribution,
##   AcDcLcDistribution':
AsymTotalVarDist(e1, e2,
          rho = 1, rel.tol = .Machine$double.eps^0.3, maxiter=1000, Ngrid = 10000,
             TruncQuantile = getdistrOption("TruncQuantile"),
             IQR.fac = 15)

Arguments

e1 object of class "Distribution" or "numeric"
e2 object of class "Distribution" or "numeric"
asis.smooth.discretize possible methods are "asis", "smooth" and "discretize". Default is "discretize".
n.discr if asis.smooth.discretize is equal to "discretize" one has to specify the number of lattice points used to discretize the abs. cont. distribution.
low.discr if asis.smooth.discretize is equal to "discretize" one has to specify the lower end point of the lattice used to discretize the abs. cont. distribution.
up.discr if asis.smooth.discretize is equal to "discretize" one has to specify the upper end point of the lattice used to discretize the abs. cont. distribution.
h.smooth if asis.smooth.discretize is equal to "smooth" – i.e., the empirical distribution of the provided data should be smoothed – one has to specify this parameter.
rho ratio of inlier/outlier radius
rel.tol relative tolerance for distrExIntegrate and uniroot
maxiter parameter for uniroot
Ngrid How many grid points are to be evaluated to determine the range of the likelihood ratio?
TruncQuantile Quantile for the quantile based integration bounds (see details)
IQR.fac Factor for the scale based integration bounds (see details)
... further arguments to be used in particular methods (not in package distrEx)

Details

For distances between absolutely continuous distributions, we use numerical integration. To determine sensible integration bounds we proceed as follows: by means of min(getLow(e1,eps=TruncQuantile),getLow(e2,eps=TruncQuantile)) and max(getUp(e1,eps=TruncQuantile),getUp(e2,eps=TruncQuantile)) we determine quantile based bounds c(low.0,up.0); by means of s1 <- max(IQR(e1),IQR(e2)); m1 <- median(e1); m2 <- median(e2) and low.1 <- min(m1,m2)-s1*IQR.fac, up.1 <- max(m1,m2)+s1*IQR.fac we determine scale based bounds c(low.1,up.1). These are combined by low <- max(low.0,low.1), up <- min(up.0,up.1).
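
The bound computation can be sketched in base R as follows. The two normal quantile functions are hypothetical stand-ins for getLow, getUp, IQR and median applied to e1 and e2, the value 1e-5 for TruncQuantile is only illustrative, and the sketch assumes that the two intervals are intersected, so that the upper bound is the smaller of up.0 and up.1.

```r
# Hypothetical stand-ins for e1 and e2: quantile functions of
# N(0, 1) and N(3, 2^2)
q1 <- function(p) qnorm(p, mean = 0, sd = 1)
q2 <- function(p) qnorm(p, mean = 3, sd = 2)

TruncQuantile <- 1e-5  # illustrative value
IQR.fac <- 15

# quantile based bounds
low.0 <- min(q1(TruncQuantile), q2(TruncQuantile))
up.0  <- max(q1(1 - TruncQuantile), q2(1 - TruncQuantile))

# scale based bounds: medians -/+ IQR.fac times the larger IQR
s1 <- max(q1(0.75) - q1(0.25), q2(0.75) - q2(0.25))
m1 <- q1(0.5)
m2 <- q2(0.5)
low.1 <- min(m1, m2) - s1 * IQR.fac
up.1  <- max(m1, m2) + s1 * IQR.fac

# combine both pairs (assumed here: intersection of the two intervals)
low <- max(low.0, low.1)
up  <- min(up.0, up.1)
c(low, up)
```

With IQR.fac = 15 the scale based interval is much wider than the quantile based one for these light-tailed inputs, so the quantile based bounds are the ones that bind.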

Again in the absolutely continuous case, to determine the range of the likelihood ratio, we evaluate this ratio on a grid constructed as follows: x.range <- c(seq(low, up, length=Ngrid/3), q(e1)(seq(0,1,length=Ngrid/3)*.999), q(e2)(seq(0,1,length=Ngrid/3)*.999))
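
A base R sketch of this grid and of the resulting range of the likelihood ratio is given below. The densities and quantile functions are hypothetical stand-ins for d(e1), q(e1), d(e2), q(e2), the bounds low and up are illustrative, and both the rounding of Ngrid/3 and the removal of non-finite grid points are choices of this sketch. The ratio is taken as density of e2 over density of e1, which is an assumption.

```r
# Hypothetical stand-ins for e1 = N(0, 1) and e2 = N(1, 2^2)
d1 <- function(x) dnorm(x, mean = 0, sd = 1)
q1 <- function(p) qnorm(p, mean = 0, sd = 1)
d2 <- function(x) dnorm(x, mean = 1, sd = 2)
q2 <- function(p) qnorm(p, mean = 1, sd = 2)

low <- -6; up <- 12      # illustrative integration bounds
Ngrid <- 10000
n3 <- floor(Ngrid / 3)   # one third of the grid for each component

probs <- seq(0, 1, length.out = n3) * 0.999
x.range <- c(seq(low, up, length.out = n3), q1(probs), q2(probs))

# the quantile at probability 0 is -Inf for these inputs; drop such points
x.range <- x.range[is.finite(x.range)]

# likelihood ratio on the grid and its observed range
lr <- d2(x.range) / d1(x.range)
range(lr)
```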

Finally, in both the discrete and the absolutely continuous case, we clip this ratio from below at 1e-10 and from above at 1e10.
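
The clipping step amounts to the following one-liner. The helper name clip_ratio is hypothetical and only illustrates the operation.

```r
# Hypothetical helper: restrict likelihood ratio values to [1e-10, 1e10]
clip_ratio <- function(lr, lower = 1e-10, upper = 1e10) {
  pmin(pmax(lr, lower), upper)
}

clip_ratio(c(0, 1e-12, 0.5, 3, 1e15, Inf))
```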

In case we want to compute the total variation distance between (empirical) data and an abs. cont. distribution, we can specify the parameter asis.smooth.discretize to avoid trivial distances (distance = 1).

Using asis.smooth.discretize = "discretize", which is the default, leads to a discretization of the provided abs. cont. distribution and the distance is computed between the provided data and the discretized distribution.

Using asis.smooth.discretize = "smooth" causes smoothing of the empirical distribution of the provided data. That is, the empirical distribution is convolved with the normal distribution Norm(mean = 0, sd = h.smooth), which leads to an abs. cont. distribution. Afterwards the distance between the smoothed empirical distribution and the provided abs. cont. distribution is computed.
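
The smoothing step can be illustrated in base R: convolving the empirical distribution with Norm(0, h) yields an equal-weight mixture of normal densities centred at the data points. The helper smooth_density, the bandwidth h = 0.3 and the Riemann sum on a fixed grid are assumptions of this sketch and are not taken from distrEx; the distance shown is the ordinary total variation distance (rho = 1).

```r
# Hypothetical helper: density of the empirical distribution of `data`
# convolved with N(0, h^2), i.e. a mixture of normals centred at the data
smooth_density <- function(data, h) {
  function(x) vapply(x, function(xx) mean(dnorm(xx - data, sd = h)), numeric(1))
}

set.seed(1)
x <- rnorm(100)
f.hat <- smooth_density(x, h = 0.3)   # illustrative bandwidth

# Riemann sum approximations on a fixed grid
grid <- seq(-8, 8, length.out = 4001)
step <- grid[2] - grid[1]
mass <- sum(f.hat(grid)) * step                          # close to 1
tv   <- 0.5 * sum(abs(f.hat(grid) - dnorm(grid))) * step # TV to N(0, 1)
c(mass = mass, tv = tv)
```

A smaller bandwidth gives a spikier smoothed density and hence a larger distance to the smooth reference distribution.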

Value

Asymmetric total variation distance of e1 and e2.

Methods

e1 = "AbscontDistribution", e2 = "AbscontDistribution":
total variation distance of two absolutely continuous univariate distributions which is computed using distrExIntegrate.
e1 = "AbscontDistribution", e2 = "DiscreteDistribution":
total variation distance of absolutely continuous and discrete univariate distributions (these are mutually singular, i.e., have distance 1).
e1 = "DiscreteDistribution", e2 = "DiscreteDistribution":
total variation distance of two discrete univariate distributions which is computed using support and sum.
e1 = "DiscreteDistribution", e2 = "AbscontDistribution":
total variation distance of discrete and absolutely continuous univariate distributions (these are mutually singular, i.e., have distance 1).
e1 = "numeric", e2 = "DiscreteDistribution":
Total variation distance between (empirical) data and a discrete distribution.
e1 = "DiscreteDistribution", e2 = "numeric":
Total variation distance between (empirical) data and a discrete distribution.
e1 = "numeric", e2 = "AbscontDistribution":
Total variation distance between (empirical) data and an abs. cont. distribution.
e1 = "AbscontDistribution", e2 = "numeric":
Total variation distance between (empirical) data and an abs. cont. distribution.
e1 = "AcDcLcDistribution", e2 = "AcDcLcDistribution":
Total variation distance of mixed discrete and absolutely continuous univariate distributions.

Author(s)

Peter Ruckdeschel Peter.Ruckdeschel@itwm.fraunhofer.de

References

Agostinelli, C. and Ruckdeschel, P. (2009): A simultaneous inlier and outlier model by asymmetric total variation distance.

See Also

TotalVarDist-methods, ContaminationSize, KolmogorovDist, HellingerDist, Distribution-class

Examples

AsymTotalVarDist(Norm(), Gumbel(), rho=0.3)
AsymTotalVarDist(Norm(), Td(10), rho=0.3)
AsymTotalVarDist(Norm(mean = 50, sd = sqrt(25)), Binom(size = 100), rho=0.3) # mutually singular
AsymTotalVarDist(Pois(10), Binom(size = 20), rho=0.3) 

x <- rnorm(100)
AsymTotalVarDist(Norm(), x, rho=0.3)
AsymTotalVarDist(x, Norm(), asis.smooth.discretize = "smooth", rho=0.3)

y <- (rbinom(50, size = 20, prob = 0.5)-10)/sqrt(5)
AsymTotalVarDist(y, Norm(), rho=0.3)
AsymTotalVarDist(y, Norm(), asis.smooth.discretize = "smooth", rho=0.3)

AsymTotalVarDist(rbinom(50, size = 20, prob = 0.5), Binom(size = 20, prob = 0.5), rho=0.3)

[Package distrEx version 2.1 Index]