missing.compositions {compositions} | R Documentation |
This help section discusses general strategies for working with missing values in a compositional, relative or vectorial context. It shows how the various types of missings are represented and treated in the "compositions" package, according to each strategy/class of analysis of compositions or amounts.
is.BDL(x,mc=attr(x,"missingClassifier"))
is.SZ(x,mc=attr(x,"missingClassifier"))
is.MAR(x,mc=attr(x,"missingClassifier"))
is.MNAR(x,mc=attr(x,"missingClassifier"))
is.NMV(x,mc=attr(x,"missingClassifier"))
is.WMNAR(x,mc=attr(x,"missingClassifier"))
is.WZERO(x,mc=attr(x,"missingClassifier"))
has.missings(x,...)
## Default S3 method:
has.missings(x,mc=attr(x,"missingClassifier"),...)
## S3 method for class 'rmult':
has.missings(x,mc=attr(x,"missingClassifier"),...)
SZvalue
MARvalue
MNARvalue
BDLvalue
x |
A vector, matrix, acomp, rcomp, aplus, rplus object for which we would like to know the missing status of the entries |
mc |
A missing classifier function, giving for each value one of the values BDL (Below Detection Limit), SZ (Structural Zero), MAR (Missing at random), MNAR (Missing not at random), NMV (Not missing value). These functions are introduced to allow a different coding of the missings. |
... |
further generic arguments |
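In the context of compositional data we have to consider at least four types of missing and zero values: BDL (below detection limit), SZ (structural zero), MAR (missing at random) and MNAR (missing not at random).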
is.XXX checks the status of its argument according to the XXX type of value from those above.
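As a minimal sketch of how the is.XXX family can be used, the following builds a small vector from the constants listed in the usage section and tabulates the status of each entry. The vector x and the name status are hypothetical and chosen only for illustration; the default classifier is assumed, since x carries no "missingClassifier" attribute.

```r
library(compositions)

# Hypothetical vector: one observed value followed by one entry of each
# missing type, built from the constants exported by the package.
x <- c(0.5, BDLvalue, SZvalue, MARvalue, MNARvalue)

# Each classifier returns a logical vector with the same shape as x;
# binding them as columns gives one row per entry of x.
status <- cbind(
  BDL  = is.BDL(x),
  SZ   = is.SZ(x),
  MAR  = is.MAR(x),
  MNAR = is.MNAR(x),
  NMV  = is.NMV(x)
)
print(status)

# Reports whether x contains any entry flagged as missing.
print(has.missings(x))
```

Using the exported constants instead of literal values keeps the sketch independent of how each missing type happens to be coded.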
rplus: For positive real vectors, one can either identify a BDL with a true 0 or impute a value relative to the detection limit, with a function like zeroreplace. A structural zero can be seen either as a true zero or as a MAR value.
rcomp and acomp: For these relative geometries, a true zero is an alien. Thus a BDL is nothing else but a small unknown value. We could either decide to replace the value by an imputation, or go through the whole analysis keeping this lack of information in mind.
The main problem of imputation is that closing to 1 loses the absolute value of the detection limit, so the same detection limit can correspond to very different portions. Raw differences between all components, observed or missing (the basis of the rcomp geometry), are completely distorted by the replacement. In contrast, log-ratios between observed components do not change, but ratios between missing components depend dramatically on the replacement. For example, the gold content is typically some orders of magnitude smaller than the silver content, even around a gold deposit; far away from the deposit, however, both might be far below the detection limit, leading to a ratio of 1 simply because nothing was observed.
An SZ in compositions can be seen as defining two sub-populations, one fully defined and one where only a subcomposition is defined. But an SZ can also behave very much like a MAR, if only a subcomposition is measured. Thus, in general, we can simply understand that only a subcomposition is available, i.e. a projection of the true value onto a sub-space; this sub-space might be different for each observation. For MAR values this approach is strictly valid and yields unbiased estimates, because these projections are stochastically independent of the observed phenomenon. For MNAR values the projections depend on the actual value, which strictly speaking yields biased estimates.
aplus: Imputation takes place by simple replacement of the value. However, this can lead to a dramatic change of ratios and should thus be used only with extra care, for the same reasons explained above.
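The imputation problem described above can be illustrated with a small base R sketch. It does not use the package's own replacement routine; the detection limit dl, the replacement fraction a, the two samples and the helper functions closure and impute are all hypothetical and chosen only for illustration.

```r
# Closure: rescale a vector of amounts so that it sums to 1.
closure <- function(x) x / sum(x)

# Assumed detection limit and replacement fraction (illustrative values).
dl <- 0.01
a  <- 2 / 3

# Replace every 0 (standing for a below-detection-limit value) by a * dl.
impute <- function(x) replace(x, x == 0, a * dl)

# Two hypothetical samples in the same units, with different totals.
s1 <- c(Au = 0, Ag = 0, Cu = 2,   rest = 8)
s2 <- c(Au = 0, Ag = 0, Cu = 200, rest = 800)

p1 <- closure(impute(s1))
p2 <- closure(impute(s2))

# The same absolute replacement value becomes a very different portion
# once each sample is closed to 1.
print(c(sample1 = unname(p1["Au"]), sample2 = unname(p2["Au"])))

# Ratios between observed components are not affected by the replacement.
print(unname(p1["Cu"] / p1["rest"]))
print(unname(s1["Cu"] / s1["rest"]))

# The ratio between two imputed components is fixed by the replacement
# rule alone, whatever the true contents were.
print(unname(p1["Au"] / p1["Ag"]))
```

Both imputed components receive the same value, so their ratio carries no information about the true gold and silver contents.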
A logical vector or matrix with the same shape as x, stating whether or not each value is of the given type of missing.
K.Gerald v.d. Boogaart http://www.stat.boogaart.de, Raimon Tolosana Delgado, Matevz Bren
Boogaart, K.G. v.d., R. Tolosana-Delgado, M. Bren (2006) Concepts for handling of zeros and missing values in compositional data, in E. Pirard (ed.) (2006) Proceedings of the IAMG'2006 Annual Conference on "Quantitative Geology from multiple sources", September 2006, Liege, Belgium, S07-01, 4 pages, http://www.math-inf.uni-greifswald.de/~boogaart/Publications/iamg06_s07_01.pdf, ISBN: 978-2-9600644-0-7
Aitchison, J. (1986) The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman & Hall Ltd., London (UK). 416p.
Aitchison, J., C. Barceló-Vidal, J.J. Egozcue, V. Pawlowsky-Glahn (2002) A concise guide to the algebraic geometric structure of the simplex, the sample space for compositional data analysis, Terra Nostra, Schriften der Alfred Wegener-Stiftung, 03/2003
Billheimer, D., P. Guttorp, and W.F. Fagan (2001) Statistical interpretation of species composition, Journal of the American Statistical Association, 96 (456), 1205-1214
Martín-Fernández, J.A., C. Barceló-Vidal, and V. Pawlowsky-Glahn (2003) Dealing With Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation. Mathematical Geology, 35(3), 253-278
zeroreplace, rmult, ilr, mean.acomp, acomp, plot.acomp
require(compositions)    # load library
data(SimulatedAmounts)   # load data sa.lognormals