fitdist {fitdistrplus} | R Documentation |
Fits a univariate distribution to non-censored data by maximum likelihood or matching moments, and computes goodness-of-fit statistics.
fitdist(data, distr, method="mle", start, chisqbreaks, meancount)

## S3 method for class 'fitdist':
print(x, ...)

## S3 method for class 'fitdist':
plot(x, breaks="default", ...)

## S3 method for class 'fitdist':
summary(object, ...)
data | A numeric vector. |
distr | A character string "name" naming a distribution for which the corresponding density function dname, distribution function pname and quantile function qname must be defined, or the density function itself. |
method | A character string coding for the fitting method: "mle" for 'maximum likelihood' and "mom" for 'matching moments'. |
start | A named list giving the initial values of the parameters of the named distribution. This argument is ignored if method="mom", and may be omitted if method="mle" for the distributions for which reasonable starting values are computed (see details). |
chisqbreaks | A numeric vector defining the breaks of the cells used to compute the chi-squared statistic. If omitted, the breaks are computed automatically from the data so that each cell holds roughly the same number of observations, roughly equal to the argument meancount, or slightly more if there are ties. |
meancount | The mean number of observations per cell expected when defining the breaks of the cells used to compute the chi-squared statistic. This argument is ignored if the breaks are given directly in the argument chisqbreaks. If chisqbreaks and meancount are both omitted, meancount is set so as to obtain roughly (4n)^{2/5} cells, where n is the length of the dataset. |
x | An object of class 'fitdist'. |
object | An object of class 'fitdist'. |
breaks | If "default", the histogram is plotted with the function hist using its default breaks definition. Otherwise breaks is passed to the function hist. This argument is ignored for discrete distributions: "binom", "nbinom", "geom", "hyper" and "pois". |
... | Further arguments passed to or from other methods. |
When method="mle", maximum likelihood estimates of the distribution parameters are computed using the function mledist. Direct optimization of the log-likelihood is performed using optim, with its default method "Nelder-Mead" for distributions characterized by more than one parameter and the method "BFGS" for distributions characterized by only one parameter. For the following named distributions, reasonable starting values are computed if start is omitted: "norm", "lnorm", "exp", "pois", "cauchy", "gamma", "logis", "nbinom" (parametrized by mu and size), "geom", "beta" and "weibull". Note that these starting values may not be good enough if the fit is poor. With this method, the function is not able to fit a uniform distribution. Along with the parameter estimates, the function returns the log-likelihood and the standard errors of the estimates, calculated from the Hessian at the solution found by optim.
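The optimization step described above can be sketched in base R. This is only an illustration under stated assumptions, not the code of mledist: the simulated data, the starting values and the helper negloglik are hypothetical, and a gamma distribution is chosen arbitrarily.

```r
# Minimal sketch of maximum likelihood fitting with optim (base R only).
set.seed(1)
x <- rgamma(200, shape = 2, rate = 1)   # simulated data, for illustration only

# Negative log-likelihood of a gamma distribution; par = c(shape, rate).
# negloglik is a hypothetical helper, not a function of the package.
negloglik <- function(par, data) {
  if (any(par <= 0)) return(Inf)        # reject parameter values outside the domain
  -sum(dgamma(data, shape = par[1], rate = par[2], log = TRUE))
}

# Two parameters, hence Nelder-Mead; hessian = TRUE asks optim for a
# numerically differentiated Hessian at the solution.
start <- c(shape = 1, rate = 1)         # assumed starting values
opt <- optim(start, negloglik, data = x, method = "Nelder-Mead", hessian = TRUE)

estimate <- opt$par
loglik <- -opt$value

# The Hessian of the negative log-likelihood is the observed information;
# its inverse approximates the covariance matrix of the estimates.
vcov_hat <- solve(opt$hessian)
se <- sqrt(diag(vcov_hat))              # standard errors
cor_hat <- cov2cor(vcov_hat)            # correlation matrix of the estimates
```

For a one-parameter distribution, the same call with method = "BFGS" would mirror the choice described above.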
When method="mom", estimates of the distribution parameters are provided only for the following distributions: "norm", "lnorm", "pois", "exp", "gamma", "nbinom", "geom", "beta", "unif" and "logis". For distributions characterized by one parameter ("geom", "pois" and "exp"), this parameter is simply estimated by matching theoretical and observed means; for distributions characterized by two parameters, the parameters are estimated by matching theoretical and observed means and variances (Vose, 2000).
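As a worked illustration of matching moments, the sketch below handles one one-parameter case and one two-parameter case in base R. The simulated data and variable names are hypothetical, and the use of var() (denominator n - 1) is an assumption; the package may use a different variance convention.

```r
# Minimal sketch of moment matching (base R only).
set.seed(2)
x <- rgamma(500, shape = 3, rate = 2)   # simulated data, for illustration only

m <- mean(x)
v <- var(x)   # assumption: sample variance with denominator n - 1

# One-parameter case, exponential: theoretical mean = 1 / rate
rate_exp <- 1 / m

# Two-parameter case, gamma: mean = shape / rate, variance = shape / rate^2.
# Solving these two equations for the parameters gives:
rate_hat <- m / v
shape_hat <- m^2 / v
```

Plugging shape_hat and rate_hat back into the theoretical formulas returns exactly m and v, which is a quick way to check the algebra.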
Goodness-of-fit statistics are computed. The Chi-squared statistic is computed using cells defined by the argument chisqbreaks, or cells defined automatically from the data so that each cell holds roughly the same number of observations, roughly equal to the argument meancount, or slightly more if there are ties. If chisqbreaks and meancount are both omitted, meancount is set so as to obtain roughly (4n)^{2/5} cells, where n is the length of the dataset (Vose, 2000). The Chi-squared statistic is not computed if the program fails to define enough cells because the dataset is too small. When the Chi-squared statistic is computed, and if the degrees of freedom (number of cells - number of parameters - 1) of the corresponding distribution are strictly positive, the p-value of the Chi-squared test is returned.
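The calculation can be sketched in base R as follows. The way cells are built here (empirical quantiles with open-ended outer cells) is an assumption made for illustration and is not claimed to be the algorithm used by fitdist; the data and variable names are hypothetical.

```r
# Minimal sketch of the Chi-squared goodness-of-fit calculation (base R only).
set.seed(3)
x <- rnorm(100, mean = 10, sd = 2)      # simulated data, for illustration only
n <- length(x)

# Fitted normal distribution (maximum likelihood estimates)
mu_hat <- mean(x)
sd_hat <- sqrt(mean((x - mu_hat)^2))
npar <- 2

# Roughly (4n)^(2/5) cells, each holding about the same number of observations
ncells <- round((4 * n)^(2 / 5))
inner <- quantile(x, probs = seq_len(ncells - 1) / ncells, names = FALSE)
breaks <- c(-Inf, inner, Inf)           # assumption: open-ended outer cells

# Observed and theoretical counts per cell
obs <- as.vector(table(cut(x, breaks)))
theo <- n * diff(pnorm(breaks, mean = mu_hat, sd = sd_hat))

chisq <- sum((obs - theo)^2 / theo)
df <- ncells - npar - 1                 # cells - parameters - 1
pvalue <- if (df > 0) pchisq(chisq, df = df, lower.tail = FALSE) else NULL
```

With n = 100 the rule gives 11 cells, hence 8 degrees of freedom for a two-parameter distribution.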
For the distributions assumed continuous (all but "binom", "nbinom", "geom", "hyper" and "pois"), Kolmogorov-Smirnov and Anderson-Darling statistics are also computed, as defined by Cullen and Frey (1999).
An approximate Kolmogorov-Smirnov test is performed by assuming the distribution parameters known. The critical value defined by Stephens (1986) for a completely specified distribution is used to reject or not the distribution at the significance level 0.05. Because of this approximation, the result of the test (decision of rejection of the distribution or not) is returned only for datasets with more than 30 observations. Note that this approximate test may be too conservative.
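A minimal sketch of this approximate test in base R is given below. The modified statistic D(sqrt(n) + 0.12 + 0.11/sqrt(n)) and the 5% critical value 1.358 are the form commonly attributed to Stephens for a completely specified distribution; treating them as what fitdist uses is an assumption, and the data and variable names are hypothetical.

```r
# Minimal sketch of the approximate Kolmogorov-Smirnov test (base R only).
set.seed(4)
x <- rnorm(50, mean = 10, sd = 2)       # simulated data, for illustration only
n <- length(x)

# Parameters estimated from the data, then treated as if they were known
mu_hat <- mean(x)
sd_hat <- sd(x)

# Fitted distribution function at the sorted observations
Fx <- pnorm(sort(x), mean = mu_hat, sd = sd_hat)
i <- seq_len(n)

# KS statistic: largest gap between the empirical and fitted distribution functions
ks <- max(pmax(i / n - Fx, Fx - (i - 1) / n))

# Assumed form of the approximation for a completely specified distribution
ks_mod <- ks * (sqrt(n) + 0.12 + 0.11 / sqrt(n))
kstest <- if (n > 30) {
  if (ks_mod > 1.358) "rejected" else "not rejected"
} else {
  NULL                                  # no decision for small datasets
}
```

Because the parameters are estimated from the same data, the fitted curve sits closer to the empirical one than a truly pre-specified curve would, which is why the decision tends to be conservative.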
For datasets with more than 5 observations and for distributions for which the test is described by Stephens (1986) ("norm", "lnorm", "exp", "cauchy", "gamma", "logis" and "weibull"), the Anderson-Darling test is performed as described by Stephens (1986). This test takes into account the fact that the parameters are not known but estimated from the data. The result is the decision to reject or not the distribution at the significance level 0.05.
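For reference, the Anderson-Darling statistic itself can be computed in base R with the usual formula A2 = -n - (1/n) sum_i (2i - 1) [log F(x_(i)) + log(1 - F(x_(n+1-i)))]. The sketch below stops at the statistic: the distribution-specific modifications and critical values tabulated by Stephens (1986) are not reproduced here. The data and variable names are hypothetical.

```r
# Minimal sketch of the Anderson-Darling statistic (base R only).
set.seed(5)
x <- rnorm(60, mean = 10, sd = 2)       # simulated data, for illustration only
n <- length(x)

# Fitted distribution function at the sorted observations
Fx <- pnorm(sort(x), mean = mean(x), sd = sd(x))
i <- seq_len(n)

# rev(Fx)[i] is F at the (n + 1 - i)-th order statistic
ad <- -n - mean((2 * i - 1) * (log(Fx) + log(1 - rev(Fx))))
```

Compared with the Kolmogorov-Smirnov statistic, the logarithms give more weight to discrepancies in the tails of the distribution.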
The plot of an object of class "fitdist" returned by fitdist uses the function plotdist.
fitdist returns an object of class 'fitdist', a list with the 17 components listed below:
estimate | the parameter estimates |
method | the character string coding for the fitting method: "mle" for 'maximum likelihood' and "mom" for 'matching moments' |
sd | the estimated standard errors or NULL if method="mom" |
cor | the estimated correlation matrix or NULL if method="mom" |
loglik | the log-likelihood or NULL if method="mom" |
n | the length of the dataset |
data | the dataset |
distname | the name of the distribution |
chisq | the Chi-squared statistic or NULL if not computed |
chisqbreaks | breaks used to define cells in the Chi-squared statistic |
chisqpvalue | p-value of the Chi-squared statistic or NULL if not computed |
chisqdf | degrees of freedom of the Chi-squared distribution or NULL if not computed |
chisqtable | a table with observed and theoretical counts used for the Chi-squared calculations |
ad | the Anderson-Darling statistic or NULL if not computed |
adtest | the decision of the Anderson-Darling test or NULL if not computed |
ks | the Kolmogorov-Smirnov statistic or NULL if not computed |
kstest | the decision of the Kolmogorov-Smirnov test or NULL if not computed |
Marie-Laure Delignette-Muller ml.delignette@vet-lyon.fr
Cullen AC and Frey HC (1999) Probabilistic techniques in exposure assessment. Plenum Press, USA, pp. 81-155.
Stephens MA (1986) Tests based on EDF statistics. In Goodness-of-fit techniques (D'Agostino RB and Stephens MA, eds), Marcel Dekker, New York, pp. 97-194.
Venables WN and Ripley BD (2002) Modern applied statistics with S. Springer, New York, pp. 435-446.
Vose D (2000) Risk analysis, a quantitative guide. John Wiley & Sons Ltd, Chichester, England, pp. 99-143.
plotdist, optim, mledist, momdist and fitdistcens.
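library(fitdistrplus)

# Fit of a normal distribution by maximum likelihood
x1 <- c(6.4, 13.3, 4.1, 1.3, 14.1, 10.6, 9.9, 9.6, 15.3, 22.1, 13.4,
        13.2, 8.4, 6.3, 8.9, 5.2, 10.9, 14.4)
f1 <- fitdist(x1, "norm")
print(f1)
plot(f1)
summary(f1)
f1$chisqtable

# Fit of normal and lognormal distributions by matching moments
f1b <- fitdist(x1, "norm", method="mom", meancount=6)
summary(f1b)
f1b$chisqtable
f1c <- fitdist(x1, "lnorm", method="mom", meancount=6)
summary(f1c)
f1c$chisqtable

# Fit of a user-defined distribution (Gumbel): the density, distribution
# and quantile functions must be defined before calling fitdist
dgumbel <- function(x, a, b) 1/b*exp((a-x)/b)*exp(-exp((a-x)/b))
pgumbel <- function(q, a, b) exp(-exp((a-q)/b))
qgumbel <- function(p, a, b) a-b*log(-log(p))
f1c <- fitdist(x1, "gumbel", start=list(a=10, b=5))
print(f1c)
plot(f1c)

# Fit of a discrete distribution (Poisson) with user-defined Chi-squared breaks
x2 <- c(rep(4, 1), rep(2, 3), rep(1, 7), rep(0, 12))
f2 <- fitdist(x2, "pois", chisqbreaks=c(0, 1))
plot(f2)
summary(f2)
f2$chisqtable

# Fit of several distributions to the same simulated Weibull sample
xw <- rweibull(n=100, shape=2, scale=1)
fa <- fitdist(xw, "weibull")
summary(fa)
fa$chisqtable
fb <- fitdist(xw, "gamma")
summary(fb)
fc <- fitdist(xw, "exp")
summary(fc)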