parzen {gamlss.dist} | R Documentation |
These are several small data files useful for gamlss fits.
parzen: These data come from Parzen (1979) and are also contained in Hand et al. (1994), data set 278. The data give the annual snowfall in Buffalo, NY (inches) for the 63 years from 1910 to 1972 inclusive.
glass: The data show the strength of glass fibres, measured at the National Physical Laboratory, England; see Smith and Naylor (1987). The unit of measurement was not given in the paper.
tensile: These data come from Quesenberry and Hales (1980) and were also reproduced in Hand et al. (1994), data set 180, page 140. They contain measurements of tensile strength of polyester fibres and the authors were trying to check if they were consistent with the lognormal distribution. According to Hand et al. (1994) "these data follow from a preliminary transformation. If the lognormal hypothesis is correct, these data should have been uniformly distributed".
margolin: Margolin et al. (1981) present data from an Ames Salmonella assay, where y is the number of revertant colonies observed on a plate given a dose x of quinoline. The data were subsequently analysed by Breslow (1984), Lawless (1987) and Saha and Paul (2005).
computer: The data relate to DEC-20 computers which operated at the Open University in the 1980s. They give the number of computers that broke down in each of the 128 consecutive weeks of operation, starting in late 1983; see Hand et al. (1994), page 109, data set 141.
lice: The data come from Williams (1944) and give the number of lice per head of Hindu male prisoners in Cannamore, South India, 1937-1939.
alveolar: Alveolar-bronchiolar adenomas data used by Tamura and Young (1987) and also reproduced in Hand et al. (1994), data set 256. The data give the number of mice having alveolar-bronchiolar adenomas out of a given number of mice (the binomial denominator) in each of 23 independent groups.
species: The number of different fish species (y = fish) was recorded for 70 lakes of the world together with the explanatory variable x = log(lake), the log of lake area. The data are given and analyzed by Stein and Juritz (1988).
CD4: The data were given by Wade and Ades (1994) and refer to cd4 counts from uninfected children born to HIV-1 mothers and the age of the child.
LGAclaims: The data were given by Gillian Heller and can be found in de Jong and Heller (2007). This data set records the number of third party claims, Claims, in a twelve month period between 1984 and 1986 in each of 176 geographical areas (local government areas) in New South Wales, Australia. Areas are grouped into thirteen statistical divisions (SD). Other recorded variables are the number of accidents, Accidents, the number of people killed or injured and population, with all variables classified according to area.
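The tensile entry above states that, under the lognormal hypothesis, the transformed data should be uniformly distributed. The following is a minimal sketch of why that holds, using simulated values rather than the tensile measurements. The sample size and the lognormal parameters (meanlog, sdlog) are arbitrary assumptions chosen only for illustration. Applying the hypothesised lognormal distribution function to lognormal data gives values that should behave like a Uniform(0, 1) sample.

```r
# Simulated illustration only: these are NOT the tensile measurements.
set.seed(1)
n       <- 200   # hypothetical sample size
meanlog <- 1.5   # hypothetical lognormal parameters
sdlog   <- 0.4
strength <- rlnorm(n, meanlog = meanlog, sdlog = sdlog)

# Probability integral transform: apply the hypothesised lognormal CDF.
u <- plnorm(strength, meanlog = meanlog, sdlog = sdlog)

# Under the hypothesis, u should resemble a Uniform(0, 1) sample.
hist(u, breaks = 10, main = "Transformed values", xlab = "u")

# Kolmogorov-Smirnov test of u against Uniform(0, 1) as an informal check.
ks <- ks.test(u, "punif")
print(ks$p.value)
```

With real data the lognormal parameters would usually be estimated rather than known, which affects the validity of the plain Kolmogorov-Smirnov p-value; the sketch ignores this.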
data(parzen)
data(glass)
data(tensile)
data(margolin)
data(computer)
data(lice)
data(alveolar)
data(species)
data(CD4)
data(LGAclaims)
Data frames, each with the following variables.
parzen: snowfall
glass: strength
tensile: str
margolin: y, x
computer: failure
lice: head, freq
alveolar: r, n
species: fish, lake
Data sets useful for the GAMLSS booklet.
Breslow, N. (1984) Extra-Poisson variation in log-linear models. Applied Statistics, 33, 38-44.
de Jong, P. and Heller, G. (2007) Generalized Linear Models for Insurance Data. Cambridge University Press.
Hand et al. (1994) A handbook of small data sets. Chapman and Hall, London.
Lawless, J.F. (1987) Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics, 15, 209-225.
Margolin, B.H., Kaplan, N. and Zeiger, E. (1981) Statistical analysis of the Ames Salmonella/microsome test. Proceedings of the National Academy of Sciences, U.S.A., 78, 3779-3783.
Quesenberry, C. and Hales, C. (1980) Concentration bands for uniformity plots. Journal of Statistical Computation and Simulation, 11, 41-53.
Parzen, E. (1979) Nonparametric statistical data modeling. JASA, 74, 105-131.
Saha, K. and Paul, S. (2005) Bias-Corrected Maximum Likelihood Estimator of the Negative Binomial Dispersion Parameter. Biometrics, 61, 179-185.
Smith, R. L. and Naylor, J. C. (1987) A comparison of maximum likelihood and Bayesian estimators for the three-parameter Weibull distribution. Applied Statistics, 36, 358-369.
Stasinopoulos, D. M. and Rigby, R. A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, http://www.jstatsoft.org/v23/i07.
Stein, G. Z. and Juritz, J. M. (1988) Linear models with an inverse Gaussian-Poisson error distribution. Communications in Statistics - Theory and Methods, 17, 557-571.
Wade, A. M. and Ades, A. E. (1994) Age-related reference ranges: Significance tests for models and confidence intervals for centiles. Statistics in Medicine, 13, 2359-2367.
data(parzen)
with(parzen, hist(snowfall))
data(glass)
with(glass, hist(strength))
data(tensile)
with(tensile, hist(str))
data(margolin)
with(margolin, plot(y~x))
data(computer)
with(computer, plot(table(failure)))
data(lice)
with(lice, plot(freq~head, type="h"))
data(alveolar)
with(alveolar, hist(r/n))
data(species)
with(species, plot(fish~log(lake)))
data(CD4)
with(CD4, plot(cd4~age))
data(LGAclaims)
with(LGAclaims, plot(data.frame(Claims, Pop_density, KI, Accidents, Population)))