aster {aster}    R Documentation
Fits Aster Models.
aster(x, ...)

## Default S3 method:
aster(x, root, pred, fam, modmat, parm,
    type = c("unconditional", "conditional"), famlist = fam.default(),
    origin, origin.type = c("model.type", "unconditional", "conditional"),
    method = c("trust", "nlm", "CG", "L-BFGS-B"), fscale, maxiter = 1000,
    nowarn = TRUE, newton = TRUE, optout = FALSE, coef.names, ...)

## S3 method for class 'formula':
aster(formula, pred, fam, varvar, idvar, root, data, parm,
    type = c("unconditional", "conditional"), famlist = fam.default(),
    origin, origin.type = c("model.type", "unconditional", "conditional"),
    method = c("trust", "nlm", "CG", "L-BFGS-B"), fscale, maxiter = 1000,
    nowarn = TRUE, newton = TRUE, optout = FALSE, ...)
x: an nind by nnode matrix, the data for an aster model. The rows are
independent and identically modeled random vectors. See details below for
further requirements. aster.formula constructs such an x from the response
in its formula; hence data for aster.formula must have nind * nnode rows.
root: an object of the same shape as x, the root data: an nind by nnode
matrix for aster.default and a vector of length nind * nnode for
aster.formula.
pred: an integer vector of length nnode determining the dependence graph of
the aster model. pred[j] is the index of the predecessor of the node with
index j unless the predecessor is a root node, in which case pred[j] == 0.
See details below for further requirements.
fam: an integer vector of length nnode determining the exponential family
structure of the aster model. Each element is an index into the vector of
family specifications given by the argument famlist.
modmat: an nind by nnode by ncoef three-dimensional array, the model matrix.
aster.formula constructs such a modmat from its formula, the data frame
data, and the variables in the environment of the formula.
parm: usually missing. Otherwise a vector of length ncoef giving a starting
point for the optimization.
type: type of model, either "unconditional" (the default) or "conditional".
The value of this argument can be abbreviated.
famlist: a list of family specifications (see families).
origin: distinguished point in parameter space. May be missing, in which
case an unspecified default is provided. See details below for further
explanation.
origin.type: parameter space in which the distinguished point origin is
located. If "conditional", then origin is a conditional canonical parameter
value. If "unconditional", then origin is an unconditional canonical
parameter value. If "model.type" (the default), then the type is taken from
the argument type. The value of this argument can be abbreviated.
method: optimization method. If "trust", then the trust function is used.
If "nlm", then the nlm function is used. Otherwise ("CG" or "L-BFGS-B") the
optim function is used with the specified method supplied to it. The value
of this argument can be abbreviated.
fscale: an estimate of the size of the log likelihood at the maximum.
Defaults to nind.
maxiter: maximum number of iterations. Defaults to 1000.
nowarn: if TRUE (the default), suppress warnings from the optimization
routine.
newton: if TRUE (the default), do one Newton iteration on the result
produced by the optimization routine. The exception is method == "trust",
for which no such Newton iteration is done, regardless of the value of
newton, because trust always terminates with a Newton iteration when it
converges.
optout: if TRUE, save the entire result of the optimization routine (trust,
nlm, or optim, as the case may be).
coef.names: names of the regression coefficients. If missing,
dimnames(modmat)[[3]] is used. In aster.formula these are produced
automatically by the R formula machinery.
...: other arguments passed to the optimization method.
formula: a symbolic description of the model to be fit. See lm, glm, and
formula for discussions of the R formula mini-language.
varvar: a factor of the same length as the response in the formula, whose
levels are character strings treated as variable names. The number of
variable names is nnode. Must be of the form rep(vars, each = nind), where
vars is a vector of variable names. Usually found in the data frame data
when this is produced by the reshape function.
idvar: a variable of the same length as the response in the formula that
indexes individuals. The number of individuals is nind. Must be of the form
rep(inds, times = nnode), where inds is a vector of labels for individuals.
Usually found in the data frame data when this is produced by the reshape
function.
data: an optional data frame containing the variables in the model. If not
found in data, the variables are taken from environment(formula), typically
the environment from which aster is called. Usually produced by the reshape
function.
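The newton argument refers to one Newton iteration, beta_new = beta - H^(-1) g, where g and H are the gradient and Hessian of minus the log likelihood at the optimizer's answer. The sketch below shows that step on a toy Poisson regression, not an aster model; the simulated data, the helper functions f, grad and hess, and the choice of optim's "CG" method are all illustrative assumptions, not what aster does internally.

```r
# Toy problem (hypothetical, not an aster model): Poisson regression with
# canonical link; f is minus the log likelihood up to an additive constant.
set.seed(3)
n <- 200
X <- cbind(1, rnorm(n))
y <- rpois(n, exp(X %*% c(0.5, 0.8)))

f <- function(beta) -sum(y * (X %*% beta) - exp(X %*% beta))
grad <- function(beta) -drop(crossprod(X, y - exp(X %*% beta)))
hess <- function(beta) crossprod(X, X * drop(exp(X %*% beta)))

# A first-order optimizer typically stops near, not exactly at, the minimum
opt <- optim(c(0, 0), f, grad, method = "CG")
beta_opt <- opt$par

# One Newton iteration starting from the optimizer's answer
beta_newton <- beta_opt - solve(hess(beta_opt), grad(beta_opt))

# Size of the gradient before and after the Newton step
c(before = sqrt(sum(grad(beta_opt)^2)),
  after = sqrt(sum(grad(beta_newton)^2)))
```

Near the optimum Newton's method converges quadratically, which is why a single extra iteration usually sharpens the answer of a first-order method such as "CG".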
The vector pred must satisfy all(pred < seq(along = pred)); that is, the
predecessor of each node must come before that node in the node order.
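The ordering requirement can be checked directly. This is a minimal sketch; the helper name pred_ok is hypothetical and not part of aster.

```r
# Hypothetical helper (not part of aster): TRUE when every node's
# predecessor comes earlier in the node order (0 codes a root predecessor).
pred_ok <- function(pred) all(pred < seq(along = pred))

pred_ok(c(0, 1, 2, 1, 2, 3, 4, 5, 6))  # TRUE: graph of the echinacea example
pred_ok(c(0, 3, 1))                    # FALSE: node 2's predecessor is node 3
```

Because of this ordering, nodes can always be processed in index order with every predecessor already handled.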
The vector pred defines a function p by p(j) = pred[j]. The joint
distribution of the data matrix x is a product of conditionals

prod[i] prod[j] Pr(x[i, j] | x[i, p(j)])

When p(j) = 0, the notation x[i, p(j)] means root[i, j]. Other elements of
the matrix root are not used.
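The role of pred and root in the conditioning can be sketched as a matrix n with n[i, j] = x[i, p(j)], or root[i, j] when p(j) = 0. The helper pred_values and the toy numbers are hypothetical illustrations.

```r
# Hypothetical helper (not part of aster): the value each x[i, j] is
# conditioned on, namely x[i, p(j)], or root[i, j] when p(j) = 0.
pred_values <- function(x, root, pred) {
  stopifnot(identical(dim(x), dim(root)), ncol(x) == length(pred))
  n <- root                    # start from the root data
  nonroot <- which(pred > 0)   # nodes whose predecessor is another node
  n[, nonroot] <- x[, pred[nonroot], drop = FALSE]
  n
}

x <- rbind(c(1, 1, 3),
           c(1, 0, 0))
root <- matrix(1, nrow = 2, ncol = 3)
pred_values(x, root, pred = c(0, 1, 2))
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]    1    1    0
```

Only the root columns of nodes with pred[j] == 0 survive, matching the statement that other elements of root are not used.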
The conditional distribution Pr(x[i, j] | x[i, p(j)]) is the x[i, p(j)]-fold
convolution of the j-th family in the vector fam (that is, of the family
famlist[[fam[j]]]), a one-parameter exponential family: the distribution of
the sum of x[i, p(j)] i.i.d. terms having this one-parameter exponential
family distribution.
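Putting the product of conditionals and the convolution rule together gives the log likelihood. This sketch assumes just two families, written with mean-value parameters for readability: "bernoulli" (its n-fold convolution is Binomial(n, p)) and "poisson" (its n-fold convolution is Poisson(n mu)). The function aster_loglik and the family names are hypothetical; aster itself uses canonical parameters and famlist.

```r
# Hypothetical sketch: log of prod[i] prod[j] Pr(x[i, j] | x[i, p(j)]).
aster_loglik <- function(x, root, pred, family, prob, mu) {
  # n[i, j] = x[i, p(j)], or root[i, j] when p(j) = 0
  n <- root
  nonroot <- which(pred > 0)
  n[, nonroot] <- x[, pred[nonroot], drop = FALSE]
  total <- 0
  for (j in seq_along(pred)) {
    # n-fold convolutions; n = 0 gives a point mass at zero
    ll <- if (family[j] == "bernoulli") {
      dbinom(x[, j], size = n[, j], prob = prob[j], log = TRUE)
    } else {
      dpois(x[, j], lambda = n[, j] * mu[j], log = TRUE)
    }
    total <- total + sum(ll)
  }
  total
}

x <- rbind(c(1, 1, 3),
           c(0, 0, 0))
root <- matrix(1, nrow = 2, ncol = 3)
ll <- aster_loglik(x, root, pred = c(0, 1, 2),
                   family = c("bernoulli", "bernoulli", "poisson"),
                   prob = c(0.8, 0.5, NA), mu = c(NA, NA, 2))
```

For the second individual, nodes 2 and 3 have predecessor value 0, so they contribute log(1) = 0.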
For type == "conditional", the conditional canonical parameters theta[i, j]
are modeled in GLM fashion as theta = a + M beta, where M is the model matrix
modmat and a is the distinguished point origin. Since the “vector” theta is
actually a matrix, the “matrix” M must correspondingly be a three-dimensional
array. Written out in full, theta = a + M beta is

theta[i, j] = a[i, j] + sum[k] M[i, j, k] beta[k]

This specifies the log likelihood.
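The elementwise formula and a single matrix product give the same theta. This sketch uses arbitrary toy dimensions and random values; it is an illustration, not aster's internal code.

```r
# Toy dimensions and values (arbitrary)
nind <- 2; nnode <- 3; ncoef <- 2
set.seed(1)
modmat <- array(rnorm(nind * nnode * ncoef), dim = c(nind, nnode, ncoef))
a <- matrix(0.5, nind, nnode)   # distinguished point (origin)
beta <- c(1, -2)

# theta[i, j] = a[i, j] + sum[k] M[i, j, k] beta[k], written as loops
theta_loop <- a
for (i in seq_len(nind)) for (j in seq_len(nnode))
  theta_loop[i, j] <- a[i, j] + sum(modmat[i, j, ] * beta)

# Same thing with one matrix product: flatten the first two array
# dimensions (column-major), multiply, then restore the nind x nnode shape
theta_flat <- a + matrix(matrix(modmat, ncol = ncoef) %*% beta, nind, nnode)

all.equal(theta_loop, theta_flat)   # TRUE
```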
For type == "unconditional", the unconditional canonical parameters phi are
modeled in GLM fashion as phi = a + M beta (where the notation is as above).
The unconditional canonical parameters are then specified in terms of the
conditional ones by

phi[i, j] = theta[i, j] - sum[k in S(j)] psi[k](theta[i, k])

where S(j) denotes the set of successors of j (the k such that p(k) = j) and psi[k] is the cumulant function for the k-th exponential family. This rather crazy-looking formulation is an invertible change of parameter that makes phi the canonical parameter and x the canonical statistic of a full flat unconditional exponential family. Again, this specifies the log likelihood.
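The change of parameter and its inverse can be sketched for one individual (index i dropped). This assumes Bernoulli and Poisson nodes with cumulant functions log(1 + exp(theta)) and exp(theta); the helper names are hypothetical. The inverse works because successors have larger indices, so walking the nodes backwards always has theta[k] for k in S(j) available.

```r
# Cumulant functions of two one-parameter families (canonical parameters)
psi_bernoulli <- function(theta) log1p(exp(theta))
psi_poisson <- function(theta) exp(theta)

# phi[j] = theta[j] - sum over k in S(j) of psi[k](theta[k])
theta_to_phi <- function(theta, pred, psi) {
  phi <- theta
  for (k in which(pred > 0)) {
    j <- pred[k]   # k is a successor of j, i.e. k is in S(j)
    phi[j] <- phi[j] - psi[[k]](theta[k])
  }
  phi
}

# Inverse map: visit nodes from last to first, so theta[k] for every
# successor k > j is already recovered when theta[j] is computed
phi_to_theta <- function(phi, pred, psi) {
  theta <- phi
  for (j in rev(seq_along(pred))) {
    for (k in which(pred == j)) theta[j] <- theta[j] + psi[[k]](theta[k])
  }
  theta
}

pred <- c(0, 1, 2)
psi <- list(psi_bernoulli, psi_bernoulli, psi_poisson)
theta <- c(0.2, -0.1, 0.5)
phi <- theta_to_phi(theta, pred, psi)
all.equal(phi_to_theta(phi, pred, psi), theta)   # TRUE
```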
In versions of aster prior to 0.6 there was no a in the model specification, which is the same as specifying a = 0 in the current specification. If a is in the column space of the model matrix, that is, if there exists an alpha such that a = M alpha, then there is no difference between the model specified with a and the one with a = 0. The maximum likelihood regression coefficients beta will be different, but the maximum likelihood estimates of all other parameters (conditional and unconditional, canonical and mean value) will be the same. This is the usual case and explains why “linear” models (with a = 0), as opposed to “affine” models (with general a), are popular. In the unusual case where a is not in the column space of the model matrix, affine models are a generalization of linear models: the two are not equivalent, and their maximum likelihood estimates are not the same in any parameterization.
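Whether a given a lies in the column space of M can be checked numerically on the flattened model matrix; when it does (a = M alpha), a + M beta equals M (beta + alpha), so only the coefficients shift. This is a sketch with random toy data, and the helper in_column_space and its tolerance are hypothetical choices.

```r
# Hypothetical helper: is a (nind x nnode) in the column space of modmat?
in_column_space <- function(a, modmat, tol = 1e-8) {
  M <- matrix(modmat, ncol = dim(modmat)[3])  # (nind * nnode) x ncoef
  r <- qr.resid(qr(M), as.vector(a))          # part of a orthogonal to col(M)
  max(abs(r)) < tol
}

set.seed(2)
nind <- 4; nnode <- 3; ncoef <- 2
modmat <- array(rnorm(nind * nnode * ncoef), dim = c(nind, nnode, ncoef))
M <- matrix(modmat, ncol = ncoef)
alpha <- c(0.3, -1)
a_in <- matrix(M %*% alpha, nind, nnode)       # a = M alpha
a_out <- matrix(rnorm(nind * nnode), nind, nnode)
in_column_space(a_in, modmat)    # TRUE
in_column_space(a_out, modmat)   # FALSE for this random a

# With a = M alpha, the affine predictor equals a linear one with
# coefficients beta + alpha
beta <- c(2, 0.5)
lhs <- a_in + matrix(M %*% beta, nind, nnode)
rhs <- matrix(M %*% (beta + alpha), nind, nnode)
all.equal(lhs, rhs)              # TRUE
```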
In order to use the R model formula mini-language we must flatten the
dimensionality, making the model matrix modmat two-dimensional (a true
matrix). This must be done as if by matrix(modmat, ncol = ncoef), which
imposes the requirements on varvar and idvar given in the arguments section:
modulo relabeling, varvar must look like as.vector(col(x)) and idvar like
as.vector(row(x)). Then x and root become one-dimensional, done as if by
as.numeric(x) and as.numeric(root).
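The correspondence between the flattened objects and the original indices can be checked on a toy example (arbitrary values; the names resp, mm, varvar and idvar are local illustrations).

```r
nind <- 3; nnode <- 2; ncoef <- 2
x <- matrix(c(1, 0, 1, 4, 0, 2), nrow = nind, ncol = nnode)
modmat <- array(seq_len(nind * nnode * ncoef), dim = c(nind, nnode, ncoef))

resp <- as.numeric(x)               # flattened response, length nind * nnode
mm <- matrix(modmat, ncol = ncoef)  # flattened model matrix
varvar <- as.vector(col(x))         # node of each row: rep(1:nnode, each = nind)
idvar <- as.vector(row(x))          # individual: rep(1:nind, times = nnode)

# Row r of the flattened data belongs to individual idvar[r], node varvar[r]
all(resp == x[cbind(idvar, varvar)])                          # TRUE
all(mm == t(sapply(seq_along(resp),
                   function(r) modmat[idvar[r], varvar[r], ])))  # TRUE
```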
The standard way to do this in R is to use the reshape function on a data
frame in which the columns of the x matrix are variables in the data frame.
reshape automatically puts things in the right order and creates varvar and
idvar.
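A small sketch with made-up data shows what reshape produces when called the same way as in the echinacea example: a varb column of the form rep(vars, each = nind) and an id column of the form rep(inds, times = nnode), usable as varvar and idvar. The data frame wide and its columns are hypothetical.

```r
vars <- c("ld02", "ld03", "fl02")           # hypothetical node variables
wide <- data.frame(ld02 = c(1, 1, 0),
                   ld03 = c(1, 0, 0),
                   fl02 = c(0, 1, 0),
                   site = c("a", "b", "a")) # a per-individual covariate
nind <- nrow(wide); nnode <- length(vars)

long <- reshape(wide, varying = list(vars), direction = "long",
                timevar = "varb", times = as.factor(vars), v.names = "resp")

# varb plays the role of varvar and id the role of idvar
identical(as.character(long$varb), rep(vars, each = nind))   # TRUE
identical(long$id, rep(seq_len(nind), times = nnode))        # TRUE
all(long$resp == as.vector(as.matrix(wide[, vars])))         # TRUE
```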
aster returns an object of class inheriting from "aster". aster.formula
returns an object of class "aster" and subclass "aster.formula".
The function summary (i.e., summary.aster) can be used to obtain or print a
summary of the results, the function anova (i.e., anova.aster) to produce an
analysis of deviance table, and the function predict (i.e., predict.aster)
to produce predicted values and standard errors.
An object of class "aster" is a list containing at least the following
components:
coefficients: a named vector of coefficients.
rank: the numeric rank of the fitted generalized linear model part of the
aster model (i.e., the rank of modmat).
deviance: up to a constant, minus twice the maximized log likelihood.
iter: the number of iterations used by the optimization method.
converged: logical. Was the optimization algorithm judged to have converged?
code: integer. The convergence code returned by the optimization method.
gradient: the gradient vector of minus the log likelihood at the fitted
coefficients vector.
hessian: the Hessian matrix of minus the log likelihood (i.e., the observed
Fisher information) at the fitted coefficients vector. This is also the
expected Fisher information when type == "unconditional".
fisher: expected Fisher information at the fitted coefficients vector.
optout: the object returned by the optimization routine (trust, nlm, or
optim). Only returned when the argument optout is TRUE.
call: the matched call.
formula: the formula supplied.
terms: the terms object used.
data: the data argument.
anova.aster, summary.aster, and predict.aster
### see package vignette for explanation ###
library(aster)
data(echinacea)
vars <- c("ld02", "ld03", "ld04", "fl02", "fl03", "fl04",
    "hdct02", "hdct03", "hdct04")
redata <- reshape(echinacea, varying = list(vars), direction = "long",
    timevar = "varb", times = as.factor(vars), v.names = "resp")
redata <- data.frame(redata, root = 1)
pred <- c(0, 1, 2, 1, 2, 3, 4, 5, 6)
fam <- c(1, 1, 1, 1, 1, 1, 3, 3, 3)
hdct <- grep("hdct", as.character(redata$varb))
hdct <- is.element(seq(along = redata$varb), hdct)
redata <- data.frame(redata, hdct = as.integer(hdct))
aout4 <- aster(resp ~ varb + nsloc + ewloc + pop * hdct - pop,
    pred, fam, varb, id, root, data = redata)
summary(aout4, show.graph = TRUE)