fitdAICrc {laser} | R Documentation |
Fits a specified set of rate-variable and rate-constant variants of the birth-death model to branching times from phylogenetic data. The test statistic dAICrc is the difference in AIC scores between the best rate-constant and rate-variable models.
fitdAICrc(x, modelset = c("pureBirth", "bd", "DDL", "DDX", "yule2rate", "rvbd"), ints = NULL)
x | a numeric vector of branching times
modelset | the set of rate-constant and rate-variable candidate models to be fitted
ints | the number of intervals. See 'Details'
fitdAICrc implements the dAICrc test statistic for temporal variation in diversification rates as described in Rabosky (2006).
modelset is a character vector naming the rate-constant and rate-variable models to consider. You should include both rate-constant models (pureBirth and bd), as well as one or more candidate rate-variable models. Available options are DDX, DDL, yule2rate, rvbd, and yule3rate. See the full descriptions of each of these models in this document.
'ints' is used in determining the number of shift points to consider. If 'ints = NULL' (the default), the model will consider only observed branching times as possible shift points. See yule-n-rate for additional discussion of the 'ints' option.
a dataframe with the number of rows equal to the number of candidate models. Columns include likelihoods,
parameters, and AIC scores for each model. The first column contains the model names. If a parameter
is not present in a particular model, it will have an entry of 'NA' in the column for that parameter.
Parameter names follow conventions for model descriptions in other parts of this document. For example,
parameter r1 is the initial net diversification rate for all models (note that this will be the
only rate for the pureBirth model).
The full set of columns if all available models are included in the candidate set will consist
of the following:
model | the model name for row i in the dataframe
params | the free parameters for model[i]
np | the number of free parameters in model[i]
mtype | either 'RC' for rate-constant or 'RV' for rate-variable
LH | the log-likelihood under model[i]
r1, r2, r3 | net diversification rates, as applicable; r1 is always the initial rate, and r3 is always the final rate
a | the extinction fraction E/S if applicable
xp | the x-parameter from the DDX model
k | the k-parameter from the DDL model
st1, st2 | shift-times, if applicable. st1 is always the first shift point
AIC | the Akaike Information Criterion for model[i]
dAIC | delta-AIC; the difference in AIC scores between model[i] and the overall best-fit model
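As a minimal sketch of how the AIC and dAIC columns and the dAICrc statistic relate to one another, the base R snippet below builds a small stand-in for the returned dataframe. The log-likelihoods are invented for illustration and are not output from laser. The sign of dAICrc (best rate-constant AIC minus best rate-variable AIC, so that positive values favor a rate-variable model) is an assumption of this sketch.

```r
# Hypothetical fits: log-likelihoods (LH) are made up, not laser output.
fits <- data.frame(
  model = c("pureBirth", "bd", "DDL", "yule2rate"),
  mtype = c("RC", "RC", "RV", "RV"),
  np    = c(1, 2, 2, 3),
  LH    = c(-105.0, -105.0, -101.5, -100.0),
  stringsAsFactors = FALSE
)

# AIC = -2 * log-likelihood + 2 * number of free parameters
fits$AIC <- -2 * fits$LH + 2 * fits$np

# dAIC: difference between each model and the overall best-fit model
fits$dAIC <- fits$AIC - min(fits$AIC)

# dAICrc: best rate-constant AIC minus best rate-variable AIC
# (sign convention assumed for this sketch)
best_rc <- min(fits$AIC[fits$mtype == "RC"])
best_rv <- min(fits$AIC[fits$mtype == "RV"])
dAICrc  <- best_rc - best_rv

print(fits)
print(dAICrc)
```

A large positive dAICrc alone is not evidence of rate variation; its significance still has to be judged against a simulated null distribution.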
The dAICrc statistic described in Rabosky (2006) is one of the most powerful tests of temporal variation in diversification rates and may be the only statistical approach that can infer temporal increases in diversification rates. The approach involves fitting both rate-constant and rate-variable models to a set of branching times. The statistic dAICrc is the difference in AIC scores between the best-fit rate-constant and best-fit rate-variable models.
However, several considerations make this approach somewhat computationally intensive. First, use of the AIC (as well as other model selection criteria, such as the AICc and BIC) results in model overfitting. Simply selecting the model with the lowest AIC results in unacceptably high Type I error rates. Moreover, Type I error rates depend on both the size of the phylogeny under consideration and the number and type of rate-variable models in the candidate set. It is thus necessary to derive the critical values of the distribution of dAICrc through simulation.
The approach consists of three steps:
1. Analyze the sample phylogeny of N taxa using an appropriate candidate set of rate-variable models using the function fitdAICrc. Both the pureBirth and bd rate-constant models must be included in the analysis.
2. Using the program Phyl-O-Gen, the function yuleSim, or other phylogenetic simulation software, generate a set of phylogenetic trees or branching times under the pure birth (Yule) model. Simulated trees should have the same number of tips as the test phylogeny, and should have the same level of sampling completeness as the test phylogeny.
3. Generate a matrix of branching times using getBtimes.batch if necessary. Then run fitdAICrc.batch on the set of simulated branching times, using the same set of candidate models and 'ints' used to analyze the test phylogeny. fitdAICrc.batch will generate the null distribution of the dAICrc test statistic under the null hypothesis of rate-constancy and will determine the p-value of the dAICrc index computed for the test phylogeny.
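The last part of step 3, comparing the observed dAICrc with its simulated null distribution, can be sketched in base R as follows. This is not the code used by fitdAICrc.batch. The null values below are hypothetical placeholders, and taking the p-value as the simple proportion of null values at least as large as the observed statistic is an assumption of the sketch.

```r
# Hypothetical observed statistic for the test phylogeny
observed_dAICrc <- 6.0

# Hypothetical null distribution; in a real analysis these values would
# come from fitting the same candidate models to each simulated
# pure-birth data set, typically thousands of them.
null_dAICrc <- c(-1.2, 0.4, 2.1, -0.8, 3.5, 0.0, 1.7, 6.4, -0.3, 0.9)

# One-tailed p-value: fraction of null values >= the observed value
p_value <- mean(null_dAICrc >= observed_dAICrc)

# Critical value of dAICrc at alpha = 0.05 under the null
crit_95 <- quantile(null_dAICrc, probs = 0.95, names = FALSE)

print(p_value)
print(crit_95)
```

With so few placeholder values the estimates are very coarse; the sketch only shows the arithmetic of the comparison.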
Use of the 'ints' option is recommended. Although the log-likelihood under the discrete-shift rate-variable models will increase with greater numbers of ints, use of large numbers of ints (>200) typically results in very small improvements. You can explore the effects of different numbers of 'ints' using rvbd or the yule-n-rate set of functions. It is recommended that you explore the data in this way before undertaking this analysis; this will also give you a sense of the computational time required to fit a given model with different numbers of 'ints'.
In practice, because the significance of dAICrc is assessed through simulation, the actual number of 'ints' is not critical. However, it is of critical importance that you use the same number of 'ints' when analyzing the test phylogeny (using fitdAICrc) and the simulated phylogenies (using fitdAICrc.batch).
Dan Rabosky DLR32@cornell.edu
Nee, S., R. M. May, and P. H. Harvey. 1994b. The reconstructed evolutionary process. Philos. Trans. R. Soc. Lond. B 344:305-311.
Rabosky, D. L. 2006. Likelihood methods for inferring temporal shifts in diversification rates. Evolution 60:1152-1164.
data(agamids)
agbtimes <- getBtimes(string = agamids)
# agbtimes is now a vector of branching times from the agamid phylogeny

# here we fit 2 rate-constant and 3 rate-variable models
# to the agamid data:
result <- fitdAICrc(agbtimes,
                    modelset = c("pureBirth", "bd", "DDX", "DDL", "yule2rate"),
                    ints = 100)

# this outputs summaries of parameters and likelihoods to screen;
# object 'result' is a dataframe containing all parameter estimates,
# likelihoods, and AIC scores

# we would still need to generate the null distribution of dAICrc
# through simulation using fitdAICrc.batch