sbfit {scaleboot} | R Documentation |
sbfit
is used to fit parametric models to multiscale bootstrap
probabilities by the maximum likelihood method.
sbfit(x, ...) ## Default S3 method: sbfit(x,nb,sa,models=NULL,nofit=FALSE,...) ## S3 method for class 'matrix': sbfit(x,nb,sa,models=NULL,names.hp=rownames(x), nofit=FALSE,cluster=NULL,...) ## S3 method for class 'data.frame': sbfit(x,...) ## S3 method for class 'scaleboot': sbfit(x,models=names(x$fi),...) ## S3 method for class 'scalebootv': sbfit(x,models=attr(x,"models"),...) ## S3 method for class 'scaleboot': print(x,sort.by=c("aic","none"),...) ## S3 method for class 'scalebootv': print(x,...)
x |
an object used to select a method. For sbfit.default ,
x is denoted by nb and is a vector of bootstrap
probabilities for a hypothesis. For sbfit.matrix , x is
denoted by bps and is a matrix with row vectors of bp
for several hypotheses.
|
nb |
vector of numbers of bootstrap replicates. A short vector
(or scalar) is cyclically extended to match the size of bp . |
sa |
vector of scales in sigma squared (σ^2). Should
be the same size as bp . |
models |
character vector of model names. Valid model names are
poly.m for m>=1 and sing.m for
m>=3. The default is set by sboptions()$models , whose
default is c("poly.1","poly.2","poly.3","sing.3","sphe.3").
If models is an integer value, sbmodelnames(m=models) is
used. |
nofit |
logical. If TRUE, fitting is not performed. |
names.hp |
character vector of hypotheses names. |
cluster |
snow cluster object which may be generated by
function makeCluster . |
sort.by |
sort key. |
... |
further arguments passed to and from other methods. |
sbfit.default
fits parametric models to bp
by maximizing the log-likelihood value of a binomial model.
A set of multiscale bootstrap resampling
should be performed before a call to sbfit
for preparing
bp
, where bp[i]
is a bootstrap probability of a
hypothesis calculated with a number of bootstrap
replicates nb[i]
and a scale σ^2=sa[i]
.
The scale is defined as σ^2=n/n',
where n is the sample size of data, and n' is the sample
size of replicated data for bootstrap resampling.
Each model specifies a psi(beta,s)
=psi(σ^2 | β)
function with a parameter vector β. The model
may describe how the bootstrap probability changes along the scale.
Let cnt[i]=bp[i]*nb[i]
be the frequency indicating how many
times the hypothesis of interest is observed in bootstrap replicates
at scale sa[i]
. Then we assume that cnt[i]
is binomially
distributed with number of trials nb[i]
and success
probability 1-\pnorm(psi(beta,s=sa[i])/sqrt(sa[i]))
. Currently,
sbpsi.poly
and sbpsi.sing
are available as psi
functions. The estimated model parameters are accessed by the
coef.scaleboot
method.
The model fitting is performed in the order
specified in models
, and the initial values for numerical
optimization of the likelihood function are prepared by using
previously estimated model parameters. Thus, "poly.(m-1)"
should be specified before "poly.m", and "poly.(m-1)" and "sing.(m-1)"
should be specified before "sing.m".
sbfit.matrix
calls sbfit.default
repeatedly, once for each row
vector bp
of the matrix bps
. Parallel
computing is performed when cluster
is non NULL.
sbfit.scaleboot
calls sbfit.default
with bp
,
nb
, and sa
components in x
object for refitting by
giving another models
argument. It discards the previous result
of fitting, and recomputes the model parameters.
sbfit.scalebootv
calls sbfit.matrix
with the bps
,
nb
, and sa
components in the attributes of x
.
sbfit.default
and sbfit.scaleboot
return an object of
class "scaleboot"
, and sbfit.matrix
and
sbfit.scalebootv
return an object of
class "scalebootv"
.
An object of class "scaleboot"
is a list containing at least the
following components:
bp |
the vector of bootstrap probabilities used. |
nb |
the rep(nb,length=length(bp)) used. |
sa |
the sa used. |
fi |
list vector of fitted results for models used. Each
list consists of components "par" (estimated parameter),
"mag" (magnification factor for "par" to make the actual
parameter vector beta=par*mag ),
"value" (maximum log-likelihood), "hessian" (hessian
matrix), "var" (variance estimate of "par" ),
"mask" (logical vector indicating parameter elements which are
not at boundaries),
"init" (initial values used for optimization),
"psi" (psi function name of the model), "df"
(degrees of freedom), "rss" (equivalent to the residual sum
of squares, but actually defined as 2*(lik0-lik) where lik0 and lik
are the log-likelihood function of the non-restricted model and the
model of interest, respectively), "pfit" (p-value for
"rss" ),
"aic" (aic value of the model relative to the non-restricted
model). |
An object of class "scalebootv"
is a vector of
"scaleboot"
objects, and in addition, it has attributes
"models"
, "bps"
, "nb"
, and "sa"
.
Hidetoshi Shimodaira <shimo@is.titech.ac.jp>
Shimodaira, H. (2002). An approximately unbiased test of phylogenetic tree selection, Systematic Biology, 51, 492-508.
Shimodaira, H. (2004). Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling, Annals of Statistics, 32, 2616-2641.
Shimodaira, H. (2008). Testing Regions with Nonsmooth Boundaries via Multiscale Bootstrap, Journal of Statistical Planning and Inference, 138, 1227-1241. (http://dx.doi.org/10.1016/j.jspi.2007.04.001).
sbpsi
, summary.scaleboot
,
plot.scaleboot
, coef.scaleboot
,
sbaic
.
## Testing a hypothesis ## Examples of fitting models to a vector of bp's ## mam15.relltest$t4 of data(mam15), but ## using a different set of scales (sigma^2 values). ## In the below, sigma^2 ranges 0.01 to 100 in sa[i] ## This very large range is only for illustration. ## Typically, the range around 0.1 to 10 ## is recommended for much better model fitting. ## In other examples, we have used ## sa = 9^seq(-1,1,length=13). cnt <- c(0,0,0,0,6,220,1464,3565,5430,6477,6754, 6687,5961) # observed frequencies at scales nb <- 100000 # number of replicates at each scale bp <- cnt/nb # bootstrap probabilities (bp's) sa <- 10^seq(-2,2,length=13) # scales (sigma squared) ## model fitting to bp's f <- sbfit(bp,nb,sa) # model fitting ("scaleboot" object) f # print the result of fitting plot(f,legend="topleft") # observed bp's and fitted curves ## approximately unbiased p-values summary(f) # calculate and print p-values ## refitting with models up to "poly.4" and "sing.4" f <- sbfit(f,models=1:4) f # print the result of fitting plot(f,legend="topleft") # observed bp's and fitted curves summary(f) # calculate and print p-values ## Testing multiple hypotheses (only two here) ## Examples of fitting models to vectors of bp's ## mam15.relltest[c("t1,t2")] cnt1 <- c(85831,81087,76823,72706,67946,62685,57576,51682, 45887,41028,35538,31232,27832) # cnt for "t1" cnt2 <- c(2,13,100,376,975,2145,3682,5337,7219,8559, 10069,10910,11455) # cnt for "t2" cnts <- rbind(cnt1,cnt2) nb <- 100000 # number of replicates at each scale bps <- cnts/nb # row vectors are bp's sa <- 9^seq(-1,1,length=13) # scales (sigma squared) fv <- sbfit(bps,nb,sa) # returns a "scalebootv" object fv # print the result of fitting plot(fv) # multiple plots summary(fv) # calculate and print p-values