mixFdr {mixfdr}R Documentation

Fit a mixture model and use it to estimate fdr/FDR/effect sizes

Description

This is the main function for this package. It fits a normal mixture model and uses it to estimate fdr's, FDR's and effect sizes.

Usage

mixFdr(x, J = 3, P = NA, noiseSD = NA, theonull = FALSE, calibrate = FALSE, plots = TRUE, nearlyNull = 0.2, starts = NA, p = NA, maxIter = 1000, tol = 0.001, nocheck = FALSE)

Arguments

x The data. A numeric vector. For example, a vector of transformed t-statistics.
J The number of mixture components.
P The penalization parameter. If NA and calibrate is FALSE, will be length(x)/5. If calibrate = TRUE, chosen by calibration. A higher P pushes the model toward one big null group, so the center is fit more closely.
noiseSD The sampling noise (see details). For transformed t-statistics, or transformed p-values, take this to be 1 at least to start. If NA, fit using a combination of the mixture model and the median of absolute devations estimator. Note that it is hard to estimate both noiseSD and the empirical null accurately at the same time. If your data is overdispersed, and you try to estimate both, you'll probably end up with noiseSD being about as large as the empirical null sd. This is not a problem for fdr/FDR's but it can be a problem for effect sizes (if the signal is known to be sparse, it's ok.
theonull Use the theoretical N(0,1) null?
calibrate Choose P by calibration? Note that calibration can be quite slow.
plots Produce plots?
nearlyNull How big does abs(mu[j]) need to be for a group to be considered non-null? 0 will ensure that only the first group is null.
starts An optional matrix of starting points. This is passed straight to mixModelManyStarts, see that page for more information.
p The penalization shape parameter, a vector of length J. The penalty on the mixture proportions is Dirichlet(P*p). Nearly always (1,0,...,0)
maxIter Maximum number of iterations.
tol Convergence tolerance
nocheck If FALSE, mixFdr will fit an empirical null model and warn you if the fitted model is very different. Set TRUE if you know what you're doing and need things to be faster.

Details

This function estimates effect sizes, fdr's and FDR's. If you don't want to get into the details, run it with defaults (perhaps using theonull to fit a theoretical or empirical null). The object returned has the estimates, see "Value".

Here is a short explanation of the mixture model - for a full account, see the papers on the author's website. Suppose we have z[i]~Normal(delta[i], noiseSD). We want to estimate delta[i], P(delta[i]=0|z[i]), and P(delta[i]=0||z[i]|>z) (effect size, fdr, FDR). Given a prior for delta we can do this. The mixture model fits the prior based on the data, using a J group normal mixture model. A theoretical null means the first mixture component is a point mass at 0. An empirical null allows the first mixture component to vary. The mixture proportions are given a Dirichlet(P*p) penalty to stabilize them and to fit empirical nulls if required.

The upshot here is that no matter what the options, the density corresponding to the fitted model needs to fit the data well for any inference to be trustworthy. Look at the histogram plot produced by mixFdr. If the density doesn't fit well, particularly near the center, try using an empirical null (theonull=FALSE), increasing the number of groups, calibrating the penalization (or experimenting with it).

If you need to incorporate mixFdr's estimates into another procedure, mixFdr may be too slow. See mixModelFitter for the workhorse fitting function, which is much faster.

Value

A list of objects. They can be divided into the fitted model, and derived estimates. The fitted model parameters are for the marginals density NOT the prior on delta.

pi Mixture proportions
mu Mean of each mixture component
sigma Sd of each mixture component
noiseSD Fitted or supplied sampling SD
converged Did the EM converge?
nIter How many iterations
fdr local false discovery rate estimates for each case
FDRTwoSided Two sided FDR estimates for each case
FDRLeft Left-tail FDR estimates for each case
FDRRight Right-tail FDR estimates for each case
effectSize Effect size estimates for each case
effectPostVar Estimates of the posterior variance of the effect size for each case
call The call
data The supplied data

Author(s)

Omkar Muralidharan

References

See papers at the author's website http://stat.stanford.edu/~omkar

See Also

effectSize, mixModelFitter

Examples


## A simple workflow

z = c(rnorm(10000, 0.1, 1.2), rnorm(1000, 3, 1)) # null, alternative
m = mixFdr(z, theonull = TRUE)

# uh-oh, warning
# Just visually, the fit is not so good near the center
# Let's try an empirical null 

m = mixFdr(z)

# Better. But see how the effect size estimate changes with noiseSD. 

# We usually want:
m$fdr
m$effectSize
m$FDRTwoSided

# Suppose we have data with known noise level 1

 z = c(rnorm(1000), rnorm(100,3,1))
 m = mixFdr(z, J = 3, noiseSD = 1)
 delta = m$effectSize
 fdrs = m$fdr
 
# Suppose we wanted fdr and effect size curves
# then we could do

 s = seq(-5,5,by=0.01)
 effCurve = effectSize(s,m)[,1]
 fdrCurve = fdrMixModel(s,m,abs(m$mu - m$mu[1])<=0.2) # this tells it which indices to consider null

# compare to the Bayes estimator for this problem
 mTrue = list(pi = c(10,1)/11, mu = c(0,3), sig = c(1,1))
# note that the model parameters are in terms of the MARGINAL not of the prior
 trueEff = effectSize(s,mTrue)[,1]
 trueFdr = fdrMixModel(s,mTrue,c(TRUE,FALSE))
 par(mfrow = c(2,1))
 plot(s, trueEff, t = 'l', main = "Effect Size - Black is true", xlab = "z", ylab = "E(delta|z)", ylim = c(-3,3))
 lines(s, effCurve, col = 2)
 plot(s, trueFdr, t = 'l', main = "fdr - Black is true", xlab = "z", ylab = "fdr", ylim = c(0,1))
 lines(s, fdrCurve, col = 2)


[Package mixfdr version 1.0 Index]