cov.shrink {corpcor} | R Documentation |
The functions var.shrink
, cor.shrink
, and cov.shrink
compute
shrinkage estimates of variance, correlation, and covariance, respectively.
var.shrink(x, lambda.var, w, protect=0, verbose=TRUE) cor.shrink(x, lambda, w, protect=0, verbose=TRUE) cov.shrink(x, lambda, lambda.var, w, protect=0, verbose=TRUE)
x |
a data matrix |
lambda |
the correlation shrinkage intensity (range 0-1).
If lambda is not specified (the default) it is estimated
using an analytic formula from Schaefer and Strimmer (2005)
- see details below.
For lambda=0 the empirical correlations are recovered. |
lambda.var |
the variance shrinkage intensity (range 0-1).
If lambda.var is not specified (the default) it is estimated
using an analytic formula from Opgen-Rhein and Strimmer (2006)
- see details below.
For lambda.var=0 the empirical variances are recovered. |
w |
optional: weights for each data point - if not specified uniform weights are assumed
(w = rep(1/n, n) with n = nrow(x) ). |
protect |
the fraction of correlation components protected against excessive individual component risk (default: 0, no limited translation) |
verbose |
output some status messages while computing (default: TRUE) |
var.shrink
computes the empirical variance of each considered random variable,
and shrinks them towards their median. The shrinkage intensity is estimated using
Eq. 6 from Schaefer and Strimmer (2005), which leads to
λ_{var}^{*} = ( sum_{k=1}^p Var(s_{kk}) )/ sum_{k=1}^p (s_{kk} - median{s})^2
where median{s} denotes the median of the empirical variances.
Similarly cor.shrink
computes a shrinkage estimate of the correlation matrix by
shrinking the empirical correlations towards the identity matrix.
In this case the shrinkage intensity is
λ^{*} = sum_{k neq k} Var(r_{kl}) / sum_{k neq l} r_{kl}^2 .
cov.shrink
computes the corresponding full covariance matrix
on the basis of the shrunken correlation matrix and the shrunken variances.
In comparison with the standard empirical estimates
(var
, cov
, and cor
) the shrinkage estimates exhibit
a number of favorable properties. For instance,
These properties also carry over to derived quantities, such as partial variances and
partial correlations (pvar.shrink
and pcor.shrink
).
As an extra benefit, the shrinkage estimators have a form that can be very efficiently inverted
using the Woodbury matrix identity, especially if the number of variables is large
and the sample size is small.
Note that this identity is employed in the functions
invcov.shrink
and invcor.shrink
. and is much faster than directly
inverting the matrix output by cov.shrink
and cor.shrink
, respectively.
var.shrink
returns a vector with estimated variances.
cov.shrink
returns a covariance matrix.
cor.shrink
returns the corresponding correlation matrix.
Juliane Schaefer (http://www.stat.math.ethz.ch/~schaefer/), Rainer Opgen-Rhein (http://opgen-rhein.de), and Korbinian Strimmer (http://strimmerlab.org).
Opgen-Rhein, R., and K. Strimmer. 2006. Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach. Preprint. (http://strimmerlab.org/publications/shrinkt2006.pdf)
Schaefer, J., and K. Strimmer. 2005. A shrinkage approach to large-scale covariance estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol.4:32. (http://www.bepress.com/sagmb/vol4/iss1/art32/)
invcov.shrink
, pcor.shrink
, cor2pcor
# load corpcor library library("corpcor") # small n, large p p <- 100 n <- 20 # generate random pxp covariance matrix sigma <- matrix(rnorm(p*p),ncol=p) sigma <- crossprod(sigma)+ diag(rep(0.1, p)) # simulate multinormal data of sample size n sigsvd <- svd(sigma) Y <- t(sigsvd$v %*% (t(sigsvd$u) * sqrt(sigsvd$d))) X <- matrix(rnorm(n * ncol(sigma)), nrow = n) %*% Y # estimate covariance matrix s1 <- cov(X) s2 <- cov.shrink(X, protect=0) s3 <- cov.shrink(X, protect=0.05) # squared error sum((s1-sigma)^2) sum((s2-sigma)^2) sum((s3-sigma)^2) # compare positive definiteness is.positive.definite(sigma) is.positive.definite(s1) is.positive.definite(s2) is.positive.definite(s3) # compare ranks and condition rank.condition(sigma) rank.condition(s1) rank.condition(s2) rank.condition(s3) # compare eigenvalues e0 <- eigen(sigma, symmetric=TRUE)$values e1 <- eigen(s1, symmetric=TRUE)$values e2 <- eigen(s2, symmetric=TRUE)$values e3 <- eigen(s3, symmetric=TRUE)$values m <-max(e0, e1, e2, e3) yl <- c(0, m) par(mfrow=c(2,2)) plot(e1, main="empirical") plot(e2, ylim=yl, main="full shrinkage") plot(e3, ylim=yl, main="protected shrinkage") plot(e0, ylim=yl, main="true") par(mfrow=c(1,1))