nmfEstimateRank {NMF}    R Documentation

Estimate optimal rank for Nonnegative Matrix Factorization (NMF) models

Description

A critical parameter in NMF algorithms is the factorization rank r, which defines the number of basis effects used to approximate the target matrix. The function nmfEstimateRank helps choose an optimal rank by implementing simple approaches proposed in the literature.
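To make the role of r concrete, here is a minimal base-R sketch of one classic NMF algorithm, the Lee and Seung multiplicative updates for the squared Frobenius norm ||V - WH||^2. The target V (n x m) is approximated by a basis matrix W with r columns (n x r) and a coefficient matrix H with r rows (r x m). The function name nmfEuclidSketch is hypothetical; it is not part of the NMF package and is not the package's implementation of any of its algorithms.

```r
# Hypothetical helper, not part of the NMF package: Lee-Seung multiplicative
# updates minimizing the squared Frobenius norm ||V - W H||^2.
nmfEuclidSketch <- function(V, r, maxit = 200, eps = 1e-9) {
  stopifnot(is.matrix(V), all(V >= 0), r >= 1)
  n <- nrow(V); m <- ncol(V)
  W <- matrix(runif(n * r), n, r)  # n x r basis matrix: one column per basis effect
  H <- matrix(runif(r * m), r, m)  # r x m coefficient matrix: one row per basis effect
  for (i in seq_len(maxit)) {
    # eps avoids division by zero; updates keep W and H nonnegative
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H, rss = sum((V - W %*% H)^2))  # residual sum of squares
}

set.seed(1)
V <- matrix(runif(50 * 20), 50, 20)
fit <- nmfEuclidSketch(V, r = 3)
dim(fit$W)  # 50 3
dim(fit$H)  # 3 20
```

Changing r changes only the inner dimension of W and H, which is why it must be chosen before running the algorithm.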

Usage

nmfEstimateRank(x, range, method = nmf.getOption("default.algorithm"), nrun = 30, conf.interval = FALSE, ...)
plot.NMF.rank(x, what = c("all", "cophenetic", "rss", "residuals", "dispersion"), ...)

Arguments

conf.interval a single logical value specifying whether confidence intervals should be estimated for all the computed consensus measures. For each rank in range, the confidence intervals are estimated by bootstrap, resampling 5 * nrun times with replacement across the nrun runs.
method a single NMF algorithm, in one of the formats accepted by the interface nmf.
nrun a single numeric value giving the number of runs to perform for each value in range.
range a numeric vector containing the factorization ranks to try.
what a character string that partially matches one of the following items: 'all', 'cophenetic', 'rss', 'residuals', 'dispersion'. It specifies which measure to plot (what = 'all' plots all the measures).
x For nmfEstimateRank, the target object to factorize, in one of the formats accepted by the interface nmf.
For plot.NMF.rank, an object of class NMF.rank as returned by the function nmfEstimateRank.
... For nmfEstimateRank, extra parameters passed to the interface nmf. Note that the same parameters are used for every value of the rank. See nmf.
For plot.NMF.rank, extra graphical parameters passed to the standard function plot. See plot.
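The bootstrap described for conf.interval can be sketched in base R as a percentile bootstrap over runs. This is one reading of the text, not the package's code: each of the 5 * nrun replicates draws nrun runs with replacement and recomputes the consensus measure on that resample. The names bootConsensusCI and meanConsensus, the use of percentile intervals, and the representation of a run as a vector of cluster labels are all assumptions made for illustration.

```r
# Hypothetical sketch, not the NMF package's code: percentile bootstrap
# confidence interval for a consensus measure. `measure` is any function
# mapping a list of runs to a single number.
bootConsensusCI <- function(runs, measure, level = 0.95) {
  nrun <- length(runs)
  # 5 * nrun replicates, each resampling nrun runs with replacement
  reps <- replicate(5 * nrun,
                    measure(runs[sample.int(nrun, nrun, replace = TRUE)]))
  alpha <- 1 - level
  quantile(reps, c(alpha / 2, 1 - alpha / 2), na.rm = TRUE, names = FALSE)
}

# Toy input: 10 runs, each a vector of cluster labels for 12 samples.
set.seed(2)
runs <- replicate(10, sample(1:3, 12, replace = TRUE), simplify = FALSE)

# Toy measure (hypothetical): mean off-diagonal entry of the consensus matrix.
meanConsensus <- function(rs) {
  C <- Reduce(`+`, lapply(rs, function(cl) outer(cl, cl, "=="))) / length(rs)
  mean(C[upper.tri(C)])
}

ci <- bootConsensusCI(runs, meanConsensus)
ci  # lower and upper bounds of the 95% interval
```

Resampling whole runs, rather than individual matrix entries, keeps each replicate a valid set of factorization results from which the consensus matrix can be rebuilt.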

Details

Given an NMF algorithm and a target matrix, a common way of estimating r is to try different values, compute some quality measures of the results, and choose the best value according to these quality criteria. See Brunet et al. (2004) and Hutchins et al. (2008).
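As an illustration of the "choose the best value" step, the sketch below applies one reading of the heuristic of Brunet et al. (2004): pick the rank at which the cophenetic correlation coefficient begins to fall, that is, the last rank before its first decrease. The function pickRankCophenetic is hypothetical and takes a plain numeric vector named by rank; the input values are made up for the example. The criterion of Hutchins et al. (2008), based on the shape of the RSS curve, is not sketched here.

```r
# Hypothetical helper: given cophenetic coefficients named by rank, return the
# last rank before the coefficient first decreases.
pickRankCophenetic <- function(coph) {
  ranks <- as.integer(names(coph))
  o <- order(ranks)                  # make sure ranks are increasing
  ranks <- ranks[o]
  coph <- unname(coph[o])
  falls <- which(diff(coph) < 0)     # positions i where coph[i + 1] < coph[i]
  if (length(falls) == 0) return(NA_integer_)  # no drop within the range tried
  ranks[falls[1]]
}

coph <- c("2" = 0.99, "3" = 0.995, "4" = 0.97, "5" = 0.96)  # made-up values
pickRankCophenetic(coph)  # 3
```

Returning NA when the coefficient never falls signals that the range of ranks tried may be too narrow.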

The function nmfEstimateRank implements this estimation procedure. It performs multiple NMF runs for each rank in range and, for each rank, returns a set of quality measures together with the associated consensus matrix.
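The consensus matrix and two of the measures can be sketched in base R following the approach of Brunet et al. (2004). In each run, sample j is assigned to the basis component with the largest coefficient in column j of H. The connectivity matrix of a run has entry 1 when two samples share a component, and the consensus matrix averages these over runs. The cophenetic correlation compares the distances 1 - C with the cophenetic distances of an average-linkage tree built from them. The function names are hypothetical, and the dispersion formula used here (the mean of 4 * (C_ij - 1/2)^2, which equals 1 for a perfectly stable consensus) is an assumed common definition; the package's own computations may differ.

```r
# Hypothetical sketch, not the NMF package's code.
# Hs: list of coefficient matrices H (r x m), one per run.
consensusFromH <- function(Hs) {
  conn <- lapply(Hs, function(H) {
    cl <- apply(H, 2, which.max)   # component with the largest coefficient per sample
    outer(cl, cl, "==") * 1        # 1 if samples i and j share a component
  })
  Reduce(`+`, conn) / length(conn) # average connectivity = consensus matrix
}

copheneticCor <- function(C) {
  d <- as.dist(1 - C)                  # consensus turned into a distance
  hc <- hclust(d, method = "average")  # average-linkage hierarchical clustering
  cor(as.vector(d), as.vector(cophenetic(hc)))
}

# Assumed definition of the dispersion coefficient: 1 when all entries are 0 or 1.
dispersionCoef <- function(C) mean(4 * (C - 0.5)^2)

set.seed(3)
Hs <- replicate(5, matrix(runif(3 * 8), 3, 8), simplify = FALSE)  # 5 runs, rank 3, 8 samples
C <- consensusFromH(Hs)
copheneticCor(C)
dispersionCoef(C)
```

Both measures reach 1 when every run produces the same partition of the samples, which is why higher values are read as a more stable factorization at that rank.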

Value

An S3 object (i.e. a list) of class NMF.rank with the following components:

measures a data.frame containing the quality measures for each factorization rank in range. Each row corresponds to a measure, each column to a rank.
consensus a list of consensus matrices, indexed by the rank of factorization (as a character string).
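Because the consensus list is indexed by the rank as a character string, integer and character indexing select different elements. The base-R sketch below uses a mock list with placeholder matrices (not real output of nmfEstimateRank) to show the difference for range = 2:5; with an actual result the same lookup would read res.estimate$consensus[["3"]].

```r
ranks <- 2:5
# Mock of the consensus component only (placeholder values, NOT real output):
# each 4 x 4 matrix is filled with its own rank so the lookups are easy to check.
consensus <- setNames(lapply(ranks, function(k) matrix(k, 4, 4)), ranks)
names(consensus)        # "2" "3" "4" "5"
consensus[["3"]][1, 1]  # 3: looked up by rank, as a character string
consensus[[3]][1, 1]    # 4: the third element of the list, i.e. rank 4
```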

Author(s)

Renaud Gaujoux renaud@cbio.uct.ac.za

References

Brunet, J. P., Tamayo, P., Golub, T. R., and Mesirov, J. P. (2004) Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci U S A 101(12), 4164–4169.

See Also

nmf

Examples


library(NMF)
set.seed(123456)
n <- 50; r <- 3; m <- 20
V <- syntheticNMF(n, r, m, noise=TRUE)

# Use a numeric seed, set before the first run for each rank, so that results are reproducible
## Not run: res.estimate <- nmfEstimateRank(V, seq(2,5), method='brunet', nrun=10, seed=123456)

# plot all the measures
## Not run: plot(res.estimate)
# or only one: e.g. the cophenetic correlation coefficient
## Not run: plot(res.estimate, 'cophenetic')


[Package NMF version 0.2.4 Index]