vabayelMix {vabayelMix} | R Documentation |
Learns a Gaussian mixture model from data using an optimal separable approximation to the posterior density. The optimisation uses a variational procedure and implements an iterative ensemble learning algorithm. The algorithm provides a framework for inferring the number of clusters in the data set. Prior information may be incorporated by specifying the hyperparameters of a prior distribution. The current version implements a Gaussian mixture model in which the covariance matrices are diagonal.
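For orientation, the model described above can be written out explicitly. This is a sketch in notation chosen here for illustration: the symbols K (number of clusters, at most Ncat), D (Ndim), \pi_k, \mu_{kd} and \beta_{kd} are assumptions and do not come from the package. With diagonal covariances, the density of one data row x_n factorises over dimensions within each cluster:

```latex
p(x_n \mid \pi, \mu, \beta)
  = \sum_{k=1}^{K} \pi_k \prod_{d=1}^{D}
    \mathcal{N}\!\left(x_{nd} \,\middle|\, \mu_{kd},\, \beta_{kd}^{-1}\right),
\qquad \sum_{k=1}^{K} \pi_k = 1 .
```

A separable approximation replaces the joint posterior over all unknowns \Theta with a product of factors, chosen in ensemble learning to minimise the Kullback-Leibler divergence from the true posterior. How the unknowns are grouped into factors is not stated here, so the grouping \Theta_i below is generic:

```latex
q(\Theta) = \prod_{i} q_i(\Theta_i),
\qquad
q^{*} = \arg\min_{q} \, \mathrm{KL}\!\left( q(\Theta) \,\|\, p(\Theta \mid \text{data}) \right).
```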
data |
A matrix of dimension Ns x Ndim containing the data to be clustered. The algorithm clusters the rows of the matrix and treats the columns as dimensions. |
prior |
A list containing prior information, as obtained for example from UseBasicPrior. The list elements are prior$mean, prior$ivarm, prior$ivara, prior$ivarb and prior$dapi. The first four are matrices of dimension Ncat x Ndim; prior$dapi is a vector of length Ncat. prior$mean contains the means of the Gaussian priors on the cluster means. prior$ivarm contains the inverse variances of those Gaussian priors. prior$ivara and prior$ivarb contain the parameters of the gamma prior distribution on the inverse variances of the clusters. prior$dapi is a weight vector specifying prior knowledge about the number of clusters. If prior is unspecified, a completely uninformative prior is used, which assumes that the rows are mean-normalised to zero. |
Ncat |
The maximum number of clusters or categories to look for in the data set. The algorithm switches off clusters it does not need. See References. |
nruns |
Number of ensemble learning optimisation runs to be performed. Each optimisation run uses a different (random) starting point. |
npick |
Number of runs to retain: the npick runs (out of nruns) that best optimise the cost function. See References. |
MaxIt |
Maximum number of iterations to be performed for a single optimisation run. |
conv.tol |
Threshold tolerance level for establishing convergence of iterations. |
nCVconv |
Number of consecutive iterations to consider in establishing convergence of the run at level conv.tol. |
verbatim |
Logical. If TRUE, prints the estimates and the cost function value at each iteration. |
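The layout of the prior argument described above can be illustrated by building such a list by hand. This is a minimal sketch: the sizes and every hyperparameter value below are placeholders chosen for illustration, not package defaults, and only the element names and dimensions are taken from the argument description.

```r
# Hypothetical sizes, chosen only for illustration.
Ncat <- 4   # maximum number of clusters
Ndim <- 2   # number of columns of the data matrix

# Placeholder hyperparameter values (assumptions, not package defaults).
prior <- list(
  mean  = matrix(0, nrow = Ncat, ncol = Ndim),  # means of the Gaussian priors on the cluster means
  ivarm = matrix(1, nrow = Ncat, ncol = Ndim),  # inverse variances of those Gaussian priors
  ivara = matrix(1, nrow = Ncat, ncol = Ndim),  # first parameter of the gamma prior
  ivarb = matrix(1, nrow = Ncat, ncol = Ndim),  # second parameter of the gamma prior
  dapi  = rep(1, Ncat)                          # weight vector over the Ncat clusters
)

# Check the shapes against the argument description.
mats <- prior[c("mean", "ivarm", "ivara", "ivarb")]
shapes.ok <- all(sapply(mats, function(m) all(dim(m) == c(Ncat, Ndim))))
stopifnot(shapes.ok, length(prior$dapi) == Ncat)
```

In practice UseBasicPrior is the documented way to obtain this list; a hand-built list is mainly useful for checking that the dimensions agree with Ncat and with the number of columns of data.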
A list with the following components:
estvals |
A list with components: |
wcl |
A matrix of dimension npick x Ns. Each row corresponds to one selected run and gives the cluster assignment of every row of data. Clusters are labelled by integers. |
probs |
A list of length npick, each list element is a matrix of dimension Ns x Ncat containing the probabilities of membership to clusters. |
costs |
A vector of length nruns specifying converged values of cost function. |
conv |
A binary vector of length nruns specifying whether each run converged (0) or not (1). |
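The components listed above can be post-processed with base R. The sketch below uses a mock list in place of a real return value, so all numbers are invented; it assumes only the shapes and the 0/1 coding given in the descriptions of probs, costs and conv.

```r
set.seed(1)
Ns <- 6; Ncat <- 3

# Mock stand-in for a returned list; real values would come from vabayelMix().
raw <- matrix(runif(Ns * Ncat), nrow = Ns, ncol = Ncat)
p <- raw / rowSums(raw)                 # each row sums to 1
res <- list(
  probs = list(p, p),                   # one Ns x Ncat matrix per picked run
  costs = c(10.2, 9.8, 11.0, 9.9, 10.5),
  conv  = c(0, 0, 1, 0, 0)              # 0 = converged, as documented above
)

# Hard assignment for the first picked run: most probable cluster per data row.
hard <- max.col(res$probs[[1]], ties.method = "first")

# Indices of the runs flagged as converged, and their cost values.
converged <- which(res$conv == 0)
conv.costs <- res$costs[converged]
```

Taking the most probable cluster per row is one natural way to turn probs into labels; whether it reproduces wcl exactly is not stated in this page.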
Andrew Teschendorff aet21@hutchison-mrc.cam.ac.uk
library(vabayelMix)

# Simulate 100 samples in 2 dimensions, forming two groups of 50
NsTot <- 100; Nspg <- 50; Ng <- 2; deg.idx <- 1
data <- matrix(nrow=NsTot, ncol=Ng)
for (s in 1:Nspg) {
  data[s,] <- rnorm(Ng, 0, 0.25)
}
# Second group: mean shifted to 2 in dimension deg.idx
for (s in (Nspg+1):NsTot) {
  data[s,] <- rnorm(Ng, 0, 0.25)
  data[s,deg.idx] <- rnorm(1, 2, 0.25)
}
types.idx <- c(rep(1,50), rep(2,50))

useprior.l <- UseBasicPrior(data, rep(1,4))
vbmix <- vabayelMix(data, prior=NA, Ncat=4, nruns=10, npick=2, MaxIt=500, conv.tol=0.001, nCVconv=10)
# or could use
# vbmix <- vabayelMix(data, prior=useprior.l, Ncat=4, nruns=10, npick=2, MaxIt=500, conv.tol=0.001, nCVconv=10)

# Cluster assignments from the best run, coloured by true group
plot(1:NsTot, vbmix$wcl[1,], type="h", col=types.idx)
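The plot in the example can be complemented by a cross-tabulation of the inferred labels against the known groups. This is a minimal sketch: wcl1 is a hypothetical, hand-written assignment vector standing in for vbmix$wcl[1, ], so the table shows only the form of the comparison, not a result of the algorithm.

```r
# Known group membership, as in the example above.
types.idx <- c(rep(1, 50), rep(2, 50))

# Hypothetical assignment vector standing in for vbmix$wcl[1, ].
# Cluster labels are arbitrary integers and need not match the group numbers.
wcl1 <- c(rep(3, 50), rep(1, 50))

# Rows: true group; columns: inferred cluster label.
tab <- table(true = types.idx, cluster = wcl1)
print(tab)
```

Because the labels are arbitrary, a good clustering shows up as one dominant cell per row of the table, not as agreement between label values.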