RJaCGH {RJaCGH}R Documentation

Reversible Jump MCMC for the analysis of arrays of CGH

Description

This function fits a non-homogeneous hidden Markov model to CGH data through bayesian methods and Reversible Jump Markov chain Montecarlo.

Usage

RJaCGH(y, Chrom = NULL, Start=NULL, End=NULL, Pos = NULL,
       Dist=NULL, probe.names=NULL, maxVar=NULL, model = "Genome",
       var.equal=TRUE, max.dist=NULL, normal.reference=0,
       window = NULL, burnin = 10000, TOT =10000,
       k.max = 6, stat = NULL, mu.alfa = NULL, mu.beta = NULL,
       s1=NULL, s2=NULL, init.mu=NULL, init.sigma.2=NULL, init.beta=NULL,
       prob.k = NULL, jump.parameters=list(),
       start.k = NULL, RJ=TRUE, NC = 1, deltaT = 2, delete_gzipped =
                 .__DELETE_GZIPPED, singleState = FALSE)

Arguments

y Vector with Log Ratio observations.
Chrom Vector with Chromosome indicator.
Start Vector with start positions of the probes.
End Vector with end positions of the probes.
Pos Vector with Positions of every gene. They can be absolute to the genome or relative to the chromosome. They should be ordered within every chromosome. This is, the arrays must be ordered by their positions in the genome. They must be integers. Positions can be specified with Start, End or Pos, but only in one of the two ways.
Dist Optional vector of distances between genes. It should be a vector of length length(y)-1. Note that when Chrom is not NULL, every last value of every Chromosome is not used.
probe.names Character vector with the number of the probes.
maxVar Maximum value for the variance of the states. If NULL, the range of the data is chosen.
model if model="Genome", the same model is fitted for the whole genome. If model="Chrom", a different model is fitted for each chromosome.
var.equal Logical. If TRUE the variances of the hidden states are restricted to be the same.
max.dist maximal distance between spots. When two spots have a distance between them as far or further than max.dist, they are considered independent. That is, the state of that spot does not affect the state of the other. If NULL (the default) the maximum Dist or maximum difference in Pos is taken.
normal.reference The value considered as the mean of the normal state. See details. By default is 0.
window Multiplier of the standard deviation of the data to determine the width of the normal state. See details. Default (window = NULL) is 1.
burnin Number of burn-in iterations in the Markov Chain
TOT Number of iterations after the burn-in
k.max Maximum number of hidden states to fit.
stat Initial Distribution for the hidden states. Must be a vector of size 1 + 2 + ... +k.max. If NULL, it is assumed a uniform distribution for every model.
mu.alfa Hyperparameter. See details
mu.beta Hyperparameter. See details
prob.k Hyperparameter. See details
jump.parameters List with the parameters for the MCMC jumps. See details.
start.k Initial number of states. if NULL, a random draw from prob.k is chosen.
RJ Logical. If TRUE, Reversible Jump is performed. If not, MCMC over a fixed number of hidden states. Note that if FALSE, most of the methods for extracting information won't work.
s1 Standard deviation for the creation of a new mean in the birth move (first attempt). If NULL, it is the prior mu.beta.
s2 Standard deviation for the creation of a new mean in the birth move (second attempt). If NULL, it is the prior mu.beta.
init.mu Starting values for mu for the chain. See details
init.sigma.2 Starting values for sigma.2 for the chain. See details
init.beta Starting values for beta for the chain. See details
NC Number of coupled parallel chains. See details
deltaT Heat parameter for tempering the parallel chains. See details
delete_gzipped Should temporal files with viterbi paths be deleted?
singleState Logical. If TRUE, each hidden state is classified into a state of copy number alteration. If FALSE, a probability is assigned to each hidden state of being gained and lost. See relabelStates

Details

RJaCGH fits the following bayesian model: There is a priori distribution for the number of hidden states (different copy numbers) as stated by prob.k. If NULL, a uniform distribution between 1 and k.max is used.

The hidden states follow a normal distribution which mean (mu) follows itself a normal distribution with mean mu.alfa and stdev mu.beta. If NULL, these are the median of the data and the range. The square root of the variance (sigma.2)of the hidden states follows a uniform distribution between $0$ and maxVar.

The model for the transition matrix is based on a random matrix beta whose diagonal is zero. The transition matrix, Q, has the form: Q[i,j] = exp(-beta[i,j] + beta[i,j]*x) / sum(i,.) exp(-beta[i,.] + beta[i,.]*x

The prior distribution for beta is gamma with parameters 1, 1. The x are the distances between positions, normalized to lay between zero and 1 (x=diff(Pos) / max(diff(Pos)))

RJaCGH performs Markov Chain MonteCarlo with Reversible Jump to sample for the posterior distribution. NC sets the number of chains from we will sample in parallel. Each of them is tempered in order to escape from local maximum. The temper parameter is deltaT, and it can be a value greater than $0$. each chain is tempered according to:

$ 1 / (1 + deltaT * NC) $

Every sweep is performed for all chains and has 3 steps plus another one common for all of them:

1.- A Metropolis-Hastings move is used to update, for a fixed number of hidden states, mu, sigma.2 and beta. A symmetric proposal with a normal distribution and standard deviation sigma.tau.mu, sigma.tau.sigma.2 and sigma.tau.beta is sampled.

2.- A transdimensional move is chosen, between birth (a new hidden state is sampled from the prior) or death (an existing hidden state is erased). Both moves are tried using delayed rejection. That is, if the move is rejected, is given another try. The means for the new state are drawn for the priors, but the standard deviation can be set for the two stages with parameters s1 and s2.

3.- Another transdimensional move is performed; an split move (divide an existing state in two) or a combine move (join two adjacent states). The length of the split is sampled from a normal distribution with standard deviation tau.split.mu for the mu and tau.split.beta for beta.

4.- If NC is greater than 1, a swap move is tried to exchange information from two of the coupled parallel chains.

jump.parameters must be a list with the parameters for the moves. It must have components sigma.tau.mu, sigma.tau.sigma.2, sigma.tau.beta These are vectors of length k.max. tau.split.mu, tau.split.beta are vectors of length 1. If any of them is NULL, a call to the internal function get.jump() is made to find 'good' values.

A relabelling of hidden states is performed to match biological states. See details in relabelStates.

The initial values of the chain are drawn from an overdispersed distribution. One can start the chain in a given point with the parameters start.k (model to start from $1, ..., max.k$) and the initial values init.mu, init.sigma.2 (vectors of dimension start.k) and init.beta (matrix of positive values with start.k) rows and start.k) columns. The diagonal must be zero.

Value

The object returned follows a hierarchy:
If y is a matrix or data.frame (i.e., several arrays), an object of class RJaCGH.array is returned, with components:

[[]] A list with an object of corresponding class (see below) for every array.
array.names Vector with the names of the arrays.
[[]] a list with as many objects as k.max, with the fits.
k sequence of number of hidden states sampled.
prob.b Number of birth moves performed in first and second proposal.(Includes burn-in.
prob.d Number of death moves performed in first and second proposal.(Includes burn-in.
prob.s Number of split moves performed (Includes burn-in.
prob.c Number of combine moves performed (Includes burn-in.
prob.e Number of exchange (swap) moves performed between tempered chains and with cool (main) chain.
y y vector.
Pos Pos vector.
model model.
Chrom Chromosome vector.
x x vector of distances between genes.
viterbi A list with as many components as chromosomes. For each chromosome, a list, with at least three components: gzipped_sequence: the compacted and gzipped sequence of states; num_sequences: the number of sequences in that compacted set of sequences; sum_mcmc_iter: the number of times a viterbi sequence was obtained. All these are only of use for calls to the pREC funcitons (pREC_A and pREC_S).
[[]] a list with as many components as chromosomes, of class RJaCGH (See below).
Pos Pos vector.
Start Start positions.
End End positions.
probe.names Names of the probes.
model model.
Chrom Chromosome vector.
viterbi Identical structure as above.
mu a matrix with the means sampled
sigma.2 a matrix with the variances sampled
beta an array of dimension 3 with beta values sampled
stat vector of initial distribution
loglik log likelihoods of every MCMC iteration
prob.mu probability of aceptance of mu in the Metropolis-Hastings step.
prob.sigma.2 probability of aceptance of sigma.2 in the Metropolis-Hastings step.
prob.beta probability of aceptance of beta in the Metropolis-Hastings step.
state.labels Labels of the biological states.
prob.states Marginal posterior probabilities of belonging to every hidden state.

The number of rows of components mu, sigma.2 and beta is random, because it depends on the number of times a particular model is visited and on the number of moves between models, because when we visit a new model we also explore the space of its means, variances and parameters of its transition functions.

Note

The data must be ordered by chromosome and within chromosome by position.

Author(s)

Oscar M. Rueda and Ramon Diaz Uriarte

References

Rueda OM, Diaz-Uriarte R. Flexible and Accurate Detection of Genomic Copy-Number Changes from aCGH. PLoS Comput Biol. 2007;3(6):e122

Cappe, Moulines and Ryden, 2005. Inference in Hidden Markov Models. Springer.

Green, P.J. (1995) Reversible Jump Markov Chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711-732.

Green, P.J. and Antonietta, M. (2001) Delayed Rejection in Reversible Jump Metropolis Hastings. Biometrika, 88 (4), 1035-1053.

Geyer, C. J. (1991). Markov Chain Monte Carlo Maximum Likelihood. Proceedings of the 23th Symposium on the Interface, 156-163.

See Also

summary.RJaCGH, states, modelAveraging, plot.RJaCGH, trace.plot, relabelStates, pREC_A, pREC_S

Examples

y <- c(rnorm(100, 0, 1), rnorm(10, -3, 1), rnorm(20, 3, 1),
       rnorm(100,0, 1)) 
Pos <- sample(x=1:500, size=230, replace=TRUE)
Pos <- cumsum(Pos)
Chrom <- rep(1:23, rep(10, 23))

jp <- list(sigma.tau.mu=rep(0.05, 4), sigma.tau.sigma.2=rep(0.03, 4),
           sigma.tau.beta=rep(0.07, 4), tau.split.mu=0.1, tau.split.beta=0.1)

fit.chrom <- RJaCGH(y=y, Pos=Pos, Chrom=Chrom, model="Chrom",
                    burnin=10, TOT=1000, k.max = 4,
                    jump.parameters=jp)
##RJ results for chromosome 5
table(fit.chrom[["array1"]][[5]]$k)
fit.genome <- RJaCGH(y=y, Pos=Pos, Chrom=Chrom, model="Genome",
burnin=100, TOT=1000, jump.parameters=jp, k.max = 4)
## Results for the model with 3 states:
summary(fit.genome)

[Package RJaCGH version 2.0.0 Index]