RJaCGH {RJaCGH}R Documentation

Reversible Jump MCMC for the analysis of arrays of CGH

Description

This function fits a non-homogeneous hidden Markov model to CGH data through bayesian methods and Reversible Jump Markov chain Montecarlo.

Usage

RJaCGH(y, Chrom = NULL, Pos = NULL, Dist=NULL, model = "genome", burnin = 10000, TOT =
10000, k.max = 6, stat = NULL, mu.alfa = NULL, mu.beta = NULL, ka = NULL,
g = NULL, prob.k = NULL, jump.parameters=list(), start.k = NULL,
RJ=TRUE, auto.label=0.60)

Arguments

y Vector with Log Ratio observations.
Chrom Vector with Chromosome indicator. Must be numeric (1 to 23)
Pos Vector with Positions of every gene. They can be absolute to the genome or relative to the chromosome. They should be ordered within every chromosome. This is, the arrays must be ordered by their positions in the genome. They must be integers.
Dist Optional vector of distances between genes. It should be a vector of length length(y)-1. Note that when Chrom is not NULL, every last value of every Chromosome is not used.
model if model="genome", the same model is fitted for the whole genome. If model="Chrom", a different model is fitted for each chromosome.
burnin Number of burn-in iterations in the Markov Chain
TOT Number of iterations after the burn-in
k.max Maximum number of hidden states to fit.
stat Initial Distribution for the hidden states. Must be a vector of size 1 + 2 + ... +k.max. If NULL, it is assumed a uniform distribution for every model.
mu.alfa Hyperparameter. See details
mu.beta Hyperparameter. See details
ka Hyperparameter. See details
g Hyperparameter. See details
prob.k Hyperparameter. See details
jump.parameters List with the parameters for the MCMC jumps. See details.
start.k Initial number of states. if NULL, a random draw from prob.k is chosen.
RJ Logical. If TRUE, Reversible Jump is performed. If not, MCMC over a fixed number of hidden states.
auto.label If not NULL, should be the minimum proportion of observations labeled as 'Normal'. See details.

Details

RJaCGH fits the following bayesian model: There is a priori distribution for the number of hidden states (different copy numbers) as stated by prob.k. If NULL, a uniform distribution between 1 and k.max is used.

The hidden states follow a normal distribution which mean (mu) follows itself a normal distribution with mean mu.alfa and stdev mu.beta. If NULL, these are the median of the data and the range. The variance (sigma.2)of the hidden states follows an Inverse Gamma distribution with parameters ka and g. If NULL, these are 2 and diff(range(y))^2 / 50, respectively.

The model for the transition matrix is based on a random matrix beta whose diagonal is zero. The transition matrix, Q, has the form: Q[i,j] = exp(-beta[i,j] + beta[i,j]*x) / sum(i,.) {exp(-beta[i,.] + beta[i,.]*x}

The prior distribution for beta is gamma with parameters 1, 1. The x are the distances between positions, normalized to lay between zero and 1 (x=diff(Pos) / max(diff(Pos)))

RJaCGH performs Markov Chain MonteCarlo with Reversible Jump to sample for the posterior distribution. Every sweep has 3 steps:

1.- A Metropolis-Hastings move is used to update, for a fixed number of hidden states, mu, sigma.2 and beta. A symmetric proposal with a normal distribution and standard deviation sigma.tau.mu, sigma.tau.sigma.2 and sigma.tau.beta is sampled.

2.- A transdimensional move is chosen, between birth (a new hidden state is sampled from the prior) or death (an existing hidden state is erased).

3.- Another transdimensional move is performed; an split move (divide an existing state in two) or a combine move (join two adjacent states). The length of the split is sampled from a normal distribution with standard deviation tau.split.mu for the mu and tau.split.beta for beta.

jump.parameters must be a list with the parameters for the moves. It must have components sigma.tau.mu, sigma.tau.sigma.2, sigma.tau.beta These are vectors of length k.max. tau.split.mu, tau.split.beta are vectors of length 1. If any of them is NULL, a call to the internal function get.jump() is made to find 'good' values.

A relabelling of hidden states is performed to label biological states. The states whom have zero value inside a 95% confidence based on a normal distribution with means the median of mu and sd the median of the square root fo sigma.2 are labelled as 'Normal'. If no state is labelled as 'Normal', the closest state to zero is chosen. If auto.label is not null, closest states to 'Normal' are also labelled as 'Normal' until a proportion of auto.label is reached. Please note taht the default value is 0.60, so at least the 60% of the observations will be labelled as 'Normal'.

Value

The object returned follows a hierarchy:
If y is a matrix or data.frame (i.e., several arrays), an object of class RJaCGH.array is returned, with components:

[[]] A list with an object of corresponding class (see below) for every array.
array.names Vector with the names of the arrays.
[[]] a list with as many objects as k.max, with the fits.
k sequence of number of hidden states sampled.
prob.b Number of birth moves performed (Includes burn-in.
prob.d Number of death moves performed (Includes burn-in.
prob.s Number of split moves performed (Includes burn-in.
prob.c Number of combine moves performed (Includes burn-in.
y y vector.
Pos Pos vector.
model model.
Chrom Chromosome vector.
x x vector of distances between genes.
[[]] a list with as many components as chromosomes, of class RJaCGH (See below).
Pos Pos vector.
model model.
Chrom Chromosome vector.
mu a matrix with the means sampled
sigma.2 a matrix with the variances sampled
beta an array of dimension 3 with beta values sampled
stat vector of initial distribution
loglik log likelihoods of every MCMC iteration
prob.mu probability of aceptance of mu in the Metropolis-Hastings step.
prob.sigma.2 probability of aceptance of sigma.2 in the Metropolis-Hastings step.
prob.beta probability of aceptance of beta in the Metropolis-Hastings step.
state.labels Labels of the biological states.
prob.states Marginal posterior probabilities of belonging to every hidden state.
prob.joint.states Joint posterior probabilities for every pair of adjacent genes of both Gains, Normal or Losses.

Author(s)

Oscar M. Rueda and Ramon Diaz Uriarte

References

Oscar M. Rueda and Ramon Diaz Uriarte. A flexible, accurate and extensible statistical method for detecting genomic copy-number changes. http://biostats.bepress.com/cobra/ps/art9/. {http://biostats.bepress.com/cobra/ps/art9/}.

Cappe, Moulines and Ryden, 2005. Inference in Hidden Markov Models. Springer.

Green, P.J. (1995) Reversible Jump Markov Chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711-732.

See Also

summary.RJaCGH, states, model.averaging, plot.RJaCGH, trace.plot, gelman.brooks.plot, collapseChain

Examples

y <- c(rnorm(100, 0, 1), rnorm(10, -3, 1), rnorm(20, 3, 1),
       rnorm(100,0, 1)) 
Pos <- sample(x=1:500, size=230, replace=TRUE)
Pos <- cumsum(Pos)
Chrom <- rep(1:23, rep(10, 23))

jp <- list(sigma.tau.mu=rep(0.05, 4), sigma.tau.sigma.2=rep(0.03, 4),
           sigma.tau.beta=rep(0.07, 4), tau.split.mu=0.1, tau.split.beta=0.1)

fit.chrom <- RJaCGH(y=y, Pos=Pos, Chrom=Chrom, model="Chrom",
                    burnin=10, TOT=1000, k.max = 4,
                    jump.parameters=jp)
##RJ results for chromosome 5
table(fit.chrom[[5]]$k)
fit.genome <- RJaCGH(y=y, Pos=Pos, Chrom=Chrom, model="genome",
burnin=100, TOT=1000, jump.parameters=jp, k.max = 4)
## Results for the model with 3 states:
apply(fit.genome[[3]]$mu, 2, summary)
apply(fit.genome[[3]]$sigma.2, 2, summary)
apply(fit.genome[[3]]$beta, c(1,2), summary)

[Package RJaCGH version 1.0.2 Index]