generate.data {Oncotree}    R Documentation

Generate random data from an oncogenetic tree

Description

Generates random event occurrence data based on an oncogenetic tree model.

Usage

generate.data(N, otree, with.errors=TRUE,
          edge.weights=if (with.errors) "estimated" else "observed")

Arguments

N The required sample size.
otree An object of the class oncotree.
with.errors A logical value specifying whether false positive and negative errors should be applied.
edge.weights A choice of whether the observed or estimated edge transition probabilities should be used in the calculation of probabilities. See oncotree.fit for an explanation of the difference. By default, the estimated edge transition probabilities are used if with.errors=TRUE and the observed ones if with.errors=FALSE.

Details

Technically, the distribution generated by the tree is calculated exactly (using distribution.oncotree), and the observations are generated by sampling from this distribution. Thus, if N is small and with.errors=TRUE, it might be faster to avoid the computational overhead of calculating the entire distribution: generate data without false positives/negatives, then randomly ‘corrupt’ it (see Examples below).
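
The two strategies described above can be illustrated with a minimal base-R sketch that does not depend on Oncotree. Everything in it is an assumption made for illustration: the three events A, B, C, their pattern probabilities, the error rates, and the helper names sample_patterns and corrupt are hypothetical and are not part of the package. The pattern table merely stands in for an exactly computed distribution, and the corruption step assumes that errors act independently on each entry.

```r
# Toy stand-in for an exactly computed distribution: every 0/1 pattern of
# three hypothetical events together with its probability (sums to 1).
patterns <- as.matrix(expand.grid(A = 0:1, B = 0:1, C = 0:1))
probs <- c(0.40, 0.20, 0.05, 0.15, 0.02, 0.03, 0.05, 0.10)

# Strategy 1: draw N independent observations by sampling rows of the
# pattern table according to their probabilities.
sample_patterns <- function(N, patterns, probs) {
  idx <- sample.int(nrow(patterns), size = N, replace = TRUE, prob = probs)
  patterns[idx, , drop = FALSE]
}

# Strategy 2 (second half): corrupt error-free 0/1 data entry by entry.
# A true 0 is observed as 1 with probability epos (false positive);
# a true 1 is observed as 1 with probability 1 - eneg (no false negative).
corrupt <- function(x, epos, eneg) {
  x <- as.matrix(x)
  p_one <- ifelse(x == 0, epos, 1 - eneg)
  matrix(rbinom(length(x), size = 1, prob = p_one),
         nrow = nrow(x), ncol = ncol(x), dimnames = dimnames(x))
}

set.seed(1)
clean <- sample_patterns(20, patterns, probs)   # error-free observations
noisy <- corrupt(clean, epos = 0.1, eneg = 0.2) # same shape, with errors
```

With epos = 0 and eneg = 0 the corruption step leaves the data unchanged, which is a quick sanity check for a helper of this kind.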

Value

A data set where each row is an independent observation.

Author(s)

Aniko Szabo

See Also

oncotree.fit

Examples

   data(ov.cgh)
   ov.tree <- oncotree.fit(ov.cgh)
   
   set.seed(7365)
   rd <- generate.data(200, ov.tree, with.errors=TRUE)
   
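   # For small N: generate error-free data, then 'corrupt' it by hand
   system.time({
      rd2 <- generate.data(20, ov.tree, with.errors=FALSE)
      epos <- ov.tree$eps[["epos"]]   # false positive rate
      eneg <- ov.tree$eps[["eneg"]]   # false negative rate
      # an entry is observed as 1 with probability epos if truly 0, 1-eneg if truly 1
      corrupt.data <- matrix(rbinom(prod(dim(rd2)), size=1, prob=ifelse(rd2==0, epos, 1-eneg)),
                          nrow=nrow(rd2), ncol=ncol(rd2),
                          dimnames=list(NULL, names(rd2)))
     })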
   system.time(generate.data(20, ov.tree, with.errors=TRUE))


[Package Oncotree version 0.3 Index]