simulate {MAclinical}    R Documentation

Simulating data

Description

This function simulates a list of data sets as described in Boulesteix et al. (2008), Section 3.1.

Usage

simuldata_list(niter=50,n=500,p=1000,psig=50,q=5,muX=0,muZ=0)
simuldatacluster_list(niter=50,n=500,p=1000,psig=50,q=5,muX=0,muZ=0)

Arguments

niter The number of data sets to be simulated.
n The number of observations.
p The number of microarray variables (genes).
psig The number of significant microarray variables (must be <p).
q The number of clinical variables.
muX The class mean difference for the psig relevant genes.
muZ The class mean difference for the q clinical variables.

Details

With the function simuldatacluster_list, observations with y=1 are assumed to come from two different subgroups, 1a and 1b, each with probability 0.5. Relevant genes are generated such that they separate subgroup 1a from the rest, whereas clinical variables separate subgroup 1b from the rest.
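
The subgroup design can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper name sketch_cluster_data, the balanced random class labels, the standard normal noise, and the choice of the first psig columns as the relevant genes are all assumptions made for illustration only.

```r
# Hypothetical helper, not part of MAclinical: illustrates the subgroup design only.
sketch_cluster_data <- function(n = 100, p = 150, psig = 10, q = 5, muX = 2, muZ = 1) {
  # Assumption: class labels are drawn at random with probability 0.5.
  y <- rbinom(n, size = 1, prob = 0.5)
  # Observations with y = 1 fall into subgroup "1a" or "1b" with probability 0.5 each.
  subgroup <- ifelse(y == 1, sample(c("1a", "1b"), n, replace = TRUE), "0")
  # Assumption: independent standard normal noise for all variables.
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)
  z <- matrix(rnorm(n * q), nrow = n, ncol = q)
  # Relevant genes (assumed to be the first psig columns) separate 1a from the rest.
  sig <- seq_len(psig)
  x[subgroup == "1a", sig] <- x[subgroup == "1a", sig] + muX
  # Clinical variables separate 1b from the rest.
  z[subgroup == "1b", ] <- z[subgroup == "1b", ] + muZ
  list(y = y, x = x, z = z, subgroup = subgroup)
}

set.seed(1)
d <- sketch_cluster_data()
table(d$subgroup)
```

In this sketch, neither the genes alone nor the clinical variables alone can identify all observations with y=1, which is what makes the setting useful for studying combined classifiers.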

Value

A list of length niter containing the simulated data sets. Each data set is itself a list with three elements:

y the n-vector of class memberships, coded as 0,1.
x the n x p matrix of gene expression levels. Each row corresponds to an observation, each column to a variable (gene).
z the n x q matrix of clinical variables. Each row corresponds to an observation, each column to a clinical variable.
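
As a rough illustration of how such a nested list can be traversed, the following sketch builds a mock object with the documented shape and summarises it per data set. The object mock is a hypothetical stand-in filled with random values, not output of simuldata_list.

```r
# Mock object with the documented shape: a list of data sets,
# each a list with elements y, x and z (values are arbitrary).
set.seed(1)
mock <- replicate(3, list(
  y = rbinom(20, size = 1, prob = 0.5),        # class memberships coded 0/1
  x = matrix(rnorm(20 * 30), nrow = 20),       # n x p gene expression matrix
  z = matrix(rnorm(20 * 5), nrow = 20)         # n x q clinical variables
), simplify = FALSE)

# Dimensions of x in every data set (one column per data set).
x_dims <- sapply(mock, function(d) dim(d$x))

# Proportion of observations with y = 1 in every data set.
prop_class1 <- vapply(mock, function(d) mean(d$y), numeric(1))
```

The same lapply/sapply pattern applies to any per-data-set computation, for example fitting a classifier on each simulated data set in turn.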

Author(s)

Anne-Laure Boulesteix (http://www.slcmsr.net/boulesteix)

References

Boulesteix AL, Porzelius C, Daumer M, 2008. Microarray-based classification and clinical predictors: On combined classifiers and additional predictive value. Bioinformatics 24:1698-1706.

See Also

testclass, testclass_simul, plsrf_x_pv, plsrf_xz_pv, plsrf_x, plsrf_xz, logistic_z, rf_z, svm_x.

Examples

# load MAclinical library
# library(MAclinical)

# Generating 3 simulated data sets
my.data<-simuldata_list(niter=3,n=100,p=150,psig=10,q=5,muX=2,muZ=1)
length(my.data)
dim(my.data[[1]]$x)
dim(my.data[[1]]$z)
length(my.data[[1]]$y)


[Package MAclinical version 1.0-2 Index]