simulateSNPs {scrime}R Documentation

Simulation of SNP data

Description

Simulates SNP data, where a specified proportion of cases and controls is explained by specified set of SNP interactions.

Usage

simulateSNPs(n.obs, n.snp, vec.ia, prop.explain = 1, 
  list.ia.val = NULL, vec.ia.num = NULL, maf = c(0.1, 0.4), 
  prob.val = rep(1/3, 3), list.equal = NULL, prob.equal = 0.8,
  rm.redundancy = TRUE, shuffle = FALSE, shuffle.obs = FALSE, rand = NA)

Arguments

n.obs either an integer specifying the total number of observations, or a vector of length 2 specifying the number of cases and the number of controls. If the former, then the number of both cases and controls is ceiling(n.obs/2).
n.snp integer specifying the number of SNPs.
vec.ia a vector of integers specifying the orders of the interactions that explain the cases. c(3,1,2,3), e.g., means that a three-way, a one-way (i.e. just a SNP), a two-way, and a three-way interaction explain the cases.
prop.explain either an integer or a vector of length(vec.ia) specifying the proportions of cases explained by the interactions of interest among all observation having the interaction of interest. Must be larger than 0.5. E.g., prop.explain = 1 means that only cases have the interactions of interest specified by vec.ia (and list.ia.val). E.g., vec.ia = c(3, 2) and prop.explain = c(1, 0.8) means that only cases have the three-way interaction of interest, while 80% of the observations having the two-way interaction of interest are cases, and 20% are controls.
list.ia.val a list of length(vec.ia) specifying the exact interactions. The objects in this list must be vectors of length vec.ia[i], and consist of the values 0 (for homozygous reference), 1 (heterozygous variant), or 2 (homozygous variant). E.g., vec.ia = c(3, 2) and list.ia.val = list(c(2, 0, 1), c(0, 2)) and prob.equal = 1 (see also list.equal) means that ((SNP1 == 2) & (SNP2 == 0) & (SNP3 == 1)) and ((SNP4 == 0) & (SNP5 == 2)) are the explanatory interactions (if additionally prob.equal = 1; see also list.equal). If NULL, the genotypes are randomly drawn using the probabilities given by prob.val.
vec.ia.num a vector of length(vec.ia) specifying the number of cases (not observations) explained by the interactions in vec.ia. If NULL, all the cases are divided into length(vec.ia) groups of about the same size. sum(vec.ia.num) must be smaller than or equal to the total number of cases. Each entry of vec.ia.num must currently be >= 10.
maf either an integer, or a vector of length 2 or n.snp specifying the minor allele frequencies. If an integer, all SNPs will have the same minor allele frequency. If a vector of length n.snp, each SNP will have the minor allele frequency specified in the corresponding entry of maf. If length 2, then maf is interpreted as the range of the minor allele frequencies, and for each SNP, a minor allele frequency will be randomly drawn from a uniform distribution with the range given by maf. Note: If a SNP belongs to an explanatory interaction, then only the set of observations not explained by this interaction will have the minor allele frequency specified by maf.
prob.val a vector consisting of the probabilities for drawing a 0, 1, or 2, if list.ia.val = NULL, i.e. if the genotypes of the SNPs explaining the case-control status should be randomly drawn. Ignored if list.ia.val is specified. By default, each genotype has the same probability of being drawn.
list.equal list of same structure as list.ia.val containing only ones and zeros, where a 1 specifies the equality to the corresponding value in list.ia.val, and a 0 specifies the non-equality. Thus, the entries of list.equal specify if the corresponding SNP should be of a particular genotype (when the entry is 1) or should be not of this genotype (when entry is 0). If NULL, this list will be generated automatically using prob.equal. If, e.g., vec.ia = c(3, 2), list.ia.val = list(c(2, 0, 1), c(0, 2)), and list.equal = list(c(1, -1, 1), c(1, -1)), then the explanatory interactions are given by ((SNP1 == 2) & (SNP2 != 0) & (SNP3 == 1)) and ((SNP4 == 0) & (SNP5 != 2))
prob.equal a numeric value specifying the probability that a 1 is drawn when generating list.equal. prob.equal is thus the probability for an equal sign.
rm.redundancy should redundant SNPs be removed from the explaining interactions? It is possible that one specify an explaining i-way interaction, but an interaction between (i-1) of the variables contained in the i-way interaction already explains all the cases (and controls) that the i-way interaction should explain. In this case, the redundant SNP is removed if rm.redundancy = TRUE.
shuffle logical. By default, the first sum(vec.ia) columns of the generated data set contain the explanatory SNPs in the same order as they appear in this data set. If TRUE, this order will be shuffled.
shuffle.obs should the observations be shuffled?
rand integer. Sets the random number generator in a reproducible state.

Value

An object of class simulatedSNPs composed of

data a matrix with n.obs rows and n.snp columns containing the SNP data.
cl a vector of length n.obs comprising the case-control status of the observations.
tab.explain a table naming the explanatory interactions and the numbers of cases and controls explained by them.
ia character vector naming the interactions.
maf vector of length n.snp containing the minor allele frequencies.

Note

Currently, the genotypes of all SNPs are simulated independently from each other (except for the SNPs that belong to the same explanatory interaction).

Author(s)

Holger Schwender holger.schwender@udo.edu

See Also

simulateSNPglm, simulateSNPcatResponse

Examples

## Not run: 
# Simulate a data set containing 2000 observations (1000 cases
# and 1000 controls) and 50 SNPs, where one three-way and two 
# two-way interactions are chosen randomly to be explanatory 
# for the case-control status.

sim1 <- simulateSNPs(2000, 50, c(3, 2, 2))
sim1

# Simulate data of 1200 cases and 800 controls for 50 SNPs, 
# where 90
# three-way interaction are cases, and 95
# showing a randomly chosen two-way interactions are cases.

sim2 <- simulateSNPs(c(1200, 800), 50, c(3, 2), 
   prop.explain = c(0.9, 0.95))
sim2

# Simulate a data set consisting of 1000 observations and 50 SNPs,
# where the minor allele frequency of each SNP is 0.25, and
# the interactions 
# ((SNP1 == 2) & (SNP2 != 0) & (SNP3 == 1))   and 
# ((SNP4 == 0) & (SNP5 != 2))
# are explanatory for 200 and 250 of the 500 cases, respectively,
# and for none of the 500 controls.

list1 <- list(c(2, 0, 1), c(0, 2))
list2 <- list(c(1, 0, 1), c(1, 0))
sim3 <- simulateSNPs(1000, 50, c(3, 2), list.ia.val = list1,
    list.equal = list2, vec.ia.num = c(200, 250), maf = 0.25)

## End(Not run)

[Package scrime version 1.1.2 Index]