simulateSNPglm {scrime} | R Documentation |
Simulates SNP data. Interactions of some of the simulated SNPs are then used to specify either a binary or a quantitative response by a logistic or linear regression model, respectively.
simulateSNPglm(n.obs = 1000, n.snp = 50, list.ia = NULL, list.snp = NULL, beta0 = -0.5, beta = 1.5, maf = 0.25, sample.y = TRUE, p.cutoff = 0.5, err.fun = NULL, rand = NA, ...)
n.obs |
number of observations that should be generated. |
n.snp |
number of SNPs that should be generated. |
list.ia |
a list consisting of numeric vectors (or values) specifying the genotypes
of the SNPs that should be explanatory for the response. Each of these vectors
must be composed of some of the numbers -3, -2, -1, 1, 2, 3, where 1 denotes
the homozygous reference genotype, 2 the heterozygous genotype, and 3 the
homozygous variant genotype, and a minus before these numbers means that
the corresponding SNP should be not of this genotype. If, e.g., one of the
vectors is given by c(1, -1, -3) and the corresponding vector in
list.snp is c(5, 7, 8) , then the corresponding interaction
used in the regression model to specify the response is
(SNP5 == 1) & (SNP7 != 1) & (SNP8 != 3) .
For more details, see Details. Must be specified if list.snp is specified.
If both list.ia and list.snp are NULL , then the interactions
shown in the Details section are used. |
list.snp |
a list consisting of numeric vectors specifying the SNPs that compose
the interactions used in the regression model. Each of these vectors must have the
same length as the corresponding vector in list.ia , and must consist
of integers between 1 and n.snp , where the integer i corresponds to
the ith column of the simulated SNP matrix. If list.ia is specified but
not list.snp , then the first n SNPs are used to generate the interactions,
where n is the total number of values in list.ia . For the case that both
list.ia and list.snp are not specified, see Details. |
beta0 |
a numeric value specifying the intercept of the regression model. |
beta |
a non-negative numeric value or vector of the same length as list.ia
(i.e. one numeric value for each interaction) specifying the parameters in the
regression model. |
maf |
either an integer, or a vector of length 2 or n.snp specifying the
minor allele frequency. If an integer, all the SNPs will have the same
minor allele frequency. If a vector of length n.snp , each SNP will have the minor
allele frequency specified in the corresponding entry of maf . If length 2, then
maf is interpreted as the range of the minor allele frequencies, and for each SNP,
a minor allele frequency will be randomly drawn from a uniform distribution with
the range given by maf . |
sample.y |
should the values of the response in the logistic regression model be randomly drawn
using the probabilities of the respective observations for being a case? If FALSE ,
then the response value of an observation is 1 if its probability for being a case is larger
than p.cutoff , and otherwise the observation is classified as a control (i.e. the value
of the response is 0). Ignored if err.fun is specified. |
p.cutoff |
a probability, i.e. a numeric value between 0 and 1, naming the cutoff for an
observation to be called a case if sample.y = FALSE . For details, see sample.y .
Ignored if sample.y = TRUE or err.fun is specified. |
err.fun |
a function for generating the error that is added to the linear model to determine
the value of the (quantitative) response. If NULL , a logistic regression model is fitted.
If specified, a linear model is fitted. Therefore, this argument is used to differ between the
two types of models. The specified function must have as first argument the number of values
that should be generated and as output a vector consisting of these values. Further arguments
can also be specified because of ... in simulateSNPglm .
If, e.g., err.fun = rnorm , then rnorm(n.obs) will be called to generate
n.obs observations from a standard normal function. |
rand |
a numeric value for setting the random number generator in a reproducible state. |
... |
further arguments of the function specified by err.fun . |
simulateSNPglm
first simulates a matrix consisting of n.obs
observations and n.snp
SNPs, where the minor allele frequencies of these SNPs are given by maf
.
Note that all SNPs are currently simulated independently of each other such that they are unlinked.
Afterwards, the response is determined by a regression model using the specifications of list.ia
,
list.snp
, beta0
and beta
. Depending on whether err.fun
is specified or not,
a linear or a logistic regression model is used, respectively, i.e. the response Y is continuous or binary.
By default, a logistic regression model
logit(Prob(Y = 1)) = beta0
+ beta[1]
* L1 + beta[2]
* L2 + ...
is fitted, since err.fun = NULL
.
If both list.ia
and list.snp
are NULL
, then interactions similar to the one
considered, e.g., in Nunkesser et al. (2007) or Schwender et al. (2007) are used, i.e.
L1 = (SNP6 != 1) & (SNP7 == 1)
and
L2 = (SNP3 == 1) & (SNP9 == 1) & (SNP10 == 1)
,
by setting list.ia = list(c(-1, 1), c(1, 1, 1))
and
list.snp = list(c(6, 7), c(3, 9, 10))
.
Using the above model Prob(Y = 1) is computed for each observation, and its value of the response is
determined either by a random draw from a Bernoulli distribution using this probability (if sample.y = TRUE
),
or by evaluating if Prob(Y = 1) > p.cutoff
(if sample.y = FALSE
).
If err.fun
is specified, then the linear model
Y = beta0
+ beta[1]
* L1 + beta[2]
* L2 + ... + error
is used to determine the values of the response Y, where the values for error are given
by the output of a call of err.fun
.
An object of class simSNPglm
consisting of
x |
a matrix with n.obs rows and n.snp columns containing the simulated SNP values. |
y |
a vector of length n.obs composed of the values of the response. |
beta0 |
the value of the intercept. |
beta |
the vector of parameters. |
ia |
a character vector naming the explanatory interactions. |
maf |
a vector of length n.snp composed of the minor allele frequencies. |
prob |
a vector of length n.obs consisting of the values of Prob(Y = 1) (if err.fun = NULL ). |
err |
a vector of length n.obs composed of the values of the error (if err.fun is specified). |
p.cutoff |
the value of p.cutoff (if err.fun = NULL and sample.y = FALSE ). |
err.call |
a character string naming the call of the error function (if err.fun is specified). |
Holger Schwender, holger.schwender@udo.edu
Nunkesser, R., Bernholt, T., Schwender, H., Ickstadt, K. and Wegener, I. (2007). Detecting High-Order Interactions of Single Nucleotide Polymorphisms Using Genetic Programming. Bioinformatics, 23, 3280-3288.
Schwender, H. (2007). Statistical Analysis of Genotype and Gene Expression Data. Dissertation, Department of Statistics, University of Dortmund.
simulateSNPs
, summary.simSNPglm
, simulateSNPcatResponse
## Not run: # The simulated data set described in Details. sim1 <- simulateSNPglm() sim1 # A bit more information: Table of probabilities of being a case # vs. numbers of cases and controls. summary(sim1) # Calling an observation a case if its probability of being # a case is larger than 0.5 (the default for p.cutoff). sim2 <- simulateSNPglm(sample.y = FALSE) summary(sim2) # If ((SNP4 != 2) & (SNP3 == 1)), (SNP5 ==3) and # ((SNP12 !=1) & (SNP9 == 3)) should be the three interactions # (or variables) that are explanatory for the response, # list.ia and list.snp are specified as follows. list.ia <- list(c(-2, 1), 3, c(-1,3)) list.snp <- list(c(4, 3), 5, c(12,9)) # The binary response and the data set consisting of # 600 observations and 25 SNPs, where the minor allele # frequency of each SNP is randomly drawn from a # uniform distribution with minimum 0.1 and maximum 0.4, # is then generated by sim3 <- simulateSNPglm(n.obs = 600, n.snp = 25, list.ia = list.ia, list.snp = list.snp, maf = c(0.1, 0.4)) sim3 summary(sim3) # If the response should be quantitative, err.fun has # to be specified. To use a normal distribution with mean 0 # (default in rnorm) and a standard deviation of 2 # as the distribution of the error, call simulateSNPglm(err.fun = rnorm, sd = 2) ## End(Not run)