rel.risk.cint {corpora} | R Documentation |
This function approximates a conservative confidence interval for the relative risk coefficient, i.e. the ratio r = p_1/p_2 between two population proportions, based on frequency counts from two corpora. The approximation is computed from individual confidence intervals for the two proportions, with confidence levels adjusted accordingly.
rel.risk.cint(k1, n1, k2, n2, conf.level = 0.95, alternative = c("two.sided", "less", "greater"), method = c("binomial", "z.score"), correct = TRUE)
k1 |
frequency of a type in the first corpus (or an integer vector of type frequencies) |
n1 |
the sample size of the first corpus (or an integer vector specifying the sizes of different samples) |
k2 |
frequency of the type in the second corpus (or an integer vector of type frequencies, in parallel to k1) |
n2 |
the sample size of the second corpus (or an integer vector specifying the sizes of different samples, in parallel to n1) |
conf.level |
the desired confidence level (defaults to 95%) |
alternative |
a character string specifying the alternative hypothesis, yielding a two-sided (two.sided, default), lower one-sided (less) or upper one-sided (greater) confidence interval |
method |
a character string specifying whether the individual confidence intervals for the two proportions are based on the binomial test (binomial) or the z-score test (z.score) |
correct |
if TRUE, apply Yates' continuity correction for the z-score test (default) |
This function computes individual confidence intervals for the two population proportions p_1 (from k_1 and n_1) and p_2 (from k_2 and n_2). Then, a confidence interval for the relative risk r = p_1 / p_2 is determined in such a way that r lies within the interval whenever p_1 and p_2 lie in their respective confidence intervals.
Thus, when these intervals are each computed with a confidence level of e.g. .975, r is guaranteed to fall within its confidence interval in at least .975^2 ≈ .95 of all cases (the two samples being independent). This adjustment of confidence levels is made automatically. Note that r might fall within its confidence interval even when either p_1 or p_2 is outside the respective interval, hence rel.risk.cint computes a conservative confidence interval that will be wider than necessary.
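The construction described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual code: conservative_rr_cint is a hypothetical helper, it handles only the two-sided case with exact binomial intervals from binom.test, and it assumes the adjusted level is the square root of conf.level (consistent with the .975^2 example). The bounds combine the extreme ends of the two individual intervals.

```r
# Hypothetical helper for illustration; not part of the corpora package.
# Two-sided case only, scalar inputs only.
conservative_rr_cint <- function(k1, n1, k2, n2, conf.level = 0.95) {
  # Assumption: each proportion gets level sqrt(conf.level), so that the
  # joint coverage of two independent intervals is at least conf.level.
  adj <- sqrt(conf.level)
  ci1 <- binom.test(k1, n1, conf.level = adj)$conf.int
  ci2 <- binom.test(k2, n2, conf.level = adj)$conf.int
  # r = p1 / p2 is smallest for (low p1, high p2) and largest for
  # (high p1, low p2); upper becomes Inf if the lower bound for p2 is 0.
  c(lower = ci1[1] / ci2[2], upper = ci1[2] / ci2[1])
}

# Made-up counts: 20 hits in 1000 tokens vs. 10 hits in 1000 tokens.
res <- conservative_rr_cint(20, 1000, 10, 1000)
print(res)
```

The observed ratio (20/1000) / (10/1000) = 2 lies inside the resulting interval, since each observed proportion lies inside its own exact interval.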
Exact confidence intervals for the odds ratio coefficient theta = (p_1 / (1-p_1)) / (p_2 / (1-p_2)) can be computed with the fisher.test function. However, these exact intervals are computationally very expensive and may cause R to run out of memory for large frequency counts. In addition, fisher.test only computes a single confidence interval for each function call (i.e., it cannot be applied to vectorised data).
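Because fisher.test handles one contingency table per call, applying it to several count pairs requires an explicit loop. The following sketch uses mapply for this. The helper odds_cint and the counts are made up for illustration, and the approach is only practical for small frequencies.

```r
# Hypothetical wrapper: exact odds-ratio interval for one pair of counts.
odds_cint <- function(k1, n1, k2, n2, conf.level = 0.95) {
  # Columns are the two corpora; rows are (type, all other tokens).
  tab <- matrix(c(k1, n1 - k1, k2, n2 - k2), nrow = 2)
  as.numeric(fisher.test(tab, conf.level = conf.level)$conf.int)
}

# Made-up counts for two types.
k1 <- c(20, 5); n1 <- c(1000, 500)
k2 <- c(10, 8); n2 <- c(1000, 400)

# One fisher.test call per element; one result row per type.
res <- t(mapply(odds_cint, k1, n1, k2, n2))
colnames(res) <- c("lower", "upper")
print(res)
```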
A data frame with two columns, labelled lower for the lower boundary and upper for the upper boundary of the confidence interval. The number of rows is determined by the length of the longest input vector (k1, n1, k2, n2 and conf.level).
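A short usage sketch with vector arguments follows. It assumes the corpora package is installed (the call is skipped otherwise), and the frequency counts are made up for illustration.

```r
# Runs only if the corpora package is available.
if (requireNamespace("corpora", quietly = TRUE)) {
  # Made-up frequencies of three types in two corpora.
  k1 <- c(20, 5, 120)
  n1 <- c(1000, 500, 10000)
  k2 <- c(10, 8, 60)
  n2 <- c(1000, 400, 10000)

  ci <- corpora::rel.risk.cint(k1, n1, k2, n2, conf.level = 0.95)
  print(ci)        # data frame with columns lower and upper
  print(nrow(ci))  # one row per element of the input vectors
}
```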
Stefan Evert
prop.cint, chisq.pval, fisher.pval, fisher.test