Ford2 {exactmaxsel} | R Documentation |
The function Ford2 computes the distribution of the maximally selected association criterion of interest (either the chi-square statistic or the Gini-gain in the current version) when Y is binary and X has ordered values, given n0, n1 and A, in the case of a non-monotonic association represented by two cutpoints.
Ford2(c, n0, n1, A, statistic)
c |
the value at which the distribution function has to be computed. |
n0 |
the number of observations in class Y=0. |
n1 |
the number of observations in class Y=1. |
A |
a vector of length K giving the number of observations with X=1,...,X=K. |
statistic |
the association measure used as criterion to select the best split. Currently, only statistic="chi2" (chi-square statistic) and statistic="gini" (the Gini-gain from machine learning) are implemented. |
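The arguments n0, n1 and A are counts, so they usually have to be derived from raw observations first. The following is a minimal sketch of one way to do that in base R; the vectors y and x and the value of K are made-up illustration data, not taken from the package, and x is assumed to be coded as the integers 1,...,K.

```r
# Hypothetical raw data: y is the binary response, x the ordered predictor
# coded as integers 1,...,K (both vectors are invented for illustration).
y <- c(0, 1, 0, 0, 1, 1, 0, 1, 1, 0)
x <- c(1, 3, 2, 1, 4, 3, 2, 4, 3, 1)
K <- 4

n0 <- sum(y == 0)             # number of observations in class Y=0
n1 <- sum(y == 1)             # number of observations in class Y=1
A  <- tabulate(x, nbins = K)  # counts for X=1,...,X=K (empty categories give 0)

# The category counts must add up to the total sample size.
stopifnot(sum(A) == n0 + n1)
```

tabulate is used rather than table so that a category with no observations still contributes a zero entry and A keeps length K.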
Suppose the response Y is binary (Y=0,1) and the predictor X has K ordered categorical values (X=1,...,K). The criterion is maximized over all the binary splittings of the set {1,...,K} that are obtained from at most two cutpoints. For example, with K=4, the criterion is maximized over the splittings {1,2,3}{4}, {1,2}{3,4} and {1}{2,3,4} (one cutpoint) and {1,2,4}{3}, {1,4}{2,3} and {1,3,4}{2} (two cutpoints).
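The set of candidate splittings can be enumerated directly, which may help to check the K=4 example. The sketch below is an illustration only and is not part of exactmaxsel; the helper name splittings is hypothetical. It relies on the observation that every splitting with at most two cutpoints has one group of the form {a+1,...,b} with 1 <= a < b <= K, where b = K corresponds to a single cutpoint.

```r
# Hypothetical helper (not from exactmaxsel): list the binary splittings of
# {1,...,K} obtained from one or two cutpoints. Each splitting is represented
# by its group {a+1,...,b}; the other group is the complement in {1,...,K}.
splittings <- function(K) {
  stopifnot(K >= 2)
  out <- list()
  for (a in 1:(K - 1)) {
    for (b in (a + 1):K) {
      # b == K: one cutpoint (after a); b < K: two cutpoints (after a and b)
      out[[length(out) + 1]] <- (a + 1):b
    }
  }
  out
}

s <- splittings(4)

# Print each splitting as {complement} {group}
for (g in s) {
  cat("{", setdiff(1:4, g), "} {", g, "}\n")
}
```

Under this representation there are choose(K, 2) splittings in total, which gives the six splittings listed for K=4.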
The value of the distribution function at c.
Anne-Laure Boulesteix (http://www.slcmsr.net/boulesteix)
A.-L. Boulesteix and C. Strobl (2006), Maximally selected chi-square statistics and umbrella orderings, Computational Statistics and Data Analysis (in press).
# load exactmaxsel library
library(exactmaxsel)

Ford2(c=4,n0=15,n1=15,A=c(6,10,9,5),statistic="chi2")
Ford2(c=0.02,n0=15,n1=15,A=c(5,8,7,10),statistic="gini")