Ford {exactmaxsel}    R Documentation
The function Ford computes the distribution function of the maximally selected association criterion of interest (in the current version, either the chi-square statistic or the Gini gain) when Y is binary and X has ordered values, given n0, n1 and A. Note that X must be at least ordinally scaled; continuous variables are also allowed as an extreme special case.
Ford(c, n0, n1, A, statistic)
c | the value at which the distribution function is evaluated. |
n0 | the number of observations in class Y=0. |
n1 | the number of observations in class Y=1. |
A | a vector of length K giving the numbers of observations with X=1,...,X=K. In the special case of a continuous X variable taking distinct values in the available sample, A takes the form A=rep(1,N), where N=n0+n1. |
statistic | the association measure used as the criterion for selecting the best split. Currently, only statistic="chi2" (chi-square statistic) and statistic="gini" (the Gini gain from machine learning) are implemented. |
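The argument A is simply the vector of category counts of X. The sketch below shows one way to obtain it from raw data with base R's tabulate; the sample vectors x (coded 1,...,K) and z are hypothetical. It also illustrates why a continuous predictor with distinct values corresponds to A=rep(1,N): replacing each value by its rank yields N categories containing one observation each.

```r
# Hypothetical ordinal predictor coded 1, ..., K (here K = 3)
x <- c(1, 2, 2, 3, 1, 3, 3, 2, 1, 2)
K <- 3
A <- tabulate(x, nbins = K)   # numbers of observations with X = 1, ..., K
print(A)                      # [1] 3 4 3

# Hypothetical continuous predictor with distinct values: its ranks define
# N ordered categories with exactly one observation each, i.e. rep(1, N)
z <- c(0.31, 1.72, -0.45, 2.08)
A_cont <- tabulate(rank(z), nbins = length(z))
print(A_cont)                 # [1] 1 1 1 1
```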
Suppose the response Y is binary (Y=0,1) and the predictor X has K ordered categorical values (X=1,...,K). The criterion is maximized over all binary splits of the set {1,...,K} that preserve the ordering, i.e. over the K-1 splits {1,...,k}{k+1,...,K} for k=1,...,K-1. For K=3, the criterion is thus maximized over the splits {1,2}{3} and {1}{2,3}. Note that X may also be an essentially continuous variable that is observed on a discrete scale and thus has ties.
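For an observed sample, the maximization over order-preserving splits can be sketched in base R as follows. The helper names chi2_split, gini_split and max_stat_ordered are hypothetical and not part of exactmaxsel. Assumptions: the chi-square statistic is computed without continuity correction; the Gini gain is taken to be the decrease in Gini impurity 2p(1-p) (the package may scale it differently); B[k] is the number of Y=1 observations with X=k; and all entries of A are positive.

```r
# Chi-square statistic (no continuity correction) of the 2x2 table
# (left/right group) x (Y = 0 / Y = 1) induced by one split
chi2_split <- function(n0_left, n1_left, n0, n1) {
  N <- n0 + n1
  left <- n0_left + n1_left
  right <- N - left
  N * (n0_left * (n1 - n1_left) - n1_left * (n0 - n0_left))^2 /
    (n0 * n1 * left * right)
}

# Gini gain (assumed definition): impurity of the parent node minus the
# size-weighted impurities of the two children, impurity = 2 p (1 - p)
gini_split <- function(n0_left, n1_left, n0, n1) {
  N <- n0 + n1
  imp <- function(a, b) { m <- a + b; 2 * (a / m) * (b / m) }  # a: #Y=0, b: #Y=1
  left <- n0_left + n1_left
  right <- N - left
  imp(n0, n1) - left / N * imp(n0_left, n1_left) -
    right / N * imp(n0 - n0_left, n1 - n1_left)
}

# Maximum of the criterion over the K - 1 splits {1..k}{k+1..K}
# A[k]: observations with X = k; B[k]: observations with X = k and Y = 1
max_stat_ordered <- function(A, B, stat = c("chi2", "gini")) {
  stat <- match.arg(stat)
  f <- if (stat == "chi2") chi2_split else gini_split
  n1 <- sum(B)
  n0 <- sum(A) - n1
  K <- length(A)
  vals <- vapply(seq_len(K - 1), function(k) {
    f(sum(A[1:k]) - sum(B[1:k]), sum(B[1:k]), n0, n1)
  }, numeric(1))
  max(vals)
}
```

For instance, with the margins of the first example below (A=c(6,10,9), n1=10) and the hypothetical counts B=c(1,4,5), max_stat_ordered(A, B, "chi2") returns about 1.79, attained by the split {1}{2,3}.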
Ford returns the value of the distribution function of the maximally selected statistic, evaluated at c.
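To illustrate what this returned value represents, the distribution function can be approximated by simulation. This is a Monte Carlo sketch, not the exact algorithm of Boulesteix (2006). It assumes that the null distribution is the permutation distribution of the Y labels given n0, n1 and A, it covers only the chi-square case, and the function name mc_ford_chi2 and its arguments cval and nsim are hypothetical.

```r
# Monte Carlo estimate of P(T <= cval), where T is the maximally selected
# chi-square statistic over order-preserving splits, under random allocation
# of n1 labels Y = 1 among the N = n0 + n1 observations (assumed null model)
mc_ford_chi2 <- function(cval, n0, n1, A, nsim = 10000) {
  stopifnot(sum(A) == n0 + n1, length(A) >= 2, all(A > 0))
  K <- length(A)
  N <- n0 + n1
  x <- rep(seq_len(K), A)            # ordered category of each observation
  left <- cumsum(A)[-K]              # sizes of the left groups {1..k}
  right <- N - left
  stats <- replicate(nsim, {
    y <- sample(rep(c(0L, 1L), c(n0, n1)))              # permuted labels
    n1_left <- cumsum(tabulate(x[y == 1L], nbins = K))[-K]
    n0_left <- left - n1_left
    chi2 <- N * (n0_left * (n1 - n1_left) - n1_left * (n0 - n0_left))^2 /
      (n0 * n1 * left * right)
    max(chi2)                        # maximally selected statistic
  })
  mean(stats <= cval)
}
```

If these assumptions match the package's definitions, mc_ford_chi2(4, 15, 10, c(6, 10, 9)) should agree with Ford(c=4, n0=15, n1=10, A=c(6,10,9), statistic="chi2") up to Monte Carlo error; the exact computation avoids that error entirely.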
Anne-Laure Boulesteix (http://www.slcmsr.net/boulesteix)
A.-L. Boulesteix (2006), Maximally selected chi-square statistics for ordinal variables, Biometrical Journal 48:451-462.
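# load the exactmaxsel library
library(exactmaxsel)

# distribution function of the maximally selected chi-square statistic at c = 4
Ford(c = 4, n0 = 15, n1 = 10, A = c(6, 10, 9), statistic = "chi2")

# distribution function of the maximally selected Gini gain at c = 0.02
Ford(c = 0.02, n0 = 15, n1 = 15, A = c(5, 8, 7, 10), statistic = "gini")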