Ford {exactmaxsel}R Documentation

Distribution of maximally selected statistics for (at least) ordinally scaled variables

Description

The function Ford computes the distribution of the maximally selected association criterion of interest (either the chi-square statistic or the Gini-gain in the current version) when Y is binary and X has ordered values, given n0, n1 and A. Note that X must be AT LEAST ordinally scaled, i.e. continuous variables are also allowed as an extreme special case.

Usage

Ford(c, n0, n1, A, statistic)

Arguments

c the value at which the distribution function has to be computed.
n0 the number of observations in class Y=0.
n1 the number of observations in class Y=1.
A a vector of length K giving the number of observations with X=1,...,X=K. In the special case of a continuous X variable taking distinct values in the available sample, A takes the form A=rep(1,N), where N=n0+n1.
statistic the association measure used as criterion to select the best split. Currently, only statistic="chi2" (chi-square statistic) and statistic="gini" (the Gini-gain from machine learning) are implemented.

Details

Suppose the response Y is binary (Y=0,1) and the predictor X has K ordered categorical values (X=1,...,K). The criterion is maximized over all the binary splittings of the set {1,...,K} that preserve the ordering. For K=3, the criterion is thus maximized over the splittings {1,2}{3} and {1}{2,3}. Note that X may also be a substantially continuous variable that is observed at a discrete scale and thus has ties.

Value

the value of the distribution function at c.

Author(s)

Anne-Laure Boulesteix (http://www.slcmsr.net/boulesteix)

References

A.-L. Boulesteix (2006), Maximally selected chi-square statistics for ordinal variables, Biometrical Journal 48:451-462.

See Also

Fcat, Ford2, maxsel.

Examples

# load exactmaxsel library
library(exactmaxsel)

Ford(c=4,n0=15,n1=10,A=c(6,10,9),statistic="chi2")
Ford(c=0.02,n0=15,n1=15,A=c(5,8,7,10),statistic="gini")


[Package exactmaxsel version 1.0-2 Index]