Ford2 {exactmaxsel} R Documentation

Distribution of maximally selected statistics for (at least) ordinally scaled variables in the two-cutpoint context

Description

The function Ford2 computes the distribution function of the maximally selected association criterion (currently either the chi-square statistic or the Gini-gain) when Y is binary and X has ordered values, given n0, n1 and A. It covers the case of a non-monotonic association represented by two cutpoints.

Usage

Ford2(c, n0, n1, A, statistic)

Arguments

c the value at which the distribution function has to be computed.
n0 the number of observations in class Y=0.
n1 the number of observations in class Y=1.
A a vector of length K giving the number of observations with X=1,...,X=K. Its entries sum to n0+n1.
statistic the association measure used as criterion to select the best split. Currently, only statistic="chi2" (chi-square statistic) and statistic="gini" (the Gini-gain from machine learning) are implemented.
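
The arguments n0, n1 and A are plain counts, so they can be derived from raw data. The following is a minimal sketch in base R, assuming the response is coded 0/1 and the predictor is coded 1,...,K; the vectors y and x are made-up illustration data and are not taken from the package.

```r
# Illustrative data (made up for this sketch): binary response y and
# ordinal predictor x coded 1,...,K.
y <- c(0, 1, 0, 0, 1, 1, 0, 1, 1, 0)
x <- c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4)
K <- 4

n0 <- sum(y == 0)             # number of observations in class Y=0
n1 <- sum(y == 1)             # number of observations in class Y=1
A  <- tabulate(x, nbins = K)  # counts of X=1,...,X=K (empty levels give 0)

# Sanity check: every observation is counted exactly once in A.
stopifnot(sum(A) == n0 + n1)
```

The resulting n0, n1 and A could then be passed to Ford2 together with c and statistic, as in the Usage section.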

Details

Suppose the response Y is binary (Y=0,1) and the predictor X has K ordered categorical values (X=1,...,K). The criterion is maximized over all the binary splittings of the set {1,...,K} that are obtained from at most two cutpoints. For example, with K=4, the criterion is maximized over the splittings {1,2,3}{4}, {1,2}{3,4}, {1}{2,3,4}, {1,2,4}{3}, {1,4}{2,3} and {1,3,4}{2}.
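
To make the set of candidate splits concrete, the sketch below enumerates them in base R. The helper two_cutpoint_splits is hypothetical and written only for illustration; it is not part of exactmaxsel and says nothing about how Ford2 is implemented internally.

```r
# Hypothetical helper (not part of exactmaxsel): list every binary split of
# {1,...,K} obtained from one or two cutpoints.
two_cutpoint_splits <- function(K) {
  stopifnot(K >= 2)
  values <- seq_len(K)
  splits <- list()
  # One cutpoint after value i: {1,...,i} versus {i+1,...,K}.
  for (i in seq_len(K - 1)) {
    splits[[length(splits) + 1]] <- list(left  = values[values <= i],
                                          right = values[values > i])
  }
  # Two cutpoints i < j: the middle block {i+1,...,j} versus all other values.
  if (K >= 3) {
    for (i in seq_len(K - 2)) {
      for (j in (i + 1):(K - 1)) {
        middle <- values[values > i & values <= j]
        splits[[length(splits) + 1]] <- list(left  = setdiff(values, middle),
                                              right = middle)
      }
    }
  }
  splits
}

# Print the splits for K=4 in the notation used above.
for (s in two_cutpoint_splits(4)) {
  cat("{", paste(s$left, collapse = ","), "}{",
      paste(s$right, collapse = ","), "}\n", sep = "")
}
```

For K=4 this lists the six splits given above: three from a single cutpoint and three from two cutpoints. In general there are (K-1) + (K-1)(K-2)/2 = K(K-1)/2 such splits.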

Value

the value of the distribution function at c.

Author(s)

Anne-Laure Boulesteix (http://www.slcmsr.net/boulesteix)

References

A.-L. Boulesteix and C. Strobl (2006), Maximally selected chi-square statistics and umbrella orderings, Computational Statistics and Data Analysis (in press).

See Also

Ford, Fcat, maxsel.

Examples

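# load the exactmaxsel package
library(exactmaxsel)

# distribution function of the maximally selected chi-square statistic at c=4
Ford2(c=4,n0=15,n1=15,A=c(6,10,9,5),statistic="chi2")
# distribution function of the maximally selected Gini-gain at c=0.02
Ford2(c=0.02,n0=15,n1=15,A=c(5,8,7,10),statistic="gini")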


[Package exactmaxsel version 1.0-2 Index]