Fcat {exactmaxsel}    R Documentation

Distribution of maximally selected statistics for multicategorical variables

Description

The function Fcat computes the null distribution of the maximally selected association criterion (in the current version, either the chi-square statistic or the Gini-gain) when Y is binary and X has unordered categorical values, given n0, n1 and A.

Usage

Fcat(c, n0, n1, A, statistic)

Arguments

c the value at which the distribution function is evaluated.
n0 the number of observations in class Y=0.
n1 the number of observations in class Y=1.
A a vector of length K giving the numbers of observations with X=1, ..., X=K (so that the entries of A sum to n0+n1).
statistic the association measure used as the criterion for selecting the best split. Currently, only statistic="chi2" (chi-square statistic) and statistic="gini" (the Gini-gain from machine learning) are implemented.
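To make the two criteria concrete, the sketch below evaluates them for a single binary split, given the 2 x 2 table of split group versus Y. It assumes the textbook definitions: the Pearson chi-square statistic without continuity correction, and the Gini-gain as the decrease in the Gini impurity 2p(1-p), where p is the proportion of Y=1. The helper names split_chi2, gini and split_gini_gain are hypothetical and not part of exactmaxsel, and the scaling used internally for statistic="gini" may differ from the one shown here.

```r
# Hypothetical helpers (not part of exactmaxsel): textbook criteria for one split.
# 'left' and 'right' are length-2 vectors c(count of Y=0, count of Y=1)
# for the two groups of categories defined by the split.

split_chi2 <- function(left, right) {
  tab <- rbind(left, right)                          # 2 x 2 contingency table
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)                 # Pearson chi-square, no continuity correction
}

gini <- function(counts) {
  p <- counts[2] / sum(counts)                       # proportion of Y=1 in the node
  2 * p * (1 - p)                                    # Gini impurity for a binary response
}

split_gini_gain <- function(left, right) {
  n <- sum(left) + sum(right)
  # parent impurity minus the size-weighted impurities of the two groups
  gini(left + right) - (sum(left) / n) * gini(left) - (sum(right) / n) * gini(right)
}

left <- c(8, 2); right <- c(7, 8)
split_chi2(left, right)       # 25/9, about 2.778
split_gini_gain(left, right)  # 4/75, about 0.0533
```

Under these definitions the chi-square statistic equals n times the Gini-gain divided by the impurity of the parent node (here 25 x (4/75) / 0.48 = 25/9), so for fixed n0 and n1 both criteria rank the splits identically.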

Details

Suppose the response Y is binary (Y=0,1) and the predictor X has K unordered categorical values (X=1,...,K). The criterion is maximized over all binary splits of the set {1,...,K} into two non-empty subsets; there are 2^(K-1)-1 such splits. For example, if K=4, the criterion is maximized over the 7 splits {1}{2,3,4}, {1,2}{3,4}, {1,2,3}{4}, {1,2,4}{3}, {1,4}{2,3}, {1,3,4}{2}, {1,3}{2,4}.
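The enumeration of binary splits described above can be sketched in R as follows. This is an illustration, not code taken from exactmaxsel: the function name binary_splits is hypothetical, and keeping category K in the second group is one simple way (assumed here) to list each unordered split exactly once.

```r
# Hypothetical helper (not part of exactmaxsel): enumerate the binary splits
# of the categories 1..K into two non-empty groups, each split listed once.
binary_splits <- function(K) {
  splits <- list()
  # Keep category K in the right-hand group so that {S}{complement} and
  # {complement}{S} are not both generated; the bits of m select the
  # categories among 1..(K-1) that go into the left-hand group S.
  for (m in seq_len(2^(K - 1) - 1)) {
    left <- which(bitwAnd(m, as.integer(2^(0:(K - 2)))) > 0)
    splits[[length(splits) + 1]] <- list(left = left, right = setdiff(seq_len(K), left))
  }
  splits
}

# Print the splits for K = 4 in the notation used above
for (s in binary_splits(4)) {
  cat("{", paste(s$left, collapse = ","), "}{", paste(s$right, collapse = ","), "}\n", sep = "")
}
length(binary_splits(4))  # 7 = 2^(4-1) - 1
```

Because the number of splits grows as 2^(K-1)-1, exhaustive enumeration like this is only practical for moderate K.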

Value

The value of the distribution function at c, i.e. the probability that the maximally selected statistic is less than or equal to c.
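As a rough cross-check of a value returned by Fcat, the distribution function can also be approximated by simulation. The sketch below is a swapped-in Monte Carlo technique, not the computation performed by Fcat (which, as the package name suggests, targets the exact distribution). It assumes the null hypothesis of no association with n0, n1 and A held fixed, imposes it by randomly permuting the Y labels across the observations, and covers the textbook Pearson chi-square statistic only. The names max_chi2 and mc_Fcat_chi2 are hypothetical, and the argument q plays the role of c.

```r
# Hypothetical sketch (not the algorithm used by Fcat): Monte Carlo estimate
# of P(maximally selected chi-square <= q) with n0, n1 and A held fixed.

max_chi2 <- function(x, y, K) {
  # K x 2 table of counts: rows are categories of X, columns are Y = 0, 1
  tab <- table(factor(x, levels = seq_len(K)), factor(y, levels = 0:1))
  best <- 0
  for (m in seq_len(2^(K - 1) - 1)) {
    # categories among 1..(K-1) selected by the bits of m form one group;
    # category K always belongs to the other group
    left <- c(bitwAnd(m, as.integer(2^(0:(K - 2)))) > 0, FALSE)
    obs <- rbind(colSums(tab[left, , drop = FALSE]),
                 colSums(tab[!left, , drop = FALSE]))
    expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
    best <- max(best, sum((obs - expected)^2 / expected))  # Pearson chi-square
  }
  best
}

mc_Fcat_chi2 <- function(q, n0, n1, A, B = 2000) {
  # all categories non-empty, so every expected count is positive
  stopifnot(n0 > 0, n1 > 0, all(A > 0), sum(A) == n0 + n1)
  K <- length(A)
  x <- rep(seq_len(K), times = A)           # X category of each observation
  y <- rep(c(0, 1), times = c(n0, n1))      # n0 zeros and n1 ones
  # permuting y keeps n0, n1 and A fixed while breaking any X-Y association
  stats <- replicate(B, max_chi2(x, sample(y), K))
  mean(stats <= q)                          # empirical P(max chi-square <= q)
}

set.seed(1)
mc_Fcat_chi2(q = 4, n0 = 15, n1 = 10, A = c(6, 10, 9))
```

The estimate carries a Monte Carlo standard error of roughly sqrt(F(1-F)/B), so, provided the assumptions above match those of Fcat, it should agree with Fcat(c=4, n0=15, n1=10, A=c(6,10,9), statistic="chi2") only up to that error.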

Author(s)

Anne-Laure Boulesteix (http://www.slcmsr.net/boulesteix)

References

A.-L. Boulesteix (2006), Maximally selected chi-square statistics and binary splits of nominal variables, Biometrical Journal 48:838-848.

See Also

Ford, Ford2, maxsel.

Examples

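# load the exactmaxsel package
library(exactmaxsel)

# distribution function of the maximally selected chi-square statistic at c=4
# for n0=15, n1=10 and K=3 categories of sizes 6, 10 and 9 (6+10+9 = 15+10)
Fcat(c=4,n0=15,n1=10,A=c(6,10,9),statistic="chi2")
# distribution function of the maximally selected Gini-gain at c=5
# for n0=n1=15 and K=4 categories of sizes 5, 8, 7 and 10
Fcat(c=5,n0=15,n1=15,A=c(5,8,7,10),statistic="gini")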


[Package exactmaxsel version 1.0-2 Index]