noverlap.for {noverlap} | R Documentation |
Applies the regression depth method (RDM) to binary regression. This method computes approximately the number of data points that can be removed from a data set such that the remaining data set has NO overlap, i.e. that the remaining data set has complete separation or quasi-complete separation. If Noverlap=0 then these maximum likelihood estimates for the parameter vector in many binary regression models such as logistic regression or probit regression do not exist.
noverlap.for(Z,NDIR=10000,PLOT=FALSE)
Z |
The data set Z has to be a matrix with ncol(Z)-1 <= nrow(Z) <= 10000. The first ncol(Z)-1 columns of Z are the design matrix X. The last column of Z is the binary response vector y (0/1). |
NDIR |
Maximal number of directions (integer) |
PLOT |
logical, if TRUE then draw a plot |
A list with components
Z |
data matrix |
NDIR |
10000 directions |
PLOT |
FALSE, i.e. make no plot |
Andreas Christmann, Peter J. Rousseeuw Christmann@statistik.uni-dortmund.de
Christmann, A., Rousseeuw, P.J. (2001). Measuring overlap in logistic regression. Computational Statistics and Data Analysis, 37, 65-75.
Christmann, A. (2002). Classification based on the support vector machine and on regression depth. In: Y. Dodge (Ed.): Statistical Data Analysis Based on the L1-Norm and Related Methods. Series: Statistics for industry and technology. Birkhaeuser, Basel, pp. 341-352.
Christmann, A., Fischer, P., Joachims, T. (2002). Comparison between various regression depth methods and the support vector machine to approximate the minimum number of misclassifications. Computational Statistics, 17, 273-287.
data(Z2) noverlap.for(Z2) noverlap.for(Z2,NDIR=100000) # x11() postscript(file="tmp1.ps") par(mfrow=c(2,1)) noverlap.for(Z2,NDIR=10000,PLOT=TRUE) tmp <- noverlap.for(Z2) tmp$NOVERLAP tmp$COEFFICIENTS tmp$NSIN tmp$DETAILS Z3 <- as.data.frame(Z2) names(Z3) <- c("x1","x2","y") plot(x2 ~ x1, data=Z3,pch=as.character(y),main="Scatterplot") abline(c(0,1.5),col="blue") points(Z3[2,1],Z3[2,2],pch=as.character(Z3[2,3]),col="red") dev.off() # NO OVERLAP: maximum likelihood estimates do NOT exist data(Z1) Z1 # X11() postscript(file="tmp2.ps") noverlap.for(Z1) tmp <- noverlap.for(Z1) tmp$NOVERLAP tmp$COEFFICIENTS tmp$NSIN tmp$DETAILS Z3 <- as.data.frame(Z1) names(Z3) <- c("x1","y") plot(y ~ x1, data=Z3,pch=as.character(y),main="Scatterplot") summary(glm(y ~ x1, data=Z3, family=binomial(link=logit), trace=TRUE, maxit=30)) dev.off() # NO OVERLAP: maximum likelihood estimates in the logistic regression model # do NOT exist for the banknotes data set data(Banknotes) Banknotes # X11() postscript(file="tmp3.ps") tmp <- noverlap.for(Banknotes,PLOT=TRUE) dev.off() tmp$NOVERLAP tmp$COEFFICIENTS Z3 <- as.data.frame(Banknotes) names(Z3) <- c("x1","x2", "x3", "x4", "x5", "x6","y") summary(glm(y ~ x1+x2+x3+x4+x5+x6, data=Z3, family=binomial(link=logit), trace=TRUE, maxit=30))