getSepProj {clusterGeneration} | R Documentation |
Optimal projection direction and corresponding separation index for pairs of clusters.
getSepProjTheory(muMat, SigmaArray, iniProjDirMethod=c("SL", "naive"), projDirMethod=c("newton", "fixedpoint"), alpha=0.05, ITMAX=20, eps=1.0e-10, quiet=TRUE)
getSepProjData(y, cl, iniProjDirMethod=c("SL", "naive"), projDirMethod=c("newton", "fixedpoint"), alpha=0.05, ITMAX=20, eps=1.0e-10, quiet=TRUE)
muMat |
Matrix of mean vectors. Rows correspond to mean vectors for clusters. |
SigmaArray |
Array of covariance matrices. SigmaArray[,,i] records the covariance
matrix of the i-th cluster.
|
y |
Data matrix. Rows correspond to observations. Columns correspond to variables. |
cl |
Cluster membership vector. |
iniProjDirMethod |
Method used to obtain the initial projection direction when calculating
the separation index between a pair of clusters (cf. Qiu and Joe,
2006a, 2006b). iniProjDirMethod="SL" indicates that the initial projection
direction is the sample version of the SL projection direction
(Su and Liu, 1993),
$(\boldsymbol{\Sigma}_1+\boldsymbol{\Sigma}_2)^{-1}(\boldsymbol{\mu}_2-\boldsymbol{\mu}_1)$.
iniProjDirMethod="naive" indicates that the initial projection
direction is $\boldsymbol{\mu}_2-\boldsymbol{\mu}_1$.
|
projDirMethod |
Method used to obtain the optimal projection direction when calculating
the separation index between a pair of clusters (cf. Qiu and Joe,
2006a, 2006b). projDirMethod="newton" indicates that the Newton-Raphson
method is used to search for the optimal projection direction (cf. Qiu and Joe, 2006a).
This requires the assumption that the covariance matrices of both
clusters in the pair are positive-definite. If this assumption is violated, the
"fixedpoint" method can be used instead. The "fixedpoint" method
iteratively searches for the optimal projection direction based on the first
derivative of the separation index with respect to the projection direction
(cf. Qiu and Joe, 2006b).
|
alpha |
Tuning parameter reflecting the proportion of a projected cluster,
in its two tails combined, that might be outlying.
The default alpha=0.05 is chosen by analogy with the conventional
0.05 significance level in hypothesis testing.
|
ITMAX |
Maximum number of iterations allowed when iteratively calculating the optimal projection direction. The actual number of iterations is usually much smaller than the default value of 20. |
eps |
Convergence threshold. A small positive number used to check whether a quantity
q is equal to zero. If |q|<eps, then q is regarded
as equal to zero. eps is used to check whether an algorithm has converged.
The default value is 1.0e-10.
|
quiet |
A flag to switch on/off the output of intermediate results and/or possible warning messages. The default value is TRUE.
|
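The two choices for iniProjDirMethod can be illustrated with a few lines of base R. This is only a sketch of the formulas given above, not the package's internal code: the numeric values are borrowed from the Examples section, and `unitLength` is a hypothetical helper introduced here for comparison. Whether the package rescales the directions internally is not stated on this page.

```r
# Illustrative values (taken from the Examples section of this page)
mu1 <- c(0, 0)
Sigma1 <- matrix(c(2, 1, 1, 5), 2, 2)
mu2 <- c(10, 0)
Sigma2 <- matrix(c(5, -1, -1, 2), 2, 2)

# "naive" initial direction: difference of the two mean vectors
dirNaive <- mu2 - mu1

# "SL" initial direction: (Sigma1 + Sigma2)^{-1} (mu2 - mu1),
# computed by solving the linear system rather than inverting explicitly
dirSL <- solve(Sigma1 + Sigma2, mu2 - mu1)

# Hypothetical helper (not part of clusterGeneration): rescale to unit length
unitLength <- function(a) a / sqrt(sum(a^2))

dirNaiveUnit <- unitLength(dirNaive)
dirSLUnit <- unitLength(dirSL)
```

For these particular values Sigma1 + Sigma2 is a multiple of the identity matrix, so the two initial directions coincide after rescaling. In general they differ.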
When calculating the optimal projection direction and the corresponding optimal
separation index for a pair of clusters, the "newton" method cannot be used if
one or both of the cluster covariance matrices are singular.
In this case, the functions getSepProjTheory and getSepProjData
automatically use the "fixedpoint" method to search for the optimal
projection direction, even if the user sets the argument projDirMethod
to "newton". In addition, multiple initial projection
directions are evaluated.
Specifically, 2+2p projection directions are evaluated, where p is the number of variables. The first projection direction is the "naive" direction $\boldsymbol{\mu}_2-\boldsymbol{\mu}_1$. The second projection direction is the "SL" projection direction $(\boldsymbol{\Sigma}_1+\boldsymbol{\Sigma}_2)^{-1}(\boldsymbol{\mu}_2-\boldsymbol{\mu}_1)$. The next p projection directions are the p eigenvectors of the covariance matrix of the first cluster. The remaining p projection directions are the p eigenvectors of the covariance matrix of the second cluster.
Each of these 2+2p projection directions is used in turn as the initial projection direction for the "fixedpoint" algorithm, yielding 2+2p optimized projection directions and their corresponding separation indices. A further 2+2p separation indices are obtained by projecting the two clusters directly along each of the 2+2p initial directions, without optimization.
Finally, among these 2(2+2p) candidate separation indices, the projection direction with the largest separation index is chosen as the optimal projection direction. The corresponding separation index is chosen as the optimal separation index.
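The candidate-direction step described above can be sketched in base R. This sketch rests on several assumptions that go beyond this page. It assumes the normal-theory form of the separation index along a direction a, namely (d - s)/(d + s) with d = |a'(mu2 - mu1)| and s = z_{alpha/2} (sqrt(a' Sigma1 a) + sqrt(a' Sigma2 a)), which should be checked against Qiu and Joe (2006a). The function `sepIndexAlong` and the singular `Sigma1` are hypothetical and purely illustrative. The "fixedpoint" refinement of each candidate is omitted, so only the directly projected indices are compared.

```r
# Illustrative inputs: Sigma1 is singular (rank 1), Sigma2 is positive-definite
mu1 <- c(0, 0)
Sigma1 <- matrix(c(1, 1, 1, 1), 2, 2)
mu2 <- c(10, 0)
Sigma2 <- matrix(c(5, -1, -1, 2), 2, 2)
p <- length(mu1)

# Build the 2 + 2p candidate initial directions, one per column
candidates <- cbind(
  mu2 - mu1,                                # "naive" direction
  solve(Sigma1 + Sigma2, mu2 - mu1),        # "SL" direction
  eigen(Sigma1, symmetric = TRUE)$vectors,  # p eigenvectors of Sigma1
  eigen(Sigma2, symmetric = TRUE)$vectors   # p eigenvectors of Sigma2
)

# Hypothetical helper: ASSUMED normal-theory separation index along direction a
sepIndexAlong <- function(a, mu1, Sigma1, mu2, Sigma2, alpha = 0.05) {
  za <- qnorm(1 - alpha / 2)
  d <- abs(sum(a * (mu2 - mu1)))  # abs() makes the result sign-invariant in a
  # max(0, .) guards against tiny negative values caused by rounding
  sd1 <- sqrt(max(0, drop(t(a) %*% Sigma1 %*% a)))
  sd2 <- sqrt(max(0, drop(t(a) %*% Sigma2 %*% a)))
  s <- za * (sd1 + sd2)
  (d - s) / (d + s)
}

# Evaluate the index along every candidate and keep the best one
sepVals <- apply(candidates, 2, sepIndexAlong,
                 mu1 = mu1, Sigma1 = Sigma1, mu2 = mu2, Sigma2 = Sigma2)
best <- which.max(sepVals)
bestDir <- candidates[, best]
```

In the procedure described above, each column of `candidates` would additionally be passed to the "fixedpoint" iteration, and the maximum would be taken over both the refined and the unrefined indices.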
sepValMat |
Separation index matrix. sepValMat[i,j] is the separation index between clusters i and j. |
projDirArray |
Array of projection directions for each pair of clusters. projDirArray[i,j,] is the projection direction for clusters i and j. |
Weiliang Qiu stwxq@channing.harvard.edu
Harry Joe harry@stat.ubc.ca
Qiu, W.-L. and Joe, H. (2006a) Generation of Random Clusters with Specified Degree of Separation. Journal of Classification, 23(2), 315–334.
Qiu, W.-L. and Joe, H. (2006b) Separation Index and Partial Membership for Clustering. Computational Statistics and Data Analysis, 50, 585–603.
Su, J. Q. and Liu, J. S. (1993) Linear Combinations of Multiple Diagnostic Markers. Journal of the American Statistical Association, 88, 1350–1355.
n1<-50
mu1<-c(0,0)
Sigma1<-matrix(c(2,1,1,5),2,2)
n2<-100
mu2<-c(10,0)
Sigma2<-matrix(c(5,-1,-1,2),2,2)
projDir<-c(1, 0)
muMat<-rbind(mu1, mu2)
SigmaArray<-array(0, c(2,2,2))
SigmaArray[,,1]<-Sigma1
SigmaArray[,,2]<-Sigma2

a<-getSepProjTheory(muMat, SigmaArray, iniProjDirMethod="SL")
# separation index for cluster distributions 1 and 2
a$sepValMat[1,2]
# projection direction for cluster distributions 1 and 2
a$projDirArray[1,2,]

library(MASS)
y1<-mvrnorm(n1, mu1, Sigma1)
y2<-mvrnorm(n2, mu2, Sigma2)
y<-rbind(y1, y2)
cl<-rep(1:2, c(n1, n2))

b<-getSepProjData(y, cl, iniProjDirMethod="SL", projDirMethod="newton")
# separation index for clusters 1 and 2
b$sepValMat[1,2]
# projection direction for clusters 1 and 2
b$projDirArray[1,2,]