segclustselect {segclust} | R Documentation |
Model selection for segmentation/clustering
out <- segclustselect(x,param,Pmin,Pmax,Kmax,Linc,method,S=0.5,lmin=1,lmax=length(x),vh=TRUE)
x |
data vector |
param |
parameters estimated by hybrid() |
Pmin |
minimum number of clusters |
Pmax |
maximum number of clusters |
Kmax |
maximum number of segments |
Linc |
incomplete-data log-likelihood calculated by hybrid() |
method |
Method used for the selection: either "sequential" or "BIC" |
S |
threshold for the adaptive method, default value S = 0.5 |
lmin |
minimal segment length, default value lmin = 1 |
lmax |
maximal segment length, default value lmax = length(x) |
vh |
TRUE for homogeneous variances (default), FALSE otherwise |
This function simultaneously selects Pselect and Kselect, the number of clusters and the number of segments in a segmentation/clustering model. It is based on the penalization of the incomplete-data log-likelihood Linc. Two methods are implemented. The first is based on a sequential choice of Pselect and Kselect, as described in Picard et al. (2007). The second is based on a modified BIC criterion, as described in Zhang and Siegmund (2007). The function uses the Stirling approximation of the Gamma function:
log Gamma(x) ~ (x-1/2)* log(x) - x + 1/2 * log(2 π)
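As a quick sanity check on the approximation above, the following sketch compares it with base R's lgamma(). The helper name stirling_lgamma is hypothetical and is not part of segclust; it only transcribes the formula as written.

```r
# Stirling approximation of log Gamma(x), transcribed from the formula above.
# stirling_lgamma is an illustrative name, not a segclust function.
stirling_lgamma <- function(x) {
  (x - 0.5) * log(x) - x + 0.5 * log(2 * pi)
}

x <- c(1, 5, 20, 100)
approx_vals <- stirling_lgamma(x)
exact_vals  <- lgamma(x)                  # base R reference value
abs_err     <- abs(approx_vals - exact_vals)

# The leading neglected term of the Stirling series is 1/(12 x),
# so the absolute error shrinks as x grows.
print(data.frame(x = x, stirling = approx_vals,
                 lgamma = exact_vals, abs_err = abs_err))
```

The approximation is already close for small arguments and improves quickly, which is why it is convenient for evaluating penalties that involve log Gamma terms.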
Pselect |
Selected number of clusters |
Kselect |
Selected number of segments |
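To illustrate how a pair (Pselect, Kselect) can be read off a penalized Linc matrix, here is a minimal sketch. It is not the segclust implementation: the helper select_by_penalty, its parameter count, and the plain BIC-style penalty are all assumptions made for illustration, and they differ from both the sequential method and the modified BIC cited above. The Linc values are a synthetic toy, not output from hybrid().

```r
# Hypothetical helper (NOT part of segclust): picks (P, K) by maximizing
# Linc minus a plain BIC-style penalty. Assumed parameter count:
# P means + variances (1 if vh, else P) + (P - 1) proportions + (K - 1) breakpoints.
select_by_penalty <- function(Linc, n, Pmin = 1, Pmax = ncol(Linc), vh = TRUE) {
  Kmax <- nrow(Linc)
  crit <- matrix(-Inf, nrow = Kmax, ncol = ncol(Linc))
  for (P in Pmin:Pmax) {
    if (P > Kmax) next                    # need at least as many segments as clusters
    for (K in P:Kmax) {
      n_var <- if (vh) 1 else P
      n_par <- P + n_var + (P - 1) + (K - 1)
      crit[K, P] <- Linc[K, P] - 0.5 * n_par * log(n)
    }
  }
  best <- which(crit == max(crit), arr.ind = TRUE)[1, ]
  list(Pselect = unname(best["col"]), Kselect = unname(best["row"]), crit = crit)
}

# Synthetic toy log-likelihoods: large gains up to K = 4 and P = 2,
# then only marginal gains for more complex models.
Kmax <- 10; Pmax <- 4; n <- 100
Linc <- matrix(-Inf, nrow = Kmax, ncol = Pmax)
for (P in 1:Pmax) {
  for (K in P:Kmax) {
    Linc[K, P] <- -300 + 30 * min(K - 1, 3) + 0.5 * max(K - 4, 0) +
      25 * min(P - 1, 1) + 0.5 * max(P - 2, 0)
  }
}

sel <- select_by_penalty(Linc, n)
print(c(Pselect = sel$Pselect, Kselect = sel$Kselect))
```

The layout mirrors the arguments described above: Linc has one row per number of segments and one column per number of clusters, and entries for models that were not fitted stay at -Inf so they can never be selected.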
Picard, F., Robin, S., Lebarbier, E., & Daudin, J.-J. (2007). A segmentation/clustering model for the analysis of array CGH data. Biometrics, 63(3), 758-766.
Zhang, N. R., & Siegmund, D. O. (2007). A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics, 63(1), 22-32.
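# Simulate four segments alternating between two levels (means 0 and 2)
x1 <- rnorm(20, 0, 1)
x2 <- rnorm(30, 2, 1)
x3 <- rnorm(10, 0, 1)
x4 <- rnorm(40, 2, 1)
x  <- c(x1, x2, x3, x4)

Pmin <- 1
Pmax <- 4
Kmax <- 20

# Linc[K, P]: incomplete-data log-likelihood for K segments and P clusters
Linc       <- matrix(-Inf, ncol = Pmax, nrow = Kmax)
param.list <- list()

# Fit the model with hybrid() for each candidate number of clusters
for (P in Pmin:Pmax) {
  out.hybrid      <- hybrid(x, P, Kmax)
  param.list[[P]] <- out.hybrid$param
  Linc[, P]       <- out.hybrid$Linc
}

# Select the number of clusters and segments with the modified BIC
out.select <- segclustselect(x, param = param.list, Pmin, Pmax, Kmax, Linc, method = "BIC")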