segclustselect {segclust}    R Documentation

segclustselect

Description

Model selection for segmentation/clustering

Usage

        out <- segclustselect(x,param,Pmin,Pmax,Kmax,Linc,method,S=0.5,lmin=1,lmax=length(x),vh=TRUE)

Arguments

x data vector
param parameters estimated by hybrid()
Pmin minimum number of clusters
Pmax maximum number of clusters
Kmax maximum number of segments
Linc incomplete-data log-likelihood calculated by hybrid()
method method used for the selection, either "sequential" or "BIC"
S threshold for the adaptive method, default value S = 0.5
lmin minimal segment length, default value lmin = 1
lmax maximal segment length, default value lmax = length(x)
vh TRUE for homogeneous variances (default), FALSE otherwise

Details

This function simultaneously selects Pselect and Kselect, the number of clusters and the number of segments in a segmentation/clustering model. The selection is based on penalizing the incomplete-data log-likelihood Linc. Two methods are implemented. The first makes a sequential choice of Pselect and Kselect, as described in Picard et al. (2007). The second uses a modified BIC criterion, as described in Zhang and Siegmund (2007). The function uses the Stirling approximation of the log-Gamma function:

log Gamma(x) ~ (x-1/2)* log(x) - x + 1/2 * log(2 π)
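
As a quick sanity check of the approximation above, the following sketch compares it with base R's lgamma(). The helper name stirling_lgamma and the test points z are illustrative assumptions and are not part of segclust.

```r
# Stirling approximation of log Gamma(z), as written in the formula above.
# The name stirling_lgamma is a hypothetical helper, not a segclust function.
stirling_lgamma <- function(z) {
  (z - 0.5) * log(z) - z + 0.5 * log(2 * pi)
}

# Arbitrary test points chosen for illustration.
z <- c(1, 5, 10, 50, 100)

# Difference between the exact value (base R lgamma) and the approximation.
err <- lgamma(z) - stirling_lgamma(z)

print(data.frame(z = z,
                 exact = lgamma(z),
                 stirling = stirling_lgamma(z),
                 error = err))
```

The error is positive, stays below 1/(12 z), and shrinks as z grows, so the approximation is already accurate for moderate arguments.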

Value

Pselect Selected number of clusters
Kselect Selected number of segments
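
To give an intuition for how a (Kselect, Pselect) pair can be read off a penalized log-likelihood table, here is a toy sketch. It uses a generic BIC-style penalty and made-up log-likelihood values; the penalty, the parameter count and all numbers are assumptions for illustration only, and this is not the criterion implemented in segclustselect().

```r
# Toy illustration only: generic BIC-style penalty, NOT the package's criterion.
n    <- 100   # hypothetical number of observations
Kmax <- 5     # hypothetical maximum number of segments
Pmax <- 3     # hypothetical maximum number of clusters

# Made-up log-likelihood table: rows = number of segments K,
# columns = number of clusters P. Entries with K < P stay at -Inf.
Linc <- matrix(-Inf, nrow = Kmax, ncol = Pmax)
for (P in 1:Pmax) {
  for (K in P:Kmax) {
    Linc[K, P] <- -150 + 40 * (P >= 2) + 30 * (K >= 3) + 0.5 * (K + P)
  }
}

# Assumed parameter count: P means, 1 common variance,
# P - 1 mixing proportions and K - 1 breakpoints.
npar <- outer(1:Kmax, 1:Pmax, function(K, P) 2 * P + K - 1)

# Penalized criterion: log-likelihood minus half the parameter count times log(n).
crit <- Linc - 0.5 * npar * log(n)

# Position of the maximum gives the selected (K, P) pair.
best    <- which(crit == max(crit), arr.ind = TRUE)
Kselect <- unname(best[1, "row"])
Pselect <- unname(best[1, "col"])
```

In this toy table the likelihood gains flatten out after 3 segments and 2 clusters, so the penalty makes larger models lose to that pair.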

References

Picard, F., Robin, S., Lebarbier, E., & Daudin, J.-J. (2007). A segmentation/clustering model for the analysis of array CGH data. Biometrics, 63(3), 758-766.

Zhang, N. R., & Siegmund, D. O. (2007). A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics, 63(1), 22-32.

Examples

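        library(segclust)
        set.seed(1)  # arbitrary seed, only to make the simulated data reproducible

        # Simulate a signal with 4 segments alternating between 2 levels (means 0 and 2)
        x1      <- rnorm(20,0,1)
        x2      <- rnorm(30,2,1)
        x3      <- rnorm(10,0,1)
        x4      <- rnorm(40,2,1)
        x       <- c(x1,x2,x3,x4)

        Pmin    <- 1
        Pmax    <- 4
        Kmax    <- 20

        # Linc has one row per number of segments and one column per number of clusters
        Linc    <- matrix(-Inf, ncol=Pmax, nrow=Kmax)
        param.list   <- list()

        # Fit the model with hybrid() for each candidate number of clusters P
        for (P in (Pmin:Pmax)){
            out.hybrid      <- hybrid(x,P,Kmax)
            param.list[[P]] <- out.hybrid$param
            Linc[,P]        <- out.hybrid$Linc
        }

        # Select the number of clusters and segments with the modified BIC
        out.select <- segclustselect(x,param=param.list,Pmin,Pmax,Kmax,Linc,method = "BIC")

        # Selected values, named as in the Value section
        out.select$Pselect
        out.select$Kselect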

[Package segclust version 0.74 Index]