jkdist {gcExplorer} | R Documentation |
Helper functions to create 'kccaFamily' objects.
distJackCor(x, centers)
distJackEuc(x, centers)
distJackMan(x, centers)
distJackMax(x, centers)
centSpline(d)
x | A data matrix |
d | A data matrix |
centers | A matrix of centroids |
A possible problem with classical distance measures for clustering time-course gene expression data is that a single outlier variable can completely change the expression pattern of a gene. Outliers at individual time points are very common in microarray experiments, because technical problems such as dust or a scratch on the slide can easily distort the data. Such outlier variables can lead to unwanted correlations between genes and to incorrect cluster assignments, so there is a need for distance measures that are robust against outlier variables.

The idea of Jackknife (Efron, 1982) distance measures is not to exclude the whole observation for such a gene, but only one or several variables. The "Jackknife" distance measures described here can handle one outlier time point. The Jackknife correlation was first used by Heyer et al. (1999) to cluster gene expression data. The corresponding distance is defined as
d_xy = 1 - min(rho_xy^(1), rho_xy^(2), ..., rho_xy^(T))
where rho_xy^(t) is the correlation of the pair x, y computed with the t-th time point deleted, and T is the number of time points.
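The definition can be sketched in a few lines of base R. This is only an illustration for a single pair of profiles: the helper name `jack_cor_dist` and the example vectors are hypothetical and are not part of gcExplorer, whose `distJackCor` works on a data matrix and a matrix of centroids.

```r
# Hypothetical helper, not the gcExplorer implementation: Jackknife
# correlation distance between two expression profiles of equal length T.
jack_cor_dist <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 3)
  # Correlation with the t-th time point deleted, for t = 1, ..., T
  rho <- vapply(seq_along(x), function(t) cor(x[-t], y[-t]), numeric(1))
  1 - min(rho)
}

# Two made-up profiles that agree only through a shared spike at time point 3
x <- c(1, 2, 10, 1, 2, 1)
y <- c(2, 1, 10, 2, 1, 2)

1 - cor(x, y)        # classical correlation distance, dominated by the spike
jack_cor_dist(x, y)  # Jackknife version, which also sees the pair without the spike
```

In this made-up example the classical correlation distance is close to 0 because of the single shared spike, whereas the Jackknife distance is large: once time point 3 is deleted, the two profiles move in opposite directions.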
This concept can be extended to the three geometric distance measures Euclidean, Manhattan and Maximum distance. Jackknife Euclidean distance is defined as
d_xy = min(d_xy^(1), d_xy^(2), ..., d_xy^(T))
where d_xy^(t) is the Euclidean distance of pair x,y computed with the t-th time point deleted. Jackknife Manhattan distance and Jackknife Maximum distance can be defined in the same way.
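A minimal sketch of the three geometric variants, written directly from the definition above. The function `jack_dist`, its `base` argument and the example data are assumptions made for illustration and are not taken from the gcExplorer source. The result is arranged as one row per observation and one column per centroid, which is assumed here to be the layout expected from a distance function taking `x` and `centers`.

```r
# Hypothetical sketch, not the gcExplorer source: Jackknife distance between
# every row of x and every row of centers for a chosen base distance.
jack_dist <- function(x, centers,
                      base = c("euclidean", "manhattan", "maximum")) {
  base <- match.arg(base)
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  stopifnot(ncol(x) == ncol(centers), ncol(x) >= 2)
  # Base distance expressed on the vector of coordinate differences
  d_fun <- switch(base,
    euclidean = function(a) sqrt(sum(a^2)),
    manhattan = function(a) sum(abs(a)),
    maximum   = function(a) max(abs(a)))
  out <- matrix(NA_real_, nrow(x), nrow(centers))
  for (i in seq_len(nrow(x))) {
    for (k in seq_len(nrow(centers))) {
      delta <- x[i, ] - centers[k, ]
      # Distance with the t-th time point deleted, minimised over t
      out[i, k] <- min(vapply(seq_along(delta),
                              function(t) d_fun(delta[-t]), numeric(1)))
    }
  }
  out
}

# Made-up data: the second row equals the first except for one outlier
x <- rbind(c(1, 2, 3, 4), c(1, 2, 30, 4))
centers <- rbind(c(1, 2, 3, 4), c(2, 3, 4, 5))
jack_dist(x, centers, "euclidean")
jack_dist(x, centers, "manhattan")
jack_dist(x, centers, "maximum")
```

Because the outlier at the third time point is the coordinate whose deletion minimises the distance, both rows of `x` end up at the same Jackknife distance from each centroid in this example. The double loop favours readability over speed.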
Theresa Scharl
Theresa Scharl and Friedrich Leisch: Jackknife distances for clustering time–course gene expression data, in JSM Proceedings 2006