jkdist {gcExplorer}    R Documentation

Further Distance and Centroid Computations

Description

Helper functions to create 'kccaFamily' objects.

Usage

distJackCor(x, centers)
distJackEuc(x, centers)
distJackMan(x, centers)
distJackMax(x, centers)

centSpline(d)

Arguments

x A data matrix
d A data matrix
centers A matrix of centroids

Details

Classical distance measures can be a poor fit for clustering time-course gene expression data, because a single outlier variable can completely change the expression pattern of a gene. Outliers at individual time points are very common in microarray experiments, since technical problems such as dust or a scratch on the slide can easily distort the data. Such outlier variables can produce spurious correlations between genes and lead to incorrect cluster assignments, so distance measures that are robust against outlier variables are needed.

The idea behind Jackknife (Efron, 1982) distance measures is not to exclude the whole observation for an affected gene, but only one or several of its variables. The "Jackknife" distance measures described here can handle one outlier time point. The Jackknife correlation was first used by Heyer et al. (1999) to cluster gene expression data. The resulting distance is defined as

d_xy = 1 - min(rho_xy^(1), rho_xy^(2), ..., rho_xy^(T))

where T is the number of time points and rho_xy^(t) is the correlation of the pair x, y computed with the t-th time point deleted.
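The definition above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: `jack_cor_sketch` is a hypothetical name, and it assumes `x` and `centers` are numeric matrices with one row per gene or centroid and one column per time point, matching the `(x, centers)` argument pattern in Usage.

```r
# Hypothetical helper (not part of gcExplorer): Jackknife correlation
# distance between every row of x and every row of centers.
jack_cor_sketch <- function(x, centers) {
  stopifnot(ncol(x) == ncol(centers), ncol(x) >= 3)
  # Start at +Inf so the first leave-one-out correlation always replaces it
  min_rho <- matrix(Inf, nrow = nrow(x), ncol = nrow(centers))
  for (t in seq_len(ncol(x))) {
    # Correlation of each (row of x, row of centers) pair without time point t
    rho_t <- cor(t(x[, -t, drop = FALSE]), t(centers[, -t, drop = FALSE]))
    min_rho <- pmin(min_rho, rho_t)
  }
  1 - min_rho
}

# Two made-up profiles that move in opposite directions but share one
# large outlier at the fifth time point.
x_out <- rbind(c(0.1, -0.1, 0.1, -0.1, 8))
c_out <- rbind(c(-0.1, 0.1, -0.1, 0.1, 8))

plain_dist <- 1 - cor(x_out[1, ], c_out[1, ])  # close to 0: outlier dominates
jack_dist  <- jack_cor_sketch(x_out, c_out)    # 2, up to rounding
```

In this made-up example the shared outlier makes the ordinary correlation distance look small, while deleting the fifth time point exposes the opposite trends, so the Jackknife distance is large.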

This concept can be extended to the three geometric distance measures: Euclidean, Manhattan and Maximum distance. Jackknife Euclidean distance is defined as

d_xy = min(d_xy^(1), d_xy^(2), ..., d_xy^(T))

where d_xy^(t) is the Euclidean distance of pair x,y computed with the t-th time point deleted. Jackknife Manhattan distance and Jackknife Maximum distance can be defined in the same way.
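A minimal base R sketch of the three geometric Jackknife distances, written directly from the definition above. The function name `jack_geom_sketch` and its `method` argument are assumptions made for illustration; the package itself exposes separate functions (`distJackEuc`, `distJackMan`, `distJackMax`), whose internals may differ.

```r
# Hypothetical helper (not part of gcExplorer): Jackknife version of a
# geometric distance between every row of x and every row of centers.
jack_geom_sketch <- function(x, centers,
                             method = c("euclidean", "manhattan", "maximum")) {
  method <- match.arg(method)
  stopifnot(ncol(x) == ncol(centers), ncol(x) >= 2)
  pair_dist <- function(u, v) {
    switch(method,
           euclidean = sqrt(sum((u - v)^2)),
           manhattan = sum(abs(u - v)),
           maximum   = max(abs(u - v)))
  }
  d <- matrix(NA_real_, nrow = nrow(x), ncol = nrow(centers))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(centers))) {
      # Distance with each time point deleted in turn; keep the smallest
      d[i, j] <- min(vapply(seq_len(ncol(x)),
                            function(t) pair_dist(x[i, -t], centers[j, -t]),
                            numeric(1)))
    }
  }
  d
}

# Made-up data: one flat profile with an outlier at the fourth time point,
# compared against two flat centroids.
x_demo <- rbind(c(0, 0, 0, 9))
centers_demo <- rbind(c(0, 0, 0, 0), c(1, 1, 1, 1))

d_euc <- jack_geom_sketch(x_demo, centers_demo, "euclidean")
d_man <- jack_geom_sketch(x_demo, centers_demo, "manhattan")
d_max <- jack_geom_sketch(x_demo, centers_demo, "maximum")
```

Because the minimum is taken over all leave-one-out distances, the outlier at the fourth time point is the one that gets dropped, and the profile is placed at distance 0 from the first centroid under all three measures.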

Author(s)

Theresa Scharl

References

Theresa Scharl and Friedrich Leisch: Jackknife distances for clustering time–course gene expression data, in JSM Proceedings 2006


[Package gcExplorer version 0.9-1 Index]