Dist {amap} | R Documentation |
This function computes and returns the distance matrix obtained by applying the specified distance measure to the rows of a data matrix.
Dist(x, method = "euclidean", nbproc = 1, diag = FALSE, upper = FALSE)
x |
a numeric matrix, a data frame, or an object of class "exprSet". Distances between the rows of x will be computed. |
method |
the distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "correlation", "spearman" or "kendall". Any unambiguous substring can be given. |
nbproc |
integer, the number of subprocesses used for parallelization. |
diag |
logical value indicating whether the diagonal of the distance matrix should be printed by print.dist. |
upper |
logical value indicating whether the upper triangle of the distance matrix should be printed by print.dist. |
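The rule that any unambiguous substring of a method name can be given can be illustrated with base R's pmatch(). This is only a sketch: it assumes that Dist resolves abbreviations the way pmatch() does, which this page does not state, and resolve_method is a hypothetical helper that is not part of amap.

```r
# Method names as listed in the description of the 'method' argument.
methods <- c("euclidean", "maximum", "manhattan", "canberra", "binary",
             "pearson", "correlation", "spearman", "kendall")

# Hypothetical helper (not part of amap): pmatch() returns the index of the
# unique partial match, or NA when the abbreviation is ambiguous or unknown.
resolve_method <- function(abbrev) {
  idx <- pmatch(abbrev, methods)
  if (is.na(idx)) NA_character_ else methods[idx]
}

resolve_method("euc")  # unique prefix of "euclidean"
resolve_method("man")  # unique prefix of "manhattan"
resolve_method("ma")   # ambiguous: prefix of both "maximum" and "manhattan"
```

Under that assumption, an abbreviation such as "ma" would not select a method, whereas "max" or "man" would.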
Available distance measures are (written for two vectors x and y):
euclidean:
maximum:
manhattan:
canberra:
binary:
pearson:
correlation:
spearman:
Dist(x,method="spearman")[i,j] = cor.test(x[i,],x[j,],method="spearman")$statistic
kendall:
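As a rough illustration of some of the measures named above, the sketch below computes the textbook Euclidean, maximum, Manhattan and Canberra distances for two vectors in base R and checks the statistic in the Spearman identity. It assumes that Dist follows these standard definitions (they are the ones used by stats::dist), and it does not call amap itself. The vectors x and y are made-up, tie-free data.

```r
x <- c(1, 4, 2, 8, 5)
y <- c(3, 1, 7, 6, 2)

euclidean <- sqrt(sum((x - y)^2))          # 2-norm of the difference
maximum   <- max(abs(x - y))               # supremum norm
manhattan <- sum(abs(x - y))               # 1-norm
canberra  <- sum(abs(x - y) / abs(x + y))  # valid as written for positive data

# Cross-check against stats::dist applied to the two-row matrix.
m <- rbind(x, y)
agree <- c(
  isTRUE(all.equal(euclidean, as.numeric(dist(m, method = "euclidean")))),
  isTRUE(all.equal(maximum,   as.numeric(dist(m, method = "maximum")))),
  isTRUE(all.equal(manhattan, as.numeric(dist(m, method = "manhattan")))),
  isTRUE(all.equal(canberra,  as.numeric(dist(m, method = "canberra"))))
)

# Spearman: for tie-free data, the statistic reported by cor.test() equals
# the sum of squared rank differences.
d_rank <- rank(x) - rank(y)
s_stat <- unname(cor.test(x, y, method = "spearman")$statistic)
spearman_matches <- isTRUE(all.equal(sum(d_rank^2), s_stat))
```

With ties in the data, cor.test() no longer reports the plain sum of squared rank differences, so the last check is only meaningful for tie-free vectors.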
Missing values are allowed, and are excluded from all computations
involving the rows within which they occur. If some columns are
excluded in calculating a Euclidean, Manhattan or Canberra distance,
the sum is scaled up proportionally to the number of columns used.
If all pairs are excluded when calculating a particular distance,
the value is NA.
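The scaling rule for missing values can be checked by hand. The sketch below uses stats::dist, whose documentation describes the same behaviour. That amap's Dist produces identical numbers is an assumption here, not something verified. The matrices are made-up data.

```r
# Row 1 has a missing value in column 3, so only 2 of 3 columns are usable.
m <- rbind(c(1, 2, NA),
           c(2, 4, 5))
used  <- 2
total <- 3

# Sums over the usable columns, scaled up by total / used.
expected_euclid <- sqrt(((1 - 2)^2 + (2 - 4)^2) * total / used)
expected_manh   <- (abs(1 - 2) + abs(2 - 4)) * total / used

d_euclid <- as.numeric(dist(m, method = "euclidean"))
d_manh   <- as.numeric(dist(m, method = "manhattan"))

# When two rows share no complete column, the distance is NA.
no_overlap <- rbind(c(NA, 1),
                    c(1, NA))
d_na <- as.numeric(dist(no_overlap))
```

Note that for the Euclidean case the scaling is applied to the sum of squares before the square root is taken.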
The functions as.matrix.dist() and as.dist() can be used to convert
between objects of class "dist" and conventional distance matrices.
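A short round trip shows the two conversions. The sketch uses stats::dist to build the "dist" object so that it runs without amap. An object returned by Dist has the same class, so the same calls should apply to it.

```r
set.seed(1)
x <- matrix(rnorm(20), nrow = 5)

d  <- dist(x)        # object of class "dist": lower triangle only
m  <- as.matrix(d)   # dispatches to as.matrix.dist(): full 5 x 5 matrix
d2 <- as.dist(m)     # back to an object of class "dist"

same_values <- isTRUE(all.equal(as.vector(d), as.vector(d2)))
```

The full matrix is symmetric with a zero diagonal, which is why the "dist" representation can drop everything except the lower triangle.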
An object of class "dist".
The lower triangle of the distance matrix is stored by columns in a
vector, say do. If n is the number of observations, i.e.,
n <- attr(do, "Size"), then for i < j <= n, the dissimilarity between
(row) i and j is do[n*(i-1) - i*(i-1)/2 + j-i].
The length of the vector is n*(n-1)/2, i.e., of order n^2.
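The index formula can be checked against the full matrix. The sketch below builds the "dist" object with stats::dist, which uses the same storage layout, and dist_index is a hypothetical helper introduced only for this illustration.

```r
set.seed(42)
x  <- matrix(rnorm(30), nrow = 6)
do <- dist(x)
n  <- attr(do, "Size")

# Hypothetical helper: position of the pair (i, j), i < j <= n, in the vector.
dist_index <- function(i, j, n) n * (i - 1) - i * (i - 1) / 2 + j - i

v    <- as.vector(do)   # plain numeric vector of length n*(n-1)/2
full <- as.matrix(do)   # conventional n x n matrix for comparison

# Compare every pair i < j with the corresponding matrix entry.
ok <- TRUE
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    ok <- ok && isTRUE(all.equal(v[dist_index(i, j, n)], full[i, j]))
  }
}
```

For n = 6, the pair (1, 2) maps to position 1 and the pair (2, 3) to position 6, since the first column of the lower triangle holds five entries.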
The object has the following attributes (besides "class" equal to "dist"):
Size |
integer, the number of observations in the dataset. |
Labels |
optionally, contains the labels, if any, of the observations of the dataset. |
Diag, Upper |
logicals corresponding to the arguments diag and upper above, specifying how the object should be printed. |
call |
optionally, the call used to create the object. |
methods |
optionally, the distance method used; resulting from dist(), the (match.arg()ed) method argument. |
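The attributes can be inspected with attr(). The sketch below uses an object created by stats::dist, where the method attribute is named "method". Whether an object returned by Dist carries it under "methods", as listed above, is not verified here. The input matrix is made-up data.

```r
x <- matrix(1:12, nrow = 4,
            dimnames = list(c("a", "b", "c", "d"), NULL))
d <- dist(x, method = "manhattan", diag = TRUE)

size   <- attr(d, "Size")    # number of observations (rows of x)
labels <- attr(d, "Labels")  # row names of x, if any
diag_  <- attr(d, "Diag")    # printing flag taken from the diag argument
upper_ <- attr(d, "Upper")   # printing flag taken from the upper argument
method <- attr(d, "method")  # attribute name used by stats::dist
```

Diag and Upper only affect how print.dist displays the object. The stored vector of distances is the same either way.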
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979) Multivariate Analysis. London: Academic Press.
Wikipedia http://en.wikipedia.org/wiki/Kendall_tau_distance
daisy in the ‘cluster’ package, with more possibilities in the case of
mixed (continuous / categorical) variables.
dist, hcluster.
x <- matrix(rnorm(100), nrow=5)
Dist(x)
Dist(x, diag = TRUE)
Dist(x, upper = TRUE)
## compute dist with 8 threads
Dist(x,nbproc=8)