dist {amap} | R Documentation |
This function computes and returns the distance matrix obtained by applying the specified distance measure to the rows of a data matrix.
dist(x, method = "euclidean", diag = FALSE, upper = FALSE)
print.dist(x, diag = NULL, upper = NULL, ...)
as.matrix.dist(x)
as.dist(m, diag = FALSE, upper = FALSE)
x |
numeric matrix (or data frame). Distances between the rows of
x will be computed. |
method |
the distance measure to be used. This must be one of
"euclidean", "maximum", "manhattan",
"canberra", "binary", "pearson" or
"correlation".
Any unambiguous substring can be given. |
diag |
logical value indicating whether the diagonal of the
distance matrix should be printed by print.dist . |
upper |
logical value indicating whether the upper triangle of the
distance matrix should be printed by print.dist . |
m |
A matrix of distances to be converted to a "dist"
object (only the lower triangle is used, the rest is ignored). |
... |
further arguments, passed to the (next) print method. |
Available distance measures are (written for two vectors x and y):
euclidean
maximum
manhattan
canberra
binary
pearson
correlation
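As a rough illustration, the measures with standard definitions can be written out by hand in base R. This is a sketch, not the package's implementation: all helper names are hypothetical, and the Canberra and binary forms are assumptions chosen so that they reproduce the answers quoted in the examples on this page (2 * (6/5) and 0.4). The pearson and correlation measures are not covered here.

```r
# Hypothetical helpers; none of these names come from the amap package.
euclidean_d <- function(x, y) sqrt(sum((x - y)^2))   # 2-norm of the difference
maximum_d   <- function(x, y) max(abs(x - y))        # supremum norm
manhattan_d <- function(x, y) sum(abs(x - y))        # 1-norm

# Assumed form: columns where both entries are zero (0/0) are dropped, and the
# sum is rescaled by (total columns / columns used).
canberra_d <- function(x, y) {
  keep <- !(x == 0 & y == 0)
  terms <- abs(x[keep] - y[keep]) / abs(x[keep] + y[keep])
  sum(terms) * length(x) / sum(keep)
}

# Assumed form: among columns where at least one entry is non-zero, the
# proportion in which exactly one entry is non-zero.
binary_d <- function(x, y) {
  on_x <- x != 0
  on_y <- y != 0
  sum(xor(on_x, on_y)) / sum(on_x | on_y)
}

x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)
euclidean_d(x, y)
maximum_d(x, y)
manhattan_d(x, y)
canberra_d(x, y)   # 2 * (6/5)
binary_d(x, y)     # 2/5
```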
Missing values are allowed, and are excluded from all computations
involving the rows within which they occur. If some columns are
excluded in calculating a Euclidean, Manhattan or Canberra distance,
the sum is scaled up proportionally to the number of columns used.
If all pairs are excluded when calculating a particular distance,
the value is NA.
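The missing-value rule can be sketched for the Manhattan case as follows. The helper name manhattan_na is hypothetical, and the scaling factor (total columns divided by columns used) is an assumption about what "scaled up proportionally" means; it is not taken from the package source.

```r
# Hypothetical helper: Manhattan distance with pairwise NA exclusion.
manhattan_na <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)       # columns where both values are present
  if (!any(ok)) return(NA_real_)    # every pair excluded: the distance is NA
  # Sum over the usable columns, scaled up to the full number of columns
  # (assumed factor: total columns / columns used).
  sum(abs(x[ok] - y[ok])) * length(x) / sum(ok)
}

manhattan_na(c(1, 2, NA), c(2, 4, 6))   # (1 + 2) * 3/2
manhattan_na(c(NA, NA), c(1, 2))        # NA
```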
The functions as.matrix.dist() and as.dist() can be used to convert
between objects of class "dist" and conventional distance matrices.
An object of class "dist".
The lower triangle of the distance matrix is stored by columns in a
vector, say do. If n is the number of observations, i.e.,
n <- attr(do, "Size"), then for i < j <= n, the dissimilarity between
(row) i and j is do[n*(i-1) - i*(i-1)/2 + j-i].
The length of the vector is n*(n-1)/2, i.e., of order n^2.
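The index formula can be checked against the full matrix. The sketch below uses base R's stats::dist() as a stand-in, on the assumption that it stores its "dist" objects in the layout described here; dist_index is a hypothetical helper name.

```r
# Hypothetical helper implementing the index formula (valid for i < j <= n).
dist_index <- function(i, j, n) n * (i - 1) - i * (i - 1) / 2 + j - i

set.seed(1)
x <- matrix(rnorm(20), nrow = 5)
d <- stats::dist(x)          # base R's dist(), used here as a stand-in
n <- attr(d, "Size")
m <- as.matrix(d)

# Every pair i < j should sit at the computed position of the vector.
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    stopifnot(isTRUE(all.equal(d[dist_index(i, j, n)], m[i, j])))
  }
}

length(d) == n * (n - 1) / 2
```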
The object has the following attributes (besides "class" equal to "dist"):
Size |
integer, the number of observations in the dataset. |
Labels |
optionally, contains the labels, if any, of the observations of the dataset. |
Diag, Upper |
logicals corresponding to the arguments diag
and upper above, specifying how the object should be printed. |
call |
optionally, the call used to create the
object. |
methods |
optionally, the distance method used; resulting from
dist(), the (match.arg()ed) method
argument. |
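The attributes can be inspected with attr(). The sketch below uses base R's stats::dist() as a stand-in; the exact set of attributes attached by this package's dist() may differ, so the names returned by the last line should be read as illustrative only.

```r
# Small labelled data set: four observations, two variables.
x <- matrix(as.numeric(1:8), nrow = 4,
            dimnames = list(c("a", "b", "c", "d"), NULL))
d <- stats::dist(x)     # base R's dist(), used here as a stand-in

attr(d, "Size")         # number of observations
attr(d, "Labels")       # row labels, if any
attr(d, "Diag")         # printing defaults recorded at creation
attr(d, "Upper")
names(attributes(d))    # all attributes actually present
```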
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979) Multivariate Analysis. London: Academic Press.
daisy in the ‘cluster’ package, with more possibilities in the case of
mixed (continuous / categorical) variables.
dist, hclust.
x <- matrix(rnorm(100), nrow = 5)
dist(x)
dist(x, diag = TRUE)
dist(x, upper = TRUE)
m <- as.matrix(dist(x))
d <- as.dist(m)
stopifnot(d == dist(x))
names(d) <- LETTERS[1:5]
print(d, digits = 3)

## example of binary and canberra distances.
x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)
dist(rbind(x, y), method = "binary")
## answer 0.4 = 2/5
dist(rbind(x, y), method = "canberra")
## answer 2 * (6/5)