dissimilarity {arules}    R Documentation

Dissimilarity Computation

Description

Provides the generic function dissimilarity and the S4 methods to compute and return distances for binary data in a matrix, transactions or associations.

Usage

dissimilarity(x, y = NULL, method = NULL, args = NULL, ...)
## S4 method for signature 'itemMatrix':
dissimilarity(x, y = NULL, method = NULL, args = NULL,
        which = "transactions")
## S4 method for signature 'associations':
dissimilarity(x, y = NULL, method = NULL, args = NULL,
        which = "transactions")
## S4 method for signature 'matrix':
dissimilarity(x, y = NULL, method = NULL, args = NULL)

Arguments

x the set of elements (e.g., matrix, itemMatrix, transactions, itemsets, rules).
y NULL or a second set to calculate cross dissimilarities.
method the distance measure to be used (defaults to "jaccard"). Implemented measures are:
"affinity":
measure based on the affinity, a similarity measure between items. It is defined as the average affinity between the items in two transactions (see Aggarwal et al. (2002)).
"cosine":
the cosine distance.
"dice":
Dice's coefficient, defined by Dice (1945). Similar to Jaccard, but gives double the weight to agreeing items.
"jaccard":
the Jaccard similarity is the number of items which occur in both elements divided by the total number of distinct items in the two elements (Sneath, 1957); the dissimilarity is 1 minus this value. This measure is often also called binary, asymmetric binary, etc.
"matching":
the Matching coefficient defined by Sokal and Michener (1958). This coefficient gives the same weight to the presence and absence of items.
"pearson":
1 minus the Pearson correlation coefficient.
args a list of additional arguments for the methods.
For calculating "affinity" for associations, the affinities between the items in the transactions are needed and passed to the method as the first element in args.
which a character string indicating if the dissimilarity should be calculated between transactions (default) or items (use "items").
... further arguments.
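
To make the binary measures above concrete, the following base R sketch computes Jaccard, Dice and matching dissimilarities by hand for two made-up item vectors. It assumes the usual convention that each dissimilarity is 1 minus the corresponding similarity coefficient; the vectors a and b and all variable names are hypothetical, and the package's own implementation may differ in detail.

```r
## Two hypothetical binary item vectors (TRUE = item present).
a <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

## Counts for the 2 x 2 contingency table of presence/absence.
n11 <- sum(a & b)     # present in both
n10 <- sum(a & !b)    # present only in a
n01 <- sum(!a & b)    # present only in b
n00 <- sum(!a & !b)   # absent in both

## Dissimilarity = 1 - similarity (assumed convention).
jaccard_dist  <- 1 - n11 / (n11 + n10 + n01)
dice_dist     <- 1 - 2 * n11 / (2 * n11 + n10 + n01)
matching_dist <- 1 - (n11 + n00) / length(a)

## Cross-check: base R's "binary" distance is the Jaccard distance.
jaccard_base <- as.numeric(dist(rbind(a, b) + 0, method = "binary"))
```

Note that only the matching coefficient uses n00, which is why it treats joint absence of an item as agreement, while Jaccard and Dice ignore it.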

Value

returns an object of class dist.
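
A dist object can be inspected as a full matrix and passed directly to clustering functions. The sketch below illustrates this with base R only: it uses dist(method = "binary") as a stand-in for dissimilarity() so that it runs without the package, and the small matrix m is made up for illustration.

```r
## Hypothetical binary data: three transactions over five items.
m <- matrix(c(1, 1, 1, 0, 0,
              1, 1, 0, 1, 0,
              0, 0, 0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("t1", "t2", "t3"),
                            c("a", "b", "c", "d", "e")))

## Stand-in for dissimilarity(m): base R's binary (Jaccard) distance.
d <- dist(m, method = "binary")

## View the lower-triangle dist object as a full symmetric matrix.
dm <- as.matrix(d)

## Use the dist object for hierarchical clustering and cut into 2 groups.
hc <- hclust(d)
groups <- cutree(hc, k = 2)
```

Here t1 and t2 share two of four distinct items and end up in the same group, while t3 is placed in a group of its own.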

References

Sneath, P. H. A. (1957) Some thoughts on bacterial classification. Journal of General Microbiology 17, pages 184–200.

Sokal, R. R. and Michener, C. D. (1958) A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin 38, pages 1409–1438.

Dice, L. R. (1945) Measures of the amount of ecologic association between species. Ecology 26, pages 297–302.

Charu C. Aggarwal, Cecilia Procopiuc, and Philip S. Yu. (2002) Finding localized associations in market basket data. IEEE Trans. on Knowledge and Data Engineering 14(1):51–62.

See Also

affinity, dist-class, itemMatrix-class, associations-class.

Examples

## cluster items in Groceries with support > 5%
data("Groceries")

s <- Groceries[,itemFrequency(Groceries)>0.05]
d_jaccard <- dissimilarity(s, which = "items")
## note: hclust's "ward" method was renamed "ward.D" in R 3.1.0
plot(hclust(d_jaccard, method = "ward.D"))


## cluster transactions for a sample of Adult
data("Adult")
s <- sample(Adult, 200) 

## calculate Jaccard distances and do hclust
d_jaccard <- dissimilarity(s)
plot(hclust(d_jaccard))

## calculate affinity-based distances and do hclust
d_affinity <- dissimilarity(s, method = "affinity")
plot(hclust(d_affinity))

## cluster rules
rules <- apriori(Adult)
rules <- subset(rules, subset = lift > 2)

## we need to supply the item affinities from the dataset (sample)
d_affinity <- dissimilarity(rules, method = "affinity", 
  args = list(affinity = affinity(s)))
plot(hclust(d_affinity))

[Package arules version 1.0-0 Index]