measures {arules}    R Documentation

Calculating Additional Interest Measures for Existing Associations

Description

Provides generic functions and the S4 methods needed to calculate additional interest measures for an existing set of associations.

Usage

allConfidence(x, ...)
## S4 method for signature 'itemsets':
allConfidence(x, transactions = NULL, itemSupport = NULL)

crossSupportRatio(x, ...)
## S4 method for signature 'itemsets':
crossSupportRatio(x,  transactions = NULL, itemSupport = NULL)

hyperLift(x, ...)
## S4 method for signature 'rules':
hyperLift(x, transactions, d = 0.99)

hyperConfidence(x, ...)
## S4 method for signature 'rules':
hyperConfidence(x, transactions = NULL,  
        complements = TRUE, significance = FALSE)

Arguments

x the set of associations.
... further arguments.
transactions the transaction data set used to mine the associations.
itemSupport as an alternative to transactions, for some measures the support of the items in the transaction data set is sufficient.
d the quantile used to calculate hyperlift.
complements if TRUE (the default), calculate confidence/significance levels for complements; if FALSE, calculate them for substitutes.
significance report significance levels instead of confidence levels.

Details

Currently the following interest measures are implemented:

All-confidence (see Omiecinski, 2003)
is defined on itemsets as the minimum confidence of all possible rules that can be generated from the itemset.
The cross-support ratio (see Xiong et al., 2003)
is defined on itemsets as the ratio of the support of the least frequent item to the support of the most frequent item. Cross-support patterns are itemsets whose ratio is smaller than a set threshold. Typically, many of the patterns found are cross-support patterns, which contain frequent as well as rare items. Such patterns tend to be spurious.
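
The two itemset measures above can be illustrated with a small sketch in base R. It does not use the package's own methods; the toy data and the helper names (trans, itemset_support, all_confidence, cross_support_ratio) are assumptions made for illustration. All-confidence is computed here as the support of the itemset divided by the largest item support, which equals the minimum confidence over all rules generated from the itemset.

```r
# Toy transaction data (hypothetical): rows are transactions, columns are items.
trans <- matrix(c(
  1, 1, 0,
  1, 1, 1,
  1, 0, 0,
  1, 1, 0,
  0, 0, 1
), ncol = 3, byrow = TRUE,
dimnames = list(NULL, c("a", "b", "c"))) == 1

# Support of each single item: a = 0.8, b = 0.6, c = 0.4
item_support <- colMeans(trans)

# Support of an itemset: fraction of transactions containing all of its items.
itemset_support <- function(items) {
  mean(rowSums(trans[, items, drop = FALSE]) == length(items))
}

# All-confidence: itemset support divided by the largest item support.
all_confidence <- function(items) {
  itemset_support(items) / max(item_support[items])
}

# Cross-support ratio: smallest item support divided by the largest.
cross_support_ratio <- function(items) {
  min(item_support[items]) / max(item_support[items])
}

all_confidence(c("a", "b"))        # 0.6 / 0.8 = 0.75
cross_support_ratio(c("a", "c"))   # 0.4 / 0.8 = 0.5
```
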
Hyper-lift (see Hahsler et al., 2005)
is an adaptation of the lift measure which is more robust for low counts. It is based on the idea that under independence the count c_{XY} of the transactions which contain all items in a rule X -> Y follows a hypergeometric distribution (represented by the random variable C_{XY}) with the parameters given by the counts c_X and c_Y.

Lift is defined for the rule X -> Y as:

lift(X -> Y) = P(X+Y)/(P(X)*P(Y)) = c_{XY} / E[C_{XY}],

where E[C_{XY}] = c_X c_Y / m with m being the number of transactions in the database.

Hyper-lift is defined as:

hyperlift(X -> Y) = c_{XY} / Q_d[C_{XY}],

where Q_d[C_{XY}] is the d-quantile of the hypergeometric distribution of C_{XY}.
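
As a rough illustration of the two definitions above, the following base R sketch computes lift and hyper-lift from raw counts. The counts are invented, and mapping the hypergeometric parameters onto qhyper() as shown (c_X "white balls", n_trans - c_X "black balls", c_Y draws) is an assumption about how the model can be expressed, not necessarily how the package implements it.

```r
# Hypothetical counts for a rule X -> Y
n_trans <- 1000   # m in the text: number of transactions
c_X     <- 50     # transactions containing X
c_Y     <- 40     # transactions containing Y
c_XY    <- 6      # transactions containing both X and Y

# Lift: observed count divided by the count expected under independence.
expected <- c_X * c_Y / n_trans   # E[C_XY] = 2
lift     <- c_XY / expected       # 6 / 2 = 3

# Hyper-lift: observed count divided by the d-quantile of the
# hypergeometric distribution of C_XY.
d          <- 0.99
Q_d        <- qhyper(d, m = c_X, n = n_trans - c_X, k = c_Y)
hyper_lift <- c_XY / Q_d
```

Because a high quantile such as 0.99 is at least as large as the expected count here, the hyper-lift cannot exceed the lift in this example, which is why it is the more conservative measure for low counts.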

Hyper-confidence (based on Hahsler et al., 2005)
calculates the confidence level that the count observed for a rule X -> Y is too high (or too low) to be explained by the hypergeometric model. Because, under independence, the count follows a hypergeometric distribution (represented by the random variable C_{XY}) with known parameters given by the counts c_X and c_Y, we can calculate a confidence interval for the observed count c_{XY}. Hyper-confidence reports the confidence level (or, with significance = TRUE, the significance level) for
complements -
1 - P[C_{XY} >= c_{XY} | c_X, c_Y]
substitutes -
1 - P[C_{XY} < c_{XY} | c_X, c_Y].

A confidence level above, e.g., 0.95 indicates that there is less than a 5% chance that the count for the rule was generated randomly.
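
The hyper-confidence formulas above can be evaluated directly with phyper() in base R. This is a minimal sketch with invented counts; the variable names are hypothetical, and the parameter mapping onto phyper() is an assumption rather than a description of the package internals.

```r
# Hypothetical counts for a rule X -> Y
n_trans <- 1000   # number of transactions
c_X     <- 50     # transactions containing X
c_Y     <- 40     # transactions containing Y
c_XY    <- 6      # transactions containing both X and Y

# P[C_XY < c_XY | c_X, c_Y], i.e. P[C_XY <= c_XY - 1]
p_below <- phyper(c_XY - 1, m = c_X, n = n_trans - c_X, k = c_Y)

# Complements: 1 - P[C_XY >= c_XY | c_X, c_Y]
hyper_conf_complements <- p_below

# Substitutes: 1 - P[C_XY < c_XY | c_X, c_Y]
hyper_conf_substitutes <- 1 - p_below
```

With an expected count of 2 and an observed count of 6, the value for complements is close to 1, which suggests that the items occur together more often than chance alone would explain.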

Value

A numeric vector containing the values of the interest measure for each association in the set of associations x.

References

Edward R. Omiecinski. Alternative interest measures for mining associations in databases. IEEE Transactions on Knowledge and Data Engineering, 15(1):57-69, Jan/Feb 2003.

Michael Hahsler, Kurt Hornik, and Thomas Reutterer. Implications of probabilistic data modeling for rule mining. Report 14, Research Report Series, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria, March 2005.

See Also

itemsets-class, rules-class

Examples

data("Income")

### calculate all-confidence and the cross-support ratio
itemsets <- apriori(Income, parameter = list(target = "freq")) 
quality(itemsets) <- cbind(quality(itemsets), 
        allConfidence = allConfidence(itemsets),
        crossSupportRatio = crossSupportRatio(itemsets))
        
summary(itemsets)

### calculate hyperlift for the 0.9 quantile
rules <- apriori(Income)
quality(rules) <- cbind(quality(rules), 
        hyperLift = hyperLift(rules, Income, d = 0.9))

inspect(SORT(rules, by = "hyperLift")[1:5])

### calculate hyper-confidence and discard all rules with
### a confidence level < 99%
quality(rules) <- cbind(quality(rules),
        hyperConfidence = hyperConfidence(rules, Income))

rulesHConf <- rules[quality(rules)$hyperConfidence >= 0.99]

inspect(rulesHConf[1:10])


[Package arules version 0.3-1 Index]