measures {arules} | R Documentation |
Provides the generic functions and the needed S4 methods to calculate some additional interest measures for a set of existing associations.
allConfidence(x, ...)
## S4 method for signature 'itemsets':
allConfidence(x, transactions = NULL, itemSupport = NULL)

hyperLift(x, ...)
## S4 method for signature 'rules':
hyperLift(x, transactions, d = 0.99)

hyperConfidence(x, ...)
## S4 method for signature 'rules':
hyperConfidence(x, transactions = NULL, complements = TRUE, significance = FALSE)
x |
the set of associations. |
... |
further arguments. |
transactions |
the transaction data set used to mine the associations. |
itemSupport |
as an alternative to transactions, for some measures the item support in the transaction data set is sufficient. |
d |
the quantile used to calculate hyperlift. |
complements |
calculate confidence/significance levels for substitutes instead of complements. |
significance |
report significance levels instead of confidence levels. |
Currently the following interest measures are implemented:
Lift is defined for the rule X -> Y as:
lift(X -> Y) = P(X+Y)/(P(X)*P(Y)) = c_XY / E[C_XY],
where E[C_XY] = c_X * c_Y / m, with m being the number of transactions in the database.
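The two forms of the lift formula above can be checked against each other with a few lines of base R. This is a minimal sketch with made-up counts; the variable names (n_trans for m, c_X, c_Y, c_XY) are illustrative and are not part of the arules API.

```r
# Hypothetical counts for a rule X -> Y (illustrative values only).
n_trans <- 1000  # m: number of transactions in the database
c_X     <- 200   # transactions containing X
c_Y     <- 300   # transactions containing Y
c_XY    <- 90    # transactions containing both X and Y

# Probability form: P(X+Y) / (P(X) * P(Y))
lift_prob <- (c_XY / n_trans) / ((c_X / n_trans) * (c_Y / n_trans))

# Count form: c_XY / E[C_XY], with E[C_XY] = c_X * c_Y / m
expected_count <- c_X * c_Y / n_trans
lift_count <- c_XY / expected_count

# Both forms agree up to floating-point rounding.
stopifnot(isTRUE(all.equal(lift_prob, lift_count)))
lift_count
```

With these counts the expected co-occurrence count is 60, so observing 90 joint transactions gives a lift of 1.5.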
Hyper-lift is defined as:
hyperlift(X -> Y) = c_XY / Q_d[C_XY],
where Q_d[C_XY] is the quantile of the hypergeometric distribution given by d.
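To make the hyper-lift definition concrete, the quantile Q_d[C_XY] can be sketched with qhyper() from the stats package. The counts below are made up, and the mapping of c_X, c_Y and m onto the arguments of qhyper() (c_Y marked items, m - c_Y unmarked items, c_X draws) is an assumption based on the hypergeometric model described in the Hahsler et al. report; it is not taken from this page.

```r
# Hypothetical counts for a rule X -> Y (illustrative values only).
n_trans <- 1000  # m: number of transactions in the database
c_X     <- 200   # transactions containing X
c_Y     <- 300   # transactions containing Y
c_XY    <- 90    # transactions containing both X and Y
d       <- 0.99  # quantile, matching the default of hyperLift()

# Assumed model: C_XY is hypergeometric, drawing c_X transactions
# from n_trans, of which c_Y contain Y.
Q_d <- qhyper(d, m = c_Y, n = n_trans - c_Y, k = c_X)

# hyperlift(X -> Y) = c_XY / Q_d[C_XY]
hyperlift <- c_XY / Q_d

# Ordinary lift for comparison: c_XY / E[C_XY]
lift <- c_XY / (c_X * c_Y / n_trans)

c(Q_d = Q_d, hyperlift = hyperlift, lift = lift)
```

Because a high quantile lies above the expected count, hyper-lift is smaller than lift for the same rule, which makes it the more conservative of the two measures.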
A confidence level greater than, e.g., 0.95 indicates that there is only a 5% chance that the count for the rule was generated randomly.
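This page does not spell out how the confidence level is computed. The sketch below assumes the definition from the Hahsler et al. report, namely the probability P[C_XY < c_XY] under the same hypergeometric model used for hyper-lift, and evaluates it with phyper() from the stats package. The counts and variable names are hypothetical, and the formula should be read as an assumption rather than as the exact code path inside hyperConfidence().

```r
# Hypothetical counts (illustrative values only).
n_trans <- 1000  # m: number of transactions in the database
c_X     <- 200   # transactions containing X
c_Y     <- 300   # transactions containing Y

# Assumed definition: confidence level = P[C_XY < c_XY], with C_XY
# hypergeometric (c_X draws from n_trans transactions, c_Y contain Y).
hyper_conf <- function(c_XY) {
  phyper(c_XY - 1, m = c_Y, n = n_trans - c_Y, k = c_X)
}

conf_high <- hyper_conf(90)  # count well above the expected 60
conf_mean <- hyper_conf(60)  # count equal to the expected value

# Keep only rules whose level reaches a chosen threshold, e.g. 0.99.
c(conf_high = conf_high, conf_mean = conf_mean)
```

Under this assumption, a count far above the expected value yields a level close to 1, while a count near the expected value yields a level around one half and would be discarded by a 0.99 threshold.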
A numeric vector containing the values of the interest measure for each association in the set of associations x.
Edward R. Omiecinski. Alternative interest measures for mining associations in databases. IEEE Transactions on Knowledge and Data Engineering, 15(1):57-69, Jan/Feb 2003.
Michael Hahsler, Kurt Hornik, and Thomas Reutterer. Implications of probabilistic data modeling for rule mining. Report 14, Research Report Series, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria, March 2005.
data("Income")

### calculate all-confidence
itemsets <- apriori(Income, parameter = list(target = "freq"))
quality(itemsets) <- cbind(quality(itemsets),
    allConfidence = allConfidence(itemsets))
summary(itemsets)

### calculate hyperlift for the 0.9 quantile
rules <- apriori(Income)
quality(rules) <- cbind(quality(rules),
    hyperLift = hyperLift(rules, Income, d = 0.9))
inspect(SORT(rules, by = "hyperLift")[1:5])

### calculate hyper-confidence and discard all rules with
### a confidence level < 99%
quality(rules) <- cbind(quality(rules),
    hyperConfidence = hyperConfidence(rules, Income))
rulesHConf <- rules[quality(rules)$hyperConfidence >= 0.99]
inspect(rulesHConf[1:10])