interestMeasure {arules}    R Documentation
Provides the generic function interestMeasure and the needed S4 method to calculate various additional interest measures for existing sets of itemsets or rules.
interestMeasure(x, method, transactions = NULL, ...)
x: a set of itemsets or rules.
method: name of the interest measure (see details for available measures).
transactions: the transaction data set used to mine the associations.
...: further arguments for the measure calculation.
For itemsets the following measures are implemented:
For rules the following measures are implemented:
NA since the approximation used in the chi-square test breaks down.
Lift is defined for the rule X -> Y as:
lift(X -> Y) = P(X+Y)/(P(X)*P(Y)) = c_XY / E[C_XY],
where E[C_XY] = c_X * c_Y / m, with m being the number of transactions in the database.
Hyper-lift is defined as:
hyperlift(X -> Y) = c_XY / Q_d[C_XY],
where Q_d[C_XY] is the quantile, given by d, of the hypergeometric distribution of C_XY. The quantile can be given as parameter d (default: d = 0.99).
Range: 0...Inf.
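The two definitions above can be checked by hand with base R. The following is a minimal sketch, not the package's implementation: the counts are made-up illustrative values, and it assumes that C_XY follows a hypergeometric distribution determined by c_X, c_Y and m (drawing c_X transactions out of m, of which c_Y contain Y).

```r
# Hypothetical counts (illustrative values, not taken from any data set)
m    <- 1000  # number of transactions in the database
c_X  <- 100   # transactions containing X
c_Y  <- 50    # transactions containing Y
c_XY <- 12    # transactions containing both X and Y

# Lift: observed count divided by the count expected under independence
expected <- c_X * c_Y / m
lift <- c_XY / expected

# Hyper-lift: observed count divided by the d-quantile of the assumed
# hypergeometric distribution of C_XY.
# Note: in qhyper(), the argument 'm' is the number of "white balls" (here c_Y),
# 'n' the number of "black balls", and 'k' the number of draws (here c_X).
d <- 0.99
Q_d <- qhyper(d, m = c_Y, n = m - c_Y, k = c_X)
hyperlift <- c_XY / Q_d

print(c(lift = lift, hyperlift = hyperlift))
```

Because the 0.99 quantile is at least as large as the expected count in this example, the hyper-lift comes out no larger than the lift, which illustrates why it is the more conservative of the two for low counts.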
significance=TRUE
is used) for
A confidence level of, e.g., > 0.95 indicates that there is only a 5% chance that the count for the rule was generated randomly.
By default, complementary effects are mined; substitutes can be found by setting the parameter complements = FALSE.
Range: 0...1.
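How such a confidence level can arise is sketched below with base R. The formulas are an assumption, since the definition is not spelled out here: the sketch takes hyper-confidence for complementary effects to be P(C_XY < c_XY) and for substitution effects to be P(C_XY > c_XY), with C_XY hypergeometric as in the model of Hahsler, Hornik, and Reutterer (2005). The counts are hypothetical.

```r
# Hypothetical counts (illustrative values, not taken from any data set)
m    <- 1000  # number of transactions in the database
c_X  <- 100   # transactions containing X
c_Y  <- 50    # transactions containing Y
c_XY <- 12    # transactions containing both X and Y

# Assumed definition for complementary effects: probability that a random
# co-occurrence count is strictly smaller than the observed one.
# phyper(q, ...) returns P(C <= q), so q = c_XY - 1 gives P(C < c_XY).
hc_complements <- phyper(c_XY - 1, m = c_Y, n = m - c_Y, k = c_X)

# Assumed definition for substitution effects: probability that a random
# co-occurrence count is strictly larger than the observed one.
hc_substitutes <- phyper(c_XY, m = c_Y, n = m - c_Y, k = c_X,
                         lower.tail = FALSE)

print(c(complements = hc_complements, substitutes = hc_substitutes))
```

With these counts, the observed co-occurrence is well above the 5 transactions expected under independence, so the value for complementary effects is close to 1 and the value for substitution effects is close to 0.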
Note that calculating the interest measures requires support (and, for rules, also confidence and lift) to be stored in the quality slot of x. These measures are returned by the mining algorithms implemented in this package. Note also that the calculation of some measures is quite slow, since we do not have access to the original itemset structure that was used for mining.
A numeric vector containing the values of the interest measure for each association in the set of associations x.
R. Bayardo, R. Agrawal, and D. Gunopulos (2000). Constraint-based rule mining in large, dense databases. Data Mining and Knowledge Discovery, 4(2/3):217-240.
Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur (1997). Dynamic itemset counting and implication rules for market basket data. In SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, pages 255-264, Tucson, Arizona, USA.
Michael Hahsler, Kurt Hornik, and Thomas Reutterer (2005). Implications of probabilistic data modeling for rule mining. Report 14, Research Report Series, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria.
Bing Liu, Wynne Hsu, and Yiming Ma (1999). Pruning and summarizing the discovered associations. In KDD '99: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 125-134. ACM Press.
Edward R. Omiecinski (2003). Alternative interest measures for mining associations in databases. IEEE Transactions on Knowledge and Data Engineering, 15(1):57-69.
Pang-Ning Tan, Vipin Kumar, and Jaideep Srivastava (2004). Selecting the right objective measure for association analysis. Information Systems, 29(4):293-313.
G. Piatetsky-Shapiro (1991). Discovery, analysis, and presentation of strong rules. Knowledge Discovery in Databases, pages 229-248.
Hui Xiong, Pang-Ning Tan, and Vipin Kumar (2003). Mining strong affinity association patterns in data sets with skewed support distribution. In Bart Goethals and Mohammed J. Zaki, editors, Proceedings of the IEEE International Conference on Data Mining, November 19 - 22, 2003, Melbourne, Florida, pages 387-394.
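library(arules)
data("Income")

# mine association rules with the default apriori() settings
rules <- apriori(Income)

# add hyper-confidence to the quality measures of the rules
quality(rules) <- cbind(quality(rules),
  hyperConfidence = interestMeasure(rules, method = "hyperConfidence",
    transactions = Income))

# show the rules with the highest hyper-confidence
inspect(head(SORT(rules, by = "hyperConfidence")))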