Abstract
Identifying and expressing data patterns in the form of association rules is a commonly used technique in data mining. Typically, association rule discovery is based on two criteria: support and confidence. In this paper we will briefly discuss the insufficiency of these two criteria and argue for the importance of including interestingness/dependency as a criterion for (association) pattern discovery. From a practical computational perspective, we will show how the proposed criterion, grounded in interestingness, can be used to improve the efficiency of the pattern discovery mechanism. Furthermore, we will present a probabilistic inference mechanism that provides an alternative to pattern discovery. An illustrative example and a preliminary study evaluating the proposed approach will be presented.
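To make the argument about support and confidence concrete, here is a minimal sketch on a hypothetical two-item dataset. It computes support and confidence for a rule alongside two common dependency measures: lift, and the mutual information between the item-presence indicators. The data, the function names, and the choice of these particular measures are illustrative assumptions. This is not the paper's own interestingness criterion, pruning method, or inference mechanism.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the paper): 100 baskets over two items.
baskets = (
    [{"tea", "coffee"}] * 20   # both items
    + [{"tea"}] * 5            # tea only
    + [{"coffee"}] * 70        # coffee only
    + [set()] * 5              # neither
)


def support(itemset, data):
    """Fraction of baskets containing every item in `itemset`."""
    return sum(itemset <= basket for basket in data) / len(data)


def confidence(lhs, rhs, data):
    """Estimate of P(rhs | lhs) for the rule lhs -> rhs."""
    return support(lhs | rhs, data) / support(lhs, data)


def lift(lhs, rhs, data):
    """P(lhs, rhs) / (P(lhs) * P(rhs)); 1.0 means the itemsets are independent."""
    return support(lhs | rhs, data) / (support(lhs, data) * support(rhs, data))


def mutual_information(a, b, data):
    """Mutual information in bits between the presence indicators of items a and b.

    Cells with zero count never appear in the Counter, which matches the
    convention 0 * log(0) = 0.
    """
    n = len(data)
    joint = Counter((a in basket, b in basket) for basket in data)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = sum(c for (xa, _), c in joint.items() if xa == x) / n
        p_y = sum(c for (_, yb), c in joint.items() if yb == y) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


tea, coffee = {"tea"}, {"coffee"}
print(f"support    = {support(tea | coffee, baskets):.3f}")    # 0.200
print(f"confidence = {confidence(tea, coffee, baskets):.3f}")  # 0.800
print(f"lift       = {lift(tea, coffee, baskets):.3f}")        # 0.889
print(f"MI (bits)  = {mutual_information('tea', 'coffee', baskets):.4f}")
```

Under typical thresholds, the rule tea → coffee looks strong, with 20% support and 80% confidence. Yet coffee appears in 90% of all baskets, so a lift below 1 shows that buying tea slightly *lowers* the chance of buying coffee. Mutual information is always non-negative. It measures how strongly the two indicators depend on each other but not the direction of that dependence, which is why the sketch reports it alongside lift.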
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sy, B.K. (2003). Discovering Association Patterns Based on Mutual Information. In: Perner, P., Rosenfeld, A. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2003. Lecture Notes in Computer Science, vol 2734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45065-3_32
DOI: https://doi.org/10.1007/3-540-45065-3_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40504-7
Online ISBN: 978-3-540-45065-8
eBook Packages: Springer Book Archive