Formal Framework for the Study of Algorithmic Properties of Objective Interestingness Measures

  • Yannick Le Bras
  • Philippe Lenca
  • Stéphane Lallich
Part of the Intelligent Systems Reference Library book series (ISRL, volume 24)

Abstract

Association rule discovery is a growing subdomain of data mining. Many works have focused on the extraction and the evaluation of association rules, leading to many technical improvements to the algorithms and to many different measures. But few of them have tried to combine the two. We introduce here a formal framework for the study of association rules and interestingness measures that allows an analytic study of these objects. This framework is based on the contingency table of a rule and lets us link analytic properties of the measures to algorithmic properties. As an example, we give the case of three algorithmic properties for the extraction of association rules that were generalized and applied with the help of this framework. These properties allow pruning of the search space based on a large number of measures and without any support constraint.
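
To make the notion of "the contingency table of a rule" concrete, here is a minimal Python sketch. It is an illustration only, not the chapter's framework: the function names, the toy transactions, and the choice of confidence and lift as example measures are all assumptions made for this example. It shows how the four cells of the table for a rule A -> B are counted, and how a measure can then be written as a function of those counts alone.

```python
from fractions import Fraction


def contingency_table(transactions, antecedent, consequent):
    """Count the four cells of the 2x2 contingency table of the rule A -> B.

    Hypothetical helper for illustration; not taken from the chapter.
    """
    a, b = set(antecedent), set(consequent)
    table = {"ab": 0, "a_not_b": 0, "not_a_b": 0, "not_a_not_b": 0}
    for transaction in transactions:
        items = set(transaction)
        has_a, has_b = a <= items, b <= items
        if has_a and has_b:
            table["ab"] += 1
        elif has_a:
            table["a_not_b"] += 1
        elif has_b:
            table["not_a_b"] += 1
        else:
            table["not_a_not_b"] += 1
    return table


def confidence(table):
    """Confidence of A -> B: fraction of transactions with A that also contain B.

    Raises ZeroDivisionError if no transaction contains A.
    """
    n_a = table["ab"] + table["a_not_b"]
    return Fraction(table["ab"], n_a)


def lift(table):
    """Lift of A -> B: observed co-occurrence relative to independence.

    Raises ZeroDivisionError if A or B never occurs.
    """
    n = sum(table.values())
    n_a = table["ab"] + table["a_not_b"]
    n_b = table["ab"] + table["not_a_b"]
    return Fraction(table["ab"] * n, n_a * n_b)


# Toy data, invented for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"butter"},
]

table = contingency_table(transactions, {"bread"}, {"milk"})
print(table)
print("confidence:", confidence(table))
print("lift:", lift(table))
```

Because each measure here depends only on the cell counts, its behaviour can be studied as an ordinary function of a few variables, which is the kind of analytic treatment the abstract refers to.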

Keywords

Contingency Table; Association Rule; Descriptor System; Mining Association Rule; Pruning Strategy
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yannick Le Bras (1, 3)
  • Philippe Lenca (1, 3)
  • Stéphane Lallich (2)

  1. Institut Telecom; Telecom Bretagne; UMR CNRS 3192 Lab-STICC, Technopôle Brest-Iroise, Brest Cedex 3, France
  2. Laboratoire ERIC, Université de Lyon, Lyon 2, France
  3. Université européenne de Bretagne, France
