Scoring and Ranking the Data Using Association Rules

  • Bing Liu
  • Yiming Ma
  • Ching Kian Wong
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 95)


In many data mining applications, the objective is to find the likelihood that an object belongs to a particular class. For example, in direct marketing, marketers want to know how likely a potential customer is to buy a particular product. In such applications, it is often too difficult to predict definitively who will be a buyer and who will not, because the data used for modeling is very noisy and has a highly imbalanced class distribution. Traditionally, classification systems are used to solve this problem. Instead of assigning a definite class (e.g., buyer or non-buyer) to a data case representing a potential customer, the classification system is made to produce a class probability estimate (or a score) for the data case. However, existing classification systems only aim to find a small subset of the rules that exist in the data to form a classifier. This small subset of rules can only give a partial (or biased) picture of the domain. In this paper, we show that association rule mining provides a more powerful solution to the problem because association rule mining aims to generate all rules in the data. It is thus able to give a complete picture of the underlying relationships that exist in the domain. This complete set of rules enables us to assign a more accurate class probability estimate (or likelihood) to each (new) data case. An efficient technique that makes use of the discovered association rules to produce class probability estimates is proposed. We call this technique scoring based on associations (or SBA). Experimental results on both public domain data and our real-life application data show that the technique performs significantly better than the state-of-the-art classification system C4.5.
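The body of the chapter is not included in this preview, so the exact SBA scoring procedure is not reproduced here. The sketch below only illustrates the general idea stated in the abstract: class association rules mined from the data are combined to produce a likelihood score for each new data case, and cases are then ranked by that score. The particular combination used (a support-weighted average of the confidences of matching rules) and all identifiers (Rule, score_case, the example items) are illustrative assumptions, not the paper's actual method.

```python
# Minimal sketch of scoring with class association rules (CARs).
# Assumption: rules have the form {item, ...} -> class with a support and
# a confidence, and a case's score for the positive class is a
# support-weighted average of the confidences of the rules that cover it.

from dataclasses import dataclass


@dataclass
class Rule:
    antecedent: frozenset   # set of attribute=value items
    target_class: str       # consequent class label
    support: float          # fraction of cases covered by the rule
    confidence: float       # P(target_class | antecedent)


def score_case(case_items: set, rules: list[Rule], positive_class: str) -> float:
    """Return a likelihood-style score that the case belongs to positive_class."""
    matching = [r for r in rules if r.antecedent <= case_items]
    if not matching:
        return 0.0  # no rule covers the case; a real system might fall back to the class prior
    total_weight = sum(r.support for r in matching)
    weighted = sum(r.support * r.confidence
                   for r in matching if r.target_class == positive_class)
    return weighted / total_weight


# Example: rank potential customers by their scores (most likely buyers first).
rules = [
    Rule(frozenset({"age=young", "income=high"}), "buyer", 0.10, 0.80),
    Rule(frozenset({"income=high"}), "buyer", 0.25, 0.55),
    Rule(frozenset({"age=old"}), "non-buyer", 0.30, 0.70),
]
customers = [{"age=young", "income=high", "region=east"},
             {"age=old", "income=low"}]
ranked = sorted(customers, key=lambda c: score_case(c, rules, "buyer"), reverse=True)
```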


Keywords: Association Rule, Minimum Support, Data Case, Association Rule Mining, Minority Class



References

  1. Aggarwal, C. and Yu, P. "Online generation of association rules." ICDE-98, 1998, pp. 402–411.
  2. Agrawal, R., Imielinski, T. and Swami, A. "Mining association rules between sets of items in large databases." SIGMOD-93, 1993, pp. 207–216.
  3. Agrawal, R. and Srikant, R. "Fast algorithms for mining association rules." VLDB-94, 1994.
  4. Bayardo, R., Agrawal, R. and Gunopulos, D. "Constraint-based rule mining in large, dense databases." ICDE-99, 1999.
  5. Brin, S., Motwani, R., Ullman, J. and Tsur, S. "Dynamic itemset counting and implication rules for market basket data." SIGMOD-97, 1997, pp. 255–264.
  6. Chan, P. K. and Stolfo, S. J. "Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection." KDD-98, 1998.
  7. Cheung, D. W., Han, J., Ng, V. and Wong, C. Y. "Maintenance of discovered association rules in large databases: an incremental updating technique." ICDE-96, 1996, pp. 106–114.
  8. Dong, G., Zhang, X., Wong, L. and Li, J. "CAEP: classification by aggregating emerging patterns." DS-99: Second International Conference on Discovery Science, 1999.
  9. Fawcett, T. and Provost, F. "Combining data mining and machine learning for effective user profiling." KDD-96, 1996.
  10. Fayyad, U. M. and Irani, K. B. "Multi-interval discretization of continuous-valued attributes for classification learning." IJCAI-93, 1993, pp. 1022–1027.
  11. Gehrke, J., Ganti, V., Ramakrishnan, R. and Loh, W. "BOAT: optimistic decision tree construction." SIGMOD-99, 1999.
  12. Han, J. and Fu, Y. "Discovery of multiple-level association rules from large databases." VLDB-95, 1995.
  13. Hughes, A. M. The Complete Database Marketer: Second-Generation Strategies and Techniques for Tapping the Power of Your Customer Database. Chicago, IL: Irwin Professional, 1996.
  14. Kubat, M. and Matwin, S. "Addressing the curse of imbalanced training sets." ICML-97, 1997.
  15. Kohavi, R., John, G., Long, R., Manley, D. and Pfleger, K. "MLC++: a machine learning library in C++." Tools with Artificial Intelligence, 1994, pp. 740–743.
  16. Ling, C. and Li, C. "Data mining for direct marketing: problems and solutions." KDD-98, 1998.
  17. Liu, B., Hsu, W. and Ma, Y. "Integrating classification and association rule mining." KDD-98, 1998.
  18. Liu, B., Hsu, W. and Ma, Y. "Mining association rules with multiple minimum supports." KDD-99, 1999.
  19. Liu, B., Hsu, W. and Ma, Y. "Pruning and summarizing the discovered associations." KDD-99, 1999.
  20. Mehta, M., Agrawal, R. and Rissanen, J. "SLIQ: a fast scalable classifier for data mining." Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT), 1996.
  21. Mannila, H., Toivonen, H. and Verkamo, A. I. "Efficient algorithms for discovering association rules." KDD-94: AAAI Workshop on Knowledge Discovery in Databases, 1994.
  22. Meretakis, D. and Wuthrich, B. "Extending naïve Bayes classifiers using long itemsets." KDD-99, 1999.
  23. Merz, C. J. and Murphy, P. UCI Repository of Machine Learning Databases, 1996.
  24. Mills, F. Statistical Methods. Pitman, 1955.
  25. Ng, R. T., Lakshmanan, L. and Han, J. "Exploratory mining and pruning optimizations of constrained association rules." SIGMOD-98, 1998.
  26. Pazzani, M., Merz, C., Murphy, P., Ali, K., Hume, T. and Brunk, C. "Reducing misclassification costs." ICML-97, 1997.
  27. Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1992.
  28. Rastogi, R. and Shim, K. "PUBLIC: a decision tree classifier that integrates building and pruning." VLDB-98, 1998.
  29. Srikant, R. and Agrawal, R. "Mining generalized association rules." VLDB-95, 1995.
  30. Toivonen, H. "Sampling large databases for association rules." VLDB-96, 1996.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Bing Liu 1
  • Yiming Ma 1
  • Ching Kian Wong 1

  1. School of Computing, National University of Singapore, Singapore
