Scoring and Ranking the Data Using Association Rules

  • Chapter
Data Mining, Rough Sets and Granular Computing

Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 95))

Abstract

In many data mining applications, the objective is to estimate the likelihood that an object belongs to a particular class. For example, in direct marketing, marketers want to know how likely a potential customer is to buy a particular product. In such applications, it is often too difficult to predict definitively who will be a buyer and who will not, because the data used for modeling are often very noisy and have a highly imbalanced class distribution. Traditionally, classification systems are used to solve this problem. Instead of assigning a definite class (e.g., buyer or non-buyer) to a data case representing a potential customer, a classification system is made to produce a class probability estimate (or score) for the data case. However, existing classification systems aim to find only a small subset of the rules that exist in the data to form a classifier. This small subset can give only a partial (or biased) picture of the domain. In this paper, we show that association rule mining provides a more powerful solution to the problem because it aims to generate all rules in the data. It is thus able to give a complete picture of the underlying relationships that exist in the domain. This complete set of rules enables us to assign a more accurate class probability estimate (or likelihood) to each new data case. We propose an efficient technique that uses the discovered association rules to produce class probability estimates, which we call scoring based on associations (SBA). Experimental results on both public-domain data and our real-life application data show that the technique performs significantly better than the state-of-the-art classification system C4.5.
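
To make the idea of scoring with association rules concrete, the following is a minimal sketch in Python. It is not the SBA formula from the chapter, which the abstract does not state. It assumes a two-class problem and a hypothetical weighting in which every rule that matches a data case votes for the positive class, weighted by support times confidence. The names `Rule`, `score_case`, and the sample attributes and rule statistics are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A class association rule: conditions -> target (hypothetical layout)."""
    conditions: frozenset   # set of (attribute, value) pairs
    target: str             # predicted class label
    support: float          # fraction of training cases covered by the rule
    confidence: float       # P(target | conditions) estimated from training data


def score_case(case, rules, positive="buyer", default=0.0):
    """Estimate the likelihood that `case` belongs to the positive class.

    Assumed weighting (not taken from the chapter): each matching rule
    contributes its confidence for the positive class, weighted by
    support * confidence. Binary classes are assumed, so a rule that
    predicts the other class contributes 1 - confidence.
    """
    items = set(case.items())
    weighted_sum = 0.0
    total_weight = 0.0
    for rule in rules:
        if rule.conditions <= items:  # every condition holds for this case
            weight = rule.support * rule.confidence
            if rule.target == positive:
                p_positive = rule.confidence
            else:
                p_positive = 1.0 - rule.confidence
            weighted_sum += weight * p_positive
            total_weight += weight
    # No matching rule: fall back to a caller-supplied default score.
    return weighted_sum / total_weight if total_weight > 0 else default


# Illustrative rules and cases (made-up numbers, not from the chapter).
rules = [
    Rule(frozenset({("age", "young")}), "buyer", support=0.10, confidence=0.60),
    Rule(frozenset({("income", "low")}), "non-buyer", support=0.20, confidence=0.90),
]
cases = {
    "c1": {"age": "young", "income": "low"},
    "c2": {"age": "young", "income": "high"},
    "c3": {"age": "old", "income": "high"},
}

# Rank cases from most to least likely buyer.
scores = {name: score_case(case, rules) for name, case in cases.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Ranking by score rather than assigning a hard class label matches the direct-marketing setting described above: a marketer can contact the top-ranked cases first, even when no case can be confidently labeled a buyer.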

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Liu, B., Ma, Y., Wong, C.K. (2002). Scoring and Ranking the Data Using Association Rules. In: Lin, T.Y., Yao, Y.Y., Zadeh, L.A. (eds) Data Mining, Rough Sets and Granular Computing. Studies in Fuzziness and Soft Computing, vol 95. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1791-1_9

  • DOI: https://doi.org/10.1007/978-3-7908-1791-1_9

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-7908-2508-4

  • Online ISBN: 978-3-7908-1791-1

  • eBook Packages: Springer Book Archive
