Scoring and Ranking the Data Using Association Rules

Liu, Bing; Ma, Yiming; Wong, Ching Kian

doi:10.1007/978-3-7908-1791-1_9

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 95)

Abstract

In many data mining applications, the objective is to find the likelihood that an object belongs to a particular class. For example, in direct marketing, marketers want to know how likely a potential customer is to buy a particular product. In such applications, it is often too difficult to predict definitively who will be a buyer and who will not, because the data used for modeling is often very noisy and has a highly imbalanced class distribution. Traditionally, classification systems are used to solve this problem. Instead of assigning a definite class (e.g., buyer or non-buyer) to a data case representing a potential customer, a classification system is made to produce a class probability estimate (or score) for the data case. However, existing classification systems aim to find only a small subset of the rules that exist in the data to form a classifier. This small subset of rules can give only a partial (or biased) picture of the domain. In this paper, we show that association rule mining provides a more powerful solution to the problem because it aims to generate all rules in the data. It is thus able to give a complete picture of the underlying relationships that exist in the domain. This complete set of rules enables us to assign a more accurate class probability estimate (or likelihood) to each new data case. We propose an efficient technique that uses the discovered association rules to produce class probability estimates. We call this technique scoring based on associations (SBA). Experimental results on both public domain data and our real-life application data show that the technique performs significantly better than the state-of-the-art classification system C4.5.
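The general idea of scoring a data case with association rules can be illustrated with a minimal Python sketch. The abstract does not give the SBA scoring formula, so everything below is an assumption made for illustration: the function names (mine_class_rules, score), the brute-force rule enumeration, the coverage threshold, and the coverage-weighted average of rule confidences are all hypothetical stand-ins, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(rows, labels, target, min_coverage=0.2, max_len=2):
    """Enumerate rules `itemset -> target` by brute force (illustration only).

    rows:   list of dicts mapping attribute -> value
    labels: class label of each row
    Returns {itemset: (coverage, confidence)}, where coverage is the fraction
    of cases containing the itemset and confidence is the fraction of those
    cases that belong to the target class.
    """
    n = len(rows)
    cover = Counter()  # itemset -> number of cases containing it
    hits = Counter()   # itemset -> number of those cases in the target class
    for row, label in zip(rows, labels):
        items = sorted(row.items())
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                cover[itemset] += 1
                if label == target:
                    hits[itemset] += 1
    return {
        itemset: (count / n, hits[itemset] / count)
        for itemset, count in cover.items()
        if count / n >= min_coverage
    }


def score(case, rules, default=0.0):
    """Coverage-weighted average confidence of all rules that match `case`.

    This combination rule is an assumption, not the SBA formula.
    """
    items = set(case.items())
    num = den = 0.0
    for itemset, (coverage, confidence) in rules.items():
        if items.issuperset(itemset):
            num += coverage * confidence
            den += coverage
    return num / den if den else default


# Toy direct-marketing data (invented for this sketch).
rows = [
    {"age": "young", "income": "high"},
    {"age": "young", "income": "low"},
    {"age": "old", "income": "high"},
    {"age": "old", "income": "low"},
    {"age": "young", "income": "high"},
    {"age": "old", "income": "high"},
]
labels = ["buyer", "buyer", "non-buyer", "non-buyer", "buyer", "non-buyer"]

rules = mine_class_rules(rows, labels, target="buyer")
prospects = [
    {"age": "old", "income": "low"},
    {"age": "young", "income": "high"},
]
# Rank prospects from most to least likely buyer.
ranked = sorted(prospects, key=lambda c: score(c, rules), reverse=True)
```

Because every rule that matches a case contributes to its score, the ranking reflects all of the mined relationships rather than the single rule a decision tree or rule list would fire. A real implementation would replace the brute-force enumeration with a proper association rule miner.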

Author information

Authors and Affiliations

School of Computing, National University of Singapore, 3 Science Drive 2, Singapore, 117543
Bing Liu, Yiming Ma & Ching Kian Wong

Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, San Jose State University The Metropolitan University of Silicon Valley, One Washington Square, 95192-0103, San Jose, CA, USA
Tsau Young Lin
Department of Computer Science, University of Regina, S4S 0A2, Regina, Saskatchewan, Canada
Yiyu Y. Yao
Computer Science Division and Electronics Research Laboratory Department of Electrical and Electronics, University of California Berkeley Initiative in Soft Computing (BISC), 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh

About this chapter

Cite this chapter

Liu, B., Ma, Y., Wong, C.K. (2002). Scoring and Ranking the Data Using Association Rules. In: Lin, T.Y., Yao, Y.Y., Zadeh, L.A. (eds) Data Mining, Rough Sets and Granular Computing. Studies in Fuzziness and Soft Computing, vol 95. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1791-1_9

DOI: https://doi.org/10.1007/978-3-7908-1791-1_9
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-7908-2508-4
Online ISBN: 978-3-7908-1791-1
eBook Packages: Springer Book Archive