Abstract
In many data mining applications, the objective is to estimate the likelihood that an object belongs to a particular class. For example, in direct marketing, marketers want to know how likely a potential customer is to buy a particular product. In such applications, it is often too difficult to predict definitively who will be a buyer and who will not, because the data used for modeling is often very noisy and has a highly imbalanced class distribution. Traditionally, classification systems are used to solve this problem. Instead of assigning a definite class (e.g., buyer or non-buyer) to a data case representing a potential customer, a classification system is made to produce a class probability estimate (or a score) for the data case. However, existing classification systems aim to find only a small subset of the rules that exist in the data to form a classifier. This small subset of rules can give only a partial (or biased) picture of the domain. In this paper, we show that association rule mining provides a more powerful solution to the problem because it aims to generate all rules in the data. It is thus able to give a complete picture of the underlying relationships that exist in the domain. This complete set of rules enables us to assign a more accurate class probability estimate (or likelihood) to each new data case. We propose an efficient technique that uses the discovered association rules to produce class probability estimates. We call this technique scoring based on associations (SBA). Experimental results on both public domain data and our real-life application data show that the technique performs significantly better than the state-of-the-art classification system C4.5.
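To make the idea of scoring with association rules concrete, the following is a minimal illustrative sketch, not the SBA formula itself, which the abstract does not state. It assumes each rule predicts the positive class (e.g., buyer) and carries a support and a confidence, and it scores a data case as the support-weighted average confidence of all rules whose conditions match the case. The `Rule` structure, the `score` function, the weighting scheme, the default score, and the toy rules and customers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical class association rule: conditions -> positive class."""
    conditions: frozenset  # set of (attribute, value) pairs that must all hold
    support: float         # fraction of training cases matching conditions and class
    confidence: float      # estimated P(positive class | conditions)


def score(case: dict, rules: list, default: float = 0.0) -> float:
    """Support-weighted average confidence of every rule that matches the case.

    This weighting is an assumption chosen for illustration only.
    """
    items = set(case.items())
    matched = [r for r in rules if r.conditions <= items]
    total_weight = sum(r.support for r in matched)
    if total_weight == 0:
        # No rule covers the case: fall back to a default score.
        return default
    return sum(r.support * r.confidence for r in matched) / total_weight


# Toy rules and customers (invented for this example).
rules = [
    Rule(frozenset({("age", "30-40")}), support=0.10, confidence=0.20),
    Rule(frozenset({("age", "30-40"), ("owns_home", "yes")}), support=0.05, confidence=0.50),
    Rule(frozenset({("region", "north")}), support=0.20, confidence=0.05),
]

customers = {
    "a": {"age": "30-40", "owns_home": "yes", "region": "south"},
    "b": {"age": "30-40", "owns_home": "no", "region": "north"},
    "c": {"age": "50-60", "owns_home": "no", "region": "south"},
}

# Score every customer, then rank from most to least likely buyer.
scores = {name: score(case, rules, default=0.02) for name, case in customers.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because every matching rule contributes to the score, rather than only the rules a classifier happened to select, the ranking reflects all the evidence the rule set holds about a case. That is the intuition the abstract gives for preferring the complete set of rules.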
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Liu, B., Ma, Y., Wong, C.K. (2002). Scoring and Ranking the Data Using Association Rules. In: Lin, T.Y., Yao, Y.Y., Zadeh, L.A. (eds) Data Mining, Rough Sets and Granular Computing. Studies in Fuzziness and Soft Computing, vol 95. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1791-1_9
DOI: https://doi.org/10.1007/978-3-7908-1791-1_9
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-7908-2508-4
Online ISBN: 978-3-7908-1791-1
eBook Packages: Springer Book Archive