Abstract
Decision tree induction algorithms scale well to large datasets thanks to their univariate, divide-and-conquer approach. However, they may fail to discover effective knowledge when the input dataset consists of a large number of uncorrelated many-valued attributes. In this paper we present an algorithm, Noah, that tackles this problem by performing a multivariate search. A multivariate search consumes far more computation time and memory, which may be prohibitive for large datasets. We remedy this by exploiting effective pruning strategies and efficient data structures. We applied our algorithm to a real cross-selling marketing application. Experimental results revealed that the application database was too complex for C4.5, which failed to discover any useful knowledge, and too large for various well-known rule discovery algorithms, which were unable to complete their task. The pruning techniques used in Noah are general in nature and can be used in other mining systems.
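The abstract contrasts univariate splitting with multivariate search over conjunctions of attribute-value conditions, made tractable by pruning. As a rough illustration of the general idea (not of Noah itself, whose data structures and pruning strategies are described in the paper), the sketch below performs a levelwise search in the style of association rule discovery (cf. the Agrawal et al. and Mannila & Toivonen references): conjunctions below a support threshold are pruned, and by the anti-monotonicity of support none of their supersets need be examined. All function names and parameters here are illustrative assumptions.

```python
def support(rows, conj):
    """Count rows satisfying every (attribute, value) condition in conj."""
    return sum(all(r.get(a) == v for a, v in conj) for r in rows)

def levelwise_rules(rows, target, min_support=2, max_len=2):
    """Mine classification rules whose bodies are conjunctions of
    attribute-value pairs, via levelwise search with support pruning.

    A conjunction is extended to the next level only if it meets the
    support threshold: no superset can have higher support, so the
    rest of the search space below it is safely pruned.
    """
    # All single attribute-value conditions, excluding the class attribute.
    items = sorted({(a, v) for r in rows for a, v in r.items() if a != target})
    level = [(it,) for it in items if support(rows, (it,)) >= min_support]
    rules = []
    for size in range(1, max_len + 1):
        for conj in level:
            labels = {r[target] for r in rows
                      if all(r.get(a) == v for a, v in conj)}
            if len(labels) == 1:  # conjunction predicts a single class
                rules.append((conj, labels.pop()))
        if size == max_len:
            break
        # Candidate generation: extend each surviving conjunction by one
        # condition on a new attribute, pruning candidates by support.
        level = sorted({tuple(sorted(conj + (it,)))
                        for conj in level for it in items
                        if it[0] not in {a for a, _ in conj}
                        and support(rows, conj + (it,)) >= min_support})
    return rules

rows = [
    {"age": "young", "income": "low", "buy": "no"},
    {"age": "young", "income": "high", "buy": "yes"},
    {"age": "old", "income": "low", "buy": "yes"},
    {"age": "old", "income": "high", "buy": "yes"},
]
rules = levelwise_rules(rows, target="buy")
```

On this toy dataset the search keeps only conjunctions covering at least two rows and emits rules such as `age=old → buy=yes`; with many uncorrelated many-valued attributes, the support-based pruning is what keeps the exponential space of conjunctions manageable.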
References
R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A.I. Verkamo. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining. AAAI Press / The MIT Press, 1996.
R.J. Bayardo. Brute-force mining of high-confidence classification rules. In D. Heckerman, H. Mannila, D. Pregibon, and R. Uthurusamy, editors, Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97). AAAI Press, 1997.
R.J. Bayardo, R. Agrawal, and D. Gunopulos. Constraint-based rule mining in large, dense databases. In Proc. of the 15th Int’l Conf. on Data Engineering, pages 188–197, 1999.
P. Clark and T. Niblett. The CN2 induction algorithm. Machine Learning, 3:261–283, 1989.
W.W. Cohen. Learning trees and rules with set-valued features. In Proceedings of the Thirteenth National Conference on Artificial Intelligence AAAI-96. AAAI press/ The MIT press, August 1996.
L. G. Cooper and G. Giuffrida. Turning datamining into a management science tool: New algorithms and empirical results. Management Science, 2000 (To appear).
P. Domingos. Linear-time rule induction. In E. Simoudis, J. W. Han, and U. Fayyad, editors, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), page 96. AAAI Press, 1996.
G. Giuffrida, L. G. Cooper, and W. W. Chu. A scalable bottom-up data mining algorithm for relational databases. In 10th International Conference on Scientific and Statistical Database Management (SSDBM’ 98), Capri, Italy, July 1998. IEEE Publisher.
R.C. Holte, L.E. Acker, and B.W. Porter. Concept learning and the problem of small disjuncts. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, MI, 1989. Morgan Kaufmann.
B. Lent, A. Swami, and J. Widom. Clustering association rules. In Proceedings of the Thirteenth International Conference on Data Engineering (ICDE’ 97), Birmingham, UK, 1997.
B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. In R. Agrawal, P. Stolorz, and G. Piatetsky-Shapiro, editors, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), page 80. AAAI Press, 1998.
H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery, 1, November 1997.
M. Mehta, R. Agrawal, and J. Rissanen. SLIQ: A fast scalable classifier for data mining. Lecture Notes in Computer Science, 1057, 1996.
Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, California, 1988.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, California, 1993.
J. C. Shafer, R. Agrawal, and M. Mehta. SPRINT: A scalable parallel classifier for data mining. In T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, and Nandlal L. Sarda, editors, VLDB 1996, Mumbai (Bombay), India, September 1996. Morgan Kaufmann.
M. Wang, B. Iyer, and J. S. Vitter. Scalable mining for classification rules in relational databases. In Proceedings of International Database Engineering and Application Symposium (IDEAS’98), Cardiff, Wales, U.K., July 1998.
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Giuffrida, G., Chu, W.W., Hanssens, D.M. (2000). Mining Classification Rules from Datasets with Large Number of Many-Valued Attributes. In: Zaniolo, C., Lockemann, P.C., Scholl, M.H., Grust, T. (eds) Advances in Database Technology — EDBT 2000. EDBT 2000. Lecture Notes in Computer Science, vol 1777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46439-5_23
Print ISBN: 978-3-540-67227-2
Online ISBN: 978-3-540-46439-6