Mining Classification Rules from Datasets with Large Number of Many-Valued Attributes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1777)

Abstract

Decision tree induction algorithms scale well to large datasets because of their univariate, divide-and-conquer approach. However, they may fail to discover effective knowledge when the input dataset consists of a large number of uncorrelated many-valued attributes. In this paper we present an algorithm, Noah, that tackles this problem by applying a multivariate search. A multivariate search consumes much more computation time and memory, which may be prohibitive for large datasets. We remedy this problem by exploiting effective pruning strategies and efficient data structures. We applied our algorithm to a real marketing application of cross-selling. Experimental results revealed that the application database was too complex for C4.5, which failed to discover any useful knowledge. The application database was also too large for various well-known rule discovery algorithms, which were not able to complete their task. The pruning techniques used in Noah are general in nature and can be used in other mining systems.
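To make the contrast between univariate and multivariate search concrete, the following is a minimal sketch of a generic levelwise search over conjunctions of attribute-value tests with minimum-support pruning. It is not Noah: the abstract does not describe Noah's pruning strategies or data structures, so the function name `mine_rules`, its parameters, and the thresholds are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_rules(rows, labels, min_support=2, min_confidence=0.9, max_len=2):
    """Illustrative levelwise rule search (hypothetical helper, not Noah).

    A condition is a frozenset of (attribute_index, value) tests that must
    all hold. Returns tuples (condition, label, support, confidence).
    """
    rules = []
    # Level 1: every single (attribute, value) test seen in the data.
    frontier = {frozenset([(i, v)]) for row in rows for i, v in enumerate(row)}
    for size in range(1, max_len + 1):
        survivors = set()
        for cond in frontier:
            covered = [y for row, y in zip(rows, labels)
                       if all(row[i] == v for i, v in cond)]
            # Support pruning: adding tests can only shrink coverage, so a
            # condition below the threshold is never extended.
            if len(covered) < min_support:
                continue
            survivors.add(cond)
            label, hits = Counter(covered).most_common(1)[0]
            confidence = hits / len(covered)
            if confidence >= min_confidence:
                rules.append((tuple(sorted(cond)), label, len(covered), confidence))
        # Next level: join surviving conditions that differ in one test and
        # never test the same attribute twice.
        frontier = set()
        for a, b in combinations(survivors, 2):
            joined = a | b
            if len(joined) == size + 1 and len({i for i, _ in joined}) == size + 1:
                frontier.add(joined)
    return rules
```

On data where the class depends on two attributes jointly but on neither alone (an XOR-like pattern), no single-attribute test reaches high confidence, while the two-attribute conjunctions do. That is the kind of case where a univariate split criterion has nothing to latch onto.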



Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Giuffrida, G., Chu, W.W., Hanssens, D.M. (2000). Mining Classification Rules from Datasets with Large Number of Many-Valued Attributes. In: Zaniolo, C., Lockemann, P.C., Scholl, M.H., Grust, T. (eds) Advances in Database Technology — EDBT 2000. EDBT 2000. Lecture Notes in Computer Science, vol 1777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46439-5_23

  • DOI: https://doi.org/10.1007/3-540-46439-5_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67227-2

  • Online ISBN: 978-3-540-46439-6

  • eBook Packages: Springer Book Archive
