PMCRI: A Parallel Modular Classification Rule Induction Framework

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5632)

Abstract

In a world where massive amounts of data are recorded, we need data mining technologies that extract knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is very widely used to predict the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well on large datasets. One such alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stahl, F., Bramer, M., Adda, M. (2009). PMCRI: A Parallel Modular Classification Rule Induction Framework. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2009. Lecture Notes in Computer Science, vol 5632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03070-3_12

  • DOI: https://doi.org/10.1007/978-3-642-03070-3_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03069-7

  • Online ISBN: 978-3-642-03070-3

  • eBook Packages: Computer Science, Computer Science (R0)
