PMCRI: A Parallel Modular Classification Rule Induction Framework

Stahl, Frederic; Bramer, Max; Adda, Mo

doi:10.1007/978-3-642-03070-3_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5632)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

In a world where massive amounts of data are recorded, we need data mining technologies that extract knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology for predicting the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well on large datasets. One such alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.

Author information

Authors and Affiliations

University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth, PO1 3HE, United Kingdom
Frederic Stahl, Max Bramer & Mo Adda

Editor information

Editors and Affiliations

Institut für Bildverarbeitung und angewandte Informatik, Körnerstr. 10, 04107, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Stahl, F., Bramer, M., Adda, M. (2009). PMCRI: A Parallel Modular Classification Rule Induction Framework. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2009. Lecture Notes in Computer Science, vol 5632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03070-3_12

DOI: https://doi.org/10.1007/978-3-642-03070-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03069-7
Online ISBN: 978-3-642-03070-3
eBook Packages: Computer Science, Computer Science (R0)