Parallel Random Prism: A Computationally Efficient Ensemble Learner for Classification

Stahl, Frederic; May, David; Bramer, Max

doi:10.1007/978-1-4471-4739-8_2

Included in the following conference series:

International Conference on Innovative Techniques and Applications of Artificial Intelligence

Abstract

Classifiers tend to overfit when the training data contain noise or missing values. Ensemble learning methods are often used to improve a classifier’s classification accuracy, and most ensemble approaches aim to improve the accuracy of decision trees. However, alternatives to decision trees exist. The recently developed Random Prism ensemble learner aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. Like any ensemble learner, however, Random Prism suffers from a high computational overhead caused by replicating the data and inducing multiple base classifiers. Hence even modest-sized datasets may pose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.
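
The overhead described in the abstract, and the reason it lends itself to parallelisation, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the paper's prototype runs on a Hadoop cluster, whereas the sketch uses a local thread pool only to mark where independent tasks could be distributed (threads give no real speed-up for CPU-bound Python code). The base learner `train_one_rule` is a hypothetical one-attribute stand-in, not Prism, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def bootstrap_sample(data, seed):
    """Draw len(data) instances with replacement (the data replication step)."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in data]


def train_one_rule(sample):
    """Hypothetical stand-in base learner (NOT Prism): keeps the single
    attribute whose value-to-majority-class table fits the sample best."""
    default = Counter(label for _, label in sample).most_common(1)[0][0]
    best = None
    for attr in range(len(sample[0][0])):
        by_value = {}
        for features, label in sample:
            by_value.setdefault(features[attr], Counter())[label] += 1
        table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        correct = sum(table[f[attr]] == label for f, label in sample)
        if best is None or correct > best[0]:
            best = (correct, attr, table)
    _, attr, table = best
    # Unseen attribute values fall back to the sample's majority class.
    return lambda features: table.get(features[attr], default)


def train_ensemble(data, n_models, workers=4):
    """Induce each base classifier on its own bootstrap sample.
    The tasks share no state, so each could run on a separate worker."""
    def task(seed):
        return train_one_rule(bootstrap_sample(data, seed))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, range(n_models)))


def predict(ensemble, features):
    """Combine the base classifiers by unweighted majority vote."""
    votes = Counter(model(features) for model in ensemble)
    return votes.most_common(1)[0][0]
```

Each instance is assumed to be a `(features, label)` pair with `features` a tuple of categorical values. Calling `train_ensemble(data, n_models=5)` builds five classifiers, and `predict(ensemble, features)` returns the majority label. Because every task depends only on the data and its seed, the same decomposition would map onto independent cluster jobs.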

Author information

Authors and Affiliations

School of Design, Engineering & Computing, Bournemouth University, Poole House, Talbot Campus, Poole, BH12 5BB, UK
Frederic Stahl
School of Computing, Buckingham Building, University of Portsmouth, Lion Terrace, PO1 3HE, UK
David May & Max Bramer

Corresponding author

Correspondence to Frederic Stahl.

Editor information

Editors and Affiliations

School of Computing, University of Portsmouth, Whitepost Lane The Lilacs, Portsmouth, PO1 3AH, Hampshire, United Kingdom
Max Bramer
School of Computing, Engineering & Mathe, University of Brighton, Lewes Road, Brighton, BN2 4GJ, West Sussex, United Kingdom
Miltos Petridis

About this paper

Cite this paper

Stahl, F., May, D., Bramer, M. (2012). Parallel Random Prism: A Computationally Efficient Ensemble Learner for Classification. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_2

DOI: https://doi.org/10.1007/978-1-4471-4739-8_2
Published: 09 October 2012
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4738-1
Online ISBN: 978-1-4471-4739-8
eBook Packages: Computer Science, Computer Science (R0)
