Abstract
Classifiers tend to overfit when the training data contains noise or missing values. Ensemble learning methods are often used to improve classification accuracy, and most ensemble approaches aim to improve the accuracy of decision trees. However, alternatives to decision trees exist. The recently developed Random Prism ensemble learner aims to improve one such alternative, the Prism family of classification rule induction algorithms, which addresses some of the limitations of decision trees. Like any ensemble learner, however, Random Prism suffers from a high computational overhead due to the replication of the data and the induction of multiple base classifiers. Hence even modestly sized datasets may pose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale algorithms up to large datasets. This paper investigates the parallelisation of Random Prism, implements a prototype, and evaluates it empirically on a Hadoop computing cluster.
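The overhead described in the abstract (replicating the data and inducing several base classifiers) and the map/reduce shape of a parallel remedy can be illustrated with a minimal Python sketch. It is not the authors' implementation: the base learner is a simple one-attribute rule learner standing in for Prism, a local thread pool stands in for a Hadoop cluster, and all function names and the toy dataset are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy dataset: ((outlook, wind), label). The label depends on outlook only.
TOY_DATA = [
    (("sunny", "calm"), "play"),
    (("sunny", "windy"), "play"),
    (("rainy", "calm"), "stay"),
    (("rainy", "windy"), "stay"),
] * 5


def bootstrap_sample(data, rng):
    """Draw len(data) instances with replacement (the data replication step)."""
    return [rng.choice(data) for _ in data]


def induce_one_rule(sample):
    """Stand-in base learner (NOT Prism): choose the single attribute whose
    value -> majority-class table makes the fewest errors on the sample."""
    n_attrs = len(sample[0][0])
    default = Counter(label for _, label in sample).most_common(1)[0][0]
    best = None
    for a in range(n_attrs):
        counts = {}
        for features, label in sample:
            counts.setdefault(features[a], Counter())[label] += 1
        table = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(table[f[a]] != y for f, y in sample)
        if best is None or errors < best[0]:
            best = (errors, a, table)
    _, attr, table = best
    return attr, table, default


def classify(model, features):
    """Apply one base classifier; fall back to its default class for unseen values."""
    attr, table, default = model
    return table.get(features[attr], default)


def map_phase(data, n_models, seed=0, workers=4):
    """Map step: induce one base classifier per bootstrap sample, concurrently."""
    samples = [bootstrap_sample(data, random.Random(seed + i)) for i in range(n_models)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(induce_one_rule, samples))


def reduce_phase(models, features):
    """Reduce step: combine the base classifiers by unweighted majority vote."""
    votes = Counter(classify(m, features) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    ensemble = map_phase(TOY_DATA, n_models=5)
    print(reduce_phase(ensemble, ("rainy", "calm")))
```

Each base classifier depends only on its own bootstrap sample, so the induction step has no shared state and can be distributed freely; only the final vote needs all the models. In CPython a thread pool does not speed up pure-Python work, so the pool here shows the structure only. A process pool or a cluster scheduler would take its place in practice.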
Copyright information
© 2012 Springer-Verlag London
About this paper
Cite this paper
Stahl, F., May, D., Bramer, M. (2012). Parallel Random Prism: A Computationally Efficient Ensemble Learner for Classification. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_2
DOI: https://doi.org/10.1007/978-1-4471-4739-8_2
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4738-1
Online ISBN: 978-1-4471-4739-8
eBook Packages: Computer Science (R0)