Abstract
The computer hardware industry as a whole has embraced multicores. On these machines, extreme optimisation of sequential algorithms is no longer sufficient to exploit the machine's full computing power, which can be harnessed only through thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them well suited to parallelisation. This paper presents an approach for easy-yet-efficient porting of an implementation of the C4.5 algorithm to multicores. The parallel port requires minimal changes to the original sequential code and achieves up to a 7× speedup on an Intel dual quad-core machine.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aldinucci, M., Ruggieri, S., Torquati, M. (2010). Porting Decision Tree Algorithms to Multicore Using FastFlow. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science(), vol 6321. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15880-3_7
DOI: https://doi.org/10.1007/978-3-642-15880-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15879-7
Online ISBN: 978-3-642-15880-3
eBook Packages: Computer Science, Computer Science (R0)