Abstract
Ordinal decision trees (ODTs) can effectively handle monotonic classification problems. However, existing ordinal decision tree algorithms have difficulty learning an ODT from large data sets. To address the problem of generating an ODT from large data sets, this paper presents a parallel processing mechanism in the framework of MapReduce. As in general ordinal decision tree algorithms, rank mutual information (RMI) is still used to select the expanded attributes. Unlike previous algorithms, however, this paper calculates the RMI with an attribute parallelization strategy. Experiments on large, artificially generated ordered data sets confirm that the proposed algorithm is feasible. The experimental results show that the algorithm is effective and efficient with respect to three measures: speed-up, scale-up and size-up.
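The attribute-parallel idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the ascending rank mutual information as defined by Hu et al. in "Rank Entropy-Based Decision Trees for Monotonic Classification", uses base-2 logarithms, scores whole attributes rather than individual cut points, and simulates the map and reduce phases with a local thread pool instead of a MapReduce cluster. The names `rank_mutual_information`, `map_attribute`, `reduce_best` and `select_attribute` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def rank_mutual_information(values, labels):
    """Ascending RMI between one attribute column and the class labels.

    Assumed definition (Hu et al., 2012):
    RMI = -(1/n) * sum_i log2(|A_i| * |D_i| / (n * |A_i & D_i|)),
    where A_i / D_i are the samples whose attribute value / label
    is greater than or equal to that of sample i.
    """
    n = len(values)
    total = 0.0
    for i in range(n):
        ge_attr = {j for j in range(n) if values[j] >= values[i]}
        ge_label = {j for j in range(n) if labels[j] >= labels[i]}
        both = ge_attr & ge_label  # never empty: it always contains i
        total += math.log2(len(ge_attr) * len(ge_label) / (n * len(both)))
    return -total / n


def map_attribute(task):
    """Map step: one task per attribute, emitting (attribute index, RMI)."""
    attr_index, column, labels = task
    return attr_index, rank_mutual_information(column, labels)


def reduce_best(pairs):
    """Reduce step: keep the attribute with the largest RMI."""
    return max(pairs, key=lambda pair: pair[1])


def select_attribute(rows, labels, workers=4):
    """Score every attribute independently, then pick the best one."""
    columns = list(zip(*rows))  # one tuple of values per attribute
    tasks = [(k, column, labels) for k, column in enumerate(columns)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(map_attribute, tasks))
    return reduce_best(scored), dict(scored)


if __name__ == "__main__":
    # Attribute 0 is monotone with the labels; attribute 1 is constant.
    rows = [(1, 5), (1, 5), (2, 5), (2, 5)]
    labels = [1, 1, 2, 2]
    best, scores = select_attribute(rows, labels)
    print(best, scores)
```

Because each attribute's RMI depends only on that attribute's column and the labels, the map tasks share no state, which is what makes this step a natural candidate for parallelization across attributes.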
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, S., Zhai, J., Zhu, H., Wang, X. (2014). Parallel Ordinal Decision Tree Algorithm and Its Implementation in Framework of MapReduce. In: Wang, X., Pedrycz, W., Chan, P., He, Q. (eds) Machine Learning and Cybernetics. ICMLC 2014. Communications in Computer and Information Science, vol 481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45652-1_25
DOI: https://doi.org/10.1007/978-3-662-45652-1_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45651-4
Online ISBN: 978-3-662-45652-1
eBook Packages: Computer Science, Computer Science (R0)