Text Categorization with Diversity Random Forests

Yang, Chun; Yin, Xu-Cheng; Huang, Kaizhu

doi:10.1007/978-3-319-12643-2_39

Chun Yang, Xu-Cheng Yin & Kaizhu Huang

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8836)

Abstract

Text categorization (TC) has many typical traits, such as large and difficult category taxonomies and noisy, incremental data. Random Forests, one of the most important yet simple state-of-the-art ensemble methods, has been applied to such problems with good performance. Most current Random Forests approaches that address diversity focus on maximizing tree diversity while generating and training the component trees. In TC, however, component trees trained on noisy data with huge numbers of categories and features already exhibit highly diverse characteristics. Consequently, given the numerous component trees of the original Random Forests, we propose a novel method, Diversity Random Forests, which diversely and adaptively selects and combines tree classifiers using diversity learning and sample weighting. Diversity Random Forests addresses two key issues. First, by creatively designing a matrix for the data distribution, we formulate a unified optimization model for learning and selecting diverse trees, in which tree weights are learned by solving a convex quadratic programming problem with given sample weights. Second, we propose a new self-training algorithm that iteratively runs the convex optimization and automatically learns the sample weights. Extensive experiments on a variety of text categorization benchmark data sets show that the proposed approach consistently outperforms state-of-the-art methods.
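The abstract does not state the optimization model itself, so the following is only a minimal sketch of the general idea of alternating between a convex quadratic program for tree weights and an update of sample weights. Everything specific in it is an assumption made for illustration: the ±1 correctness matrix `margins`, the weighted squared-margin objective, the simplex constraint on tree weights, the projected-gradient solver, and the reweighting rule. None of these should be read as the paper's formulation.

```python
# Illustrative sketch only. The objective, the +/-1 correctness matrix, and
# the sample reweighting rule are assumptions, not the paper's formulation.

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:  # holds for a prefix of u; the last hit sets theta
            theta = t
    return [max(x - theta, 0.0) for x in v]


def ensemble_margin(row, w):
    """Weighted vote margin of one sample: sum_t w_t * m_it."""
    return sum(w_t * m for w_t, m in zip(w, row))


def learn_tree_weights(margins, sample_weights, steps=500, lr=0.05):
    """Minimise sum_i s_i * (1 - sum_t w_t * m_it)^2 over the simplex.

    This is a convex quadratic program in w, solved here by projected
    gradient descent to keep the sketch dependency-free.
    """
    n_trees = len(margins[0])
    w = [1.0 / n_trees] * n_trees
    for _ in range(steps):
        grad = [0.0] * n_trees
        for row, s_i in zip(margins, sample_weights):
            residual = 1.0 - ensemble_margin(row, w)
            for t in range(n_trees):
                grad[t] -= 2.0 * s_i * residual * row[t]
        w = project_to_simplex([w_t - lr * g for w_t, g in zip(w, grad)])
    return w


def update_sample_weights(margins, w, eps=1e-9):
    """Hypothetical rule: emphasise samples with a small ensemble margin."""
    raw = [1.0 - ensemble_margin(row, w) + eps for row in margins]
    total = sum(raw)
    return [r / total for r in raw]


def diversity_weighting(margins, rounds=5):
    """Alternate between solving for tree weights and updating sample weights."""
    n_samples = len(margins)
    s = [1.0 / n_samples] * n_samples
    w = []
    for _ in range(rounds):
        w = learn_tree_weights(margins, s)
        s = update_sample_weights(margins, w)
    return w, s


if __name__ == "__main__":
    # Rows are samples, columns are trees; +1 = tree correct, -1 = tree wrong.
    toy = [[1, 1, -1], [1, -1, 1], [1, 1, 1], [-1, 1, 1]]
    tree_w, sample_w = diversity_weighting(toy)
    print("tree weights:", tree_w)
    print("sample weights:", sample_w)
```

In practice the `margins` matrix could be filled from the per-tree predictions of any trained forest, and trees whose learned weight is zero would be the ones dropped from the ensemble. The fixed step size `lr` is chosen for the small toy matrix and would need tuning for larger inputs.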

Author information

Authors and Affiliations

School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China
Chun Yang & Xu-Cheng Yin
Department of Electrical and Electronic Engineering, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
Kaizhu Huang

Editor information

Editors and Affiliations

Department of Artificial Intelligence, Faculty of Computer Science and Information Technology Building, University of Malaya, 50603, Kuala Lumpur, Malaysia
Chu Kiong Loo
Department of Electronics and Communication Engineering, College of Engineering, Jalan IKRAM-UNITEN, Universiti Tenaga Nasional, 43009, Kajang, Selangor, Malaysia
Keem Siah Yap
School of Engineering and Information Technology, Murdoch University, 6150, South St, Murdoch, Western Australia, Australia
Kok Wai Wong
Department of Electrical and Electronics Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, 120-749, Seoul, South Korea
Andrew Teoh Beng Jin
Department of Electrical and Electronic Engineering, Xi’an Jiaotong-Liverpool University, Ren’ai Road 111, SIP 215123, Suzhou, Jiangsu Province, China
Kaizhu Huang

About this paper

Cite this paper

Yang, C., Yin, XC., Huang, K. (2014). Text Categorization with Diversity Random Forests. In: Loo, C.K., Yap, K.S., Wong, K.W., Beng Jin, A.T., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8836. Springer, Cham. https://doi.org/10.1007/978-3-319-12643-2_39

DOI: https://doi.org/10.1007/978-3-319-12643-2_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12642-5
Online ISBN: 978-3-319-12643-2
eBook Packages: Computer Science, Computer Science (R0)
