Large Scale Text Categorization Based on Density Statistics Merging

Wang, Rujuan; Wang, Suhua

doi:10.1007/978-3-030-02804-6_43

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 885)

Included in the following conference series:

International Conference on Intelligent and Interactive Systems and Applications

Abstract

To address the problems of effectively classifying and managing massive text collections, a new classification method for large-scale Web text is proposed. The core idea builds on a characteristic of the current network environment: long texts are few but have a high value rate, whereas short texts are numerous but have a low value rate. First, a feature selection method based on complex networks is proposed. The number of features obtained by this method is more stable, and the accuracy of feature selection on large text collections is improved. Second, a text classification method based on density statistical merging is proposed, which approaches classification from the point of view of data sampling. During classification, the method uses not only the density information of the text feature set but also the difference information of each text feature obtained through the statistical merging criteria. As a result, the algorithm is more robust to noise and classifies large text sets more effectively.
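
The abstract does not say how the complex-network feature selection is built, so the following is only a minimal sketch of one common reading of the idea, not the authors' method. It assumes a word co-occurrence graph in which two words are linked when they appear within a small window of each other, and it ranks words by node degree. The function names, the `window` parameter, and the toy documents are all hypothetical.

```python
from collections import defaultdict


def cooccurrence_degrees(docs, window=2):
    """Build a word co-occurrence graph and return each word's degree.

    Assumption: two words are linked if they occur within `window`
    positions of each other in at least one document.
    """
    neighbors = defaultdict(set)
    for tokens in docs:
        for i, word in enumerate(tokens):
            # Look ahead at the next `window` tokens only.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    neighbors[word].add(other)
                    neighbors[other].add(word)
    return {word: len(adj) for word, adj in neighbors.items()}


def select_features(docs, k, window=2):
    """Return the k words with the highest degree (ties broken alphabetically)."""
    degrees = cooccurrence_degrees(docs, window)
    ranked = sorted(degrees, key=lambda w: (-degrees[w], w))
    return ranked[:k]


# Hypothetical toy corpus of pre-tokenized documents.
docs = [
    ["density", "based", "text", "classification"],
    ["text", "feature", "selection"],
    ["density", "text", "merging"],
]
print(select_features(docs, k=2))
```

Degree is the simplest centrality measure to use here. Other network statistics, such as clustering coefficient or betweenness, could be substituted without changing the overall shape of the sketch.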

Acknowledgments

This work was partially supported by the “13th Five-Year” science and technology research project of the Education Department of Jilin Province, Kyrgyzstan UNESCO Zi [2016] No. 159.

Author information

Authors and Affiliations

College of Humanities and Sciences of Northeast Normal University, Changchun, 130012, China
Rujuan Wang & Suhua Wang

Corresponding author

Correspondence to Rujuan Wang.

Editor information

Editors and Affiliations

Department de Ciències de la Computació, Universitat Politècnica de Catalunya, Barcelona, Spain
Fatos Xhafa
Department of Computer Science and Engineering, Faculty of Engineering and Technology, SOA University, Bhubaneswar, Odisha, India
Srikanta Patnaik
Department of Business Systems and Analytics, La Salle University, Philadelphia, PA, USA
Madjid Tavana

About this paper

Cite this paper

Wang, R., Wang, S. (2019). Large Scale Text Categorization Based on Density Statistics Merging. In: Xhafa, F., Patnaik, S., Tavana, M. (eds) Advances in Intelligent, Interactive Systems and Applications. IISA 2018. Advances in Intelligent Systems and Computing, vol 885. Springer, Cham. https://doi.org/10.1007/978-3-030-02804-6_43

DOI: https://doi.org/10.1007/978-3-030-02804-6_43
Published: 17 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02803-9
Online ISBN: 978-3-030-02804-6
eBook Packages: Intelligent Technologies and Robotics (R0)
