Large Scale Text Categorization Based on Density Statistics Merging

  • Conference paper
Advances in Intelligent, Interactive Systems and Applications (IISA 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 885)

Abstract

To address the problems of effectively classifying and managing massive text collections, a new classification method for massive Web text is proposed. The core idea builds on a characteristic of the current network environment: long texts are few but carry a high value rate, whereas short texts are numerous but carry a low value rate. First, a feature selection method based on complex networks is proposed. The number of features obtained by this method is more stable, and the accuracy of feature selection on large text sets is improved. Second, a text classification method based on density statistical merging is proposed, which approaches classification from the point of view of data sampling. During classification, the method uses not only the density information of the text feature set but also the difference information of each text feature obtained by the statistical merging criteria. As a result, the algorithm is more robust to noise and classifies large text sets more effectively.
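
The abstract does not spell out how the complex-network feature selection works, so the following is only a minimal illustrative sketch of one common approach, not the authors' algorithm. It assumes that documents are already tokenized, that words co-occurring within a small window are linked in a weighted graph, and that features are ranked by weighted degree. The function names, the `window` size, and the cutoff `k` are all hypothetical.

```python
from collections import defaultdict


def build_cooccurrence_graph(docs, window=2):
    """Build a weighted word co-occurrence graph.

    docs: list of token lists (tokenization is assumed to be done already).
    Returns {word: {neighbor: weight}}, where weight counts how often the two
    words appear within `window` tokens of each other.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in docs:
        for i, word in enumerate(tokens):
            # Link the word to the next `window` tokens that follow it.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word][other] += 1
                    graph[other][word] += 1
    return graph


def select_features(docs, k=3, window=2):
    """Return the k words with the largest weighted degree (an assumed criterion)."""
    graph = build_cooccurrence_graph(docs, window)
    strength = {w: sum(neighbors.values()) for w, neighbors in graph.items()}
    # Sort by descending strength; break ties alphabetically for determinism.
    ranked = sorted(strength, key=lambda w: (-strength[w], w))
    return ranked[:k]
```

Ranking by weighted degree is only one possible centrality measure; other network statistics could be swapped in at the `strength` step without changing the rest of the sketch.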

Acknowledgments

This work was partially supported by the “13th Five-Year” science and technology research project of the Education Department of Jilin Province, Kyrgyzstan UNESCO Zi [2016] No. 159.

Author information

Corresponding author

Correspondence to Rujuan Wang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, R., Wang, S. (2019). Large Scale Text Categorization Based on Density Statistics Merging. In: Xhafa, F., Patnaik, S., Tavana, M. (eds) Advances in Intelligent, Interactive Systems and Applications. IISA 2018. Advances in Intelligent Systems and Computing, vol 885. Springer, Cham. https://doi.org/10.1007/978-3-030-02804-6_43
