A Method of Text Classification Combining Naive Bayes and the Similarity Computing Algorithms

Hong, Yinghan; Mai, Guizhen; Zeng, Hui; Guo, Cai

doi:10.1007/978-3-319-28121-6_1

Conference paper
First Online: 30 December 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9461)

Abstract

Text classification is one of the main problems in big data analysis and research. At present, however, there is no universal algorithm model that meets the requirements of both accuracy and efficiency in text classification. This paper proposes a text classification method that combines Naive Bayes with a similarity computing algorithm. First, the text is split into several word-segment vectors by the Paoding Analyzer; next, the Bayesian algorithm performs the first-level directory classification of the text; then an improved similarity computing algorithm carries out the second-level directory classification. Finally, the algorithm model is tested on actual data, and the results are compared with those of the Bayesian algorithm alone and the similarity computing algorithm alone. The results show that the proposed method achieves a higher precision rate.
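
The two-level pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: whitespace tokenization stands in for the Paoding Analyzer, a multinomial Naive Bayes with add-one smoothing stands in for the Bayesian step, and plain cosine similarity against per-subcategory term counts stands in for the improved similarity computing algorithm, whose details the abstract does not give. The toy data and the names `tokenize`, `NaiveBayes`, `train`, and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace splitting replaces real word segmentation.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)  # term counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total = len(labels)
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        best_label, best_lp = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            lp = math.log(n_docs / self.total)
            for t in tokens:
                lp += math.log((counts[t] + 1) / denom)
            if lp > best_lp:
                best_label, best_lp = label, lp
        return best_label


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train(samples):
    # samples: (text, first-level label, second-level label)
    nb = NaiveBayes().fit([s[0] for s in samples], [s[1] for s in samples])
    centroids = defaultdict(lambda: defaultdict(Counter))
    for text, top, sub in samples:
        centroids[top][sub].update(tokenize(text))
    return nb, centroids


def classify(doc, nb, centroids):
    top = nb.predict(doc)                  # first level: Naive Bayes
    vec = Counter(tokenize(doc))
    subs = centroids[top]                  # second level: most similar subcategory
    sub = max(subs, key=lambda s: cosine(vec, subs[s]))
    return top, sub


# Hypothetical toy corpus, for illustration only.
TRAIN = [
    ("football match goal striker", "sports", "football"),
    ("football league goal keeper", "sports", "football"),
    ("basketball court dunk rebound", "sports", "basketball"),
    ("basketball playoff dunk guard", "sports", "basketball"),
    ("stock market shares trading", "finance", "stocks"),
    ("stock index shares rally", "finance", "stocks"),
    ("bank loan interest mortgage", "finance", "banking"),
    ("bank deposit interest rate", "finance", "banking"),
]

nb, centroids = train(TRAIN)
print(classify("goal striker football", nb, centroids))
print(classify("bank interest rate", nb, centroids))
```

Restricting the similarity search to the subcategories of the predicted first-level class keeps the second stage small, which is one plausible reading of how such a combination could help with both accuracy and efficiency.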

Acknowledgements

This work was supported by the Science and Technology Planning Project of Guangdong Province, China (2015A030401101, 2012B040500034).

Author information

Authors and Affiliations

Hanshan Normal University, Chaozhou, 521041, Guangdong, China
Yinghan Hong, Hui Zeng & Cai Guo
Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
Guizhen Mai

Corresponding author

Correspondence to Yinghan Hong.

Editor information

Editors and Affiliations

Guangdong University of Technology, Guangzhou, Guangdong, China
Ruichu Cai
Research Institute of China Telecom Co., Guangzhou, China
Kang Chen
Wuhan University, Wuhan, China
Liang Hong
Advanced Digital Sciences Center, Singapore, Singapore
Xiaoyan Yang
East China Normal University, Shanghai, China
Rong Zhang
Peking University, Beijing, China
Lei Zou

Cite this paper

Hong, Y., Mai, G., Zeng, H., Guo, C. (2015). A Method of Text Classification Combining Naive Bayes and the Similarity Computing Algorithms. In: Cai, R., Chen, K., Hong, L., Yang, X., Zhang, R., Zou, L. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9461. Springer, Cham. https://doi.org/10.1007/978-3-319-28121-6_1

DOI: https://doi.org/10.1007/978-3-319-28121-6_1
Published: 30 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28120-9
Online ISBN: 978-3-319-28121-6
eBook Packages: Computer Science, Computer Science (R0)
