Abstract
Text classification is one of the central problems in big data analysis and research. At present, however, a universal algorithm model that meets the requirements of both accuracy and efficiency in text classification is still lacking. This paper proposes a text classification method that combines Naive Bayes with a similarity computing algorithm. First, the Paoding Analyzer segments the text into word vectors; then the Bayesian algorithm performs the first-level directory classification of the text; after that, an improved similarity computing algorithm carries out the second-level directory classification. Finally, the model is tested on real data, and the results are compared with those of the Bayesian algorithm alone and the similarity computing algorithm alone. The results show that the proposed method achieves a higher precision rate.
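The two-level pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions: whitespace splitting stands in for the Paoding Analyzer, plain cosine similarity against per-subcategory term counts stands in for the paper's improved similarity computing algorithm (whose details the abstract does not give), and the names `tokenize`, `NaiveBayes`, `train`, `classify`, and the toy data in `TRAINING` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace splitting stands in for the Paoding Analyzer.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # label -> term counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cosine(a, b):
    # Assumption: plain cosine similarity, not the paper's improved variant.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    """samples: iterable of (text, first_level_label, second_level_label)."""
    docs = [tokenize(text) for text, _, _ in samples]
    nb = NaiveBayes().fit(docs, [first for _, first, _ in samples])
    # Term-count profile for each second-level label under its first-level label.
    profiles = defaultdict(lambda: defaultdict(Counter))
    for tokens, (_, first, second) in zip(docs, samples):
        profiles[first][second].update(tokens)
    return nb, profiles


def classify(text, nb, profiles):
    tokens = tokenize(text)
    first = nb.predict(tokens)  # level 1: Naive Bayes
    vec = Counter(tokens)
    # Level 2: most similar subcategory profile within the predicted category.
    second = max(profiles[first], key=lambda s: cosine(vec, profiles[first][s]))
    return first, second


# Hypothetical toy data, for illustration only.
TRAINING = [
    ("football match goal team", "sports", "football"),
    ("football league striker goal", "sports", "football"),
    ("tennis serve racket court", "sports", "tennis"),
    ("tennis match set court", "sports", "tennis"),
    ("stock market shares price", "finance", "stocks"),
    ("stock index shares trading", "finance", "stocks"),
    ("bank loan interest rate", "finance", "banking"),
    ("bank deposit interest account", "finance", "banking"),
]

nb, profiles = train(TRAINING)
print(classify("the striker scored a goal", nb, profiles))
```

Restricting the similarity search to the subcategories of the predicted first-level class keeps the second stage small, which is one plausible reading of how the combination could help efficiency; the abstract itself reports only the precision comparison.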
Acknowledgements
This work was supported by the Science and Technology Planning Project of Guangdong Province, China (2015A030401101, 2012B040500034).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hong, Y., Mai, G., Zeng, H., Guo, C. (2015). A Method of Text Classification Combining Naive Bayes and the Similarity Computing Algorithms. In: Cai, R., Chen, K., Hong, L., Yang, X., Zhang, R., Zou, L. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9461. Springer, Cham. https://doi.org/10.1007/978-3-319-28121-6_1
DOI: https://doi.org/10.1007/978-3-319-28121-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28120-9
Online ISBN: 978-3-319-28121-6
eBook Packages: Computer Science, Computer Science (R0)