Abstract
This paper presents a feature selection method for short text based on term co-occurrence distance and strong classification features. First, the co-occurrence distance between terms in each document is used to determine the co-occurrence distance correlation, from which a correlation weight is defined for each term. Second, an improved expected cross entropy is defined to obtain the weight of a term that strongly indicates a particular class. The terms of each class are then sorted in descending order of weight, and the top-k terms are selected as feature terms. Experiments show that the method improves the effectiveness of short text feature selection.
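The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the correlation weight is taken as a sum of inverse positional distances between different terms in the same document, the class weight is the per-class term of the standard (not the improved) expected cross entropy, and the two are combined by multiplication. The function names `cooccurrence_weights`, `class_scores`, and `select_features` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_weights(docs):
    """Assumed correlation weight: for each term, sum 1/distance to every
    occurrence of a different term in the same document."""
    weights = defaultdict(float)
    for tokens in docs:
        for i, a in enumerate(tokens):
            for j, b in enumerate(tokens):
                if i != j and a != b:
                    weights[a] += 1.0 / abs(i - j)
    return weights


def class_scores(docs, labels):
    """Per-class term of the standard expected cross entropy,
    P(t) * P(c|t) * log(P(c|t) / P(c)), estimated from document frequencies."""
    n = len(docs)
    class_count = Counter(labels)
    df = Counter()                # number of documents containing each term
    df_c = defaultdict(Counter)   # same count, split by class
    for tokens, c in zip(docs, labels):
        for t in set(tokens):
            df[t] += 1
            df_c[c][t] += 1
    scores = defaultdict(dict)
    for c, counter in df_c.items():
        p_c = class_count[c] / n
        for t, k in counter.items():
            p_t = df[t] / n
            p_c_given_t = k / df[t]
            scores[c][t] = p_t * p_c_given_t * math.log(p_c_given_t / p_c)
    return scores


def select_features(docs, labels, k):
    """Rank the terms of each class by the combined weight and keep the top k.
    Multiplying the two weights is an assumption, not the paper's rule."""
    corr = cooccurrence_weights(docs)
    scores = class_scores(docs, labels)
    selected = {}
    for c, term_scores in scores.items():
        combined = {t: s * corr.get(t, 0.0) for t, s in term_scores.items()}
        # Descending by weight; ties are broken alphabetically.
        ranked = sorted(combined, key=lambda t: (-combined[t], t))
        selected[c] = ranked[:k]
    return selected
```

Calling `select_features(docs, labels, k)` with a list of token lists and a parallel list of class labels returns, for each class, its k highest-weighted terms. Short texts keep the quadratic loop over token positions cheap; longer documents would call for a distance window.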
Acknowledgement
This work is supported by the National Natural Science Foundation of China (No. 61363058), the Gansu Province college students’ innovation and entrepreneurship training program (201610736041), the open fund of the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (IIP2014-4), and the Natural Science Foundation of Gansu Province for Distinguished Young Scholars (1308RJDA007).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ma, H., Xing, Y., Wang, S., Li, M. (2017). Leveraging Term Co-occurrence Distance and Strong Classification Features for Short Text Feature Selection. In: Li, G., Ge, Y., Zhang, Z., Jin, Z., Blumenstein, M. (eds) Knowledge Science, Engineering and Management. KSEM 2017. Lecture Notes in Computer Science, vol 10412. Springer, Cham. https://doi.org/10.1007/978-3-319-63558-3_6
DOI: https://doi.org/10.1007/978-3-319-63558-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63557-6
Online ISBN: 978-3-319-63558-3
eBook Packages: Computer Science, Computer Science (R0)