Leveraging Term Co-occurrence Distance and Strong Classification Features for Short Text Feature Selection

Ma, Huifang; Xing, Yuying; Wang, Shuang; Li, Miao

doi:10.1007/978-3-319-63558-3_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10412)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

Abstract

In this paper, we present a short text feature selection method based on term co-occurrence distance and strong classification features. First, the co-occurrence distance between terms in each document is used to determine a co-occurrence distance correlation, from which a correlation weight for each term is defined. Second, an improved expected cross entropy is defined to weight each term within a particular class according to how strongly it indicates that class. The terms of each class are then sorted in descending order by weight, and the top-k terms are selected as feature terms. Experiments show that our method can improve the effectiveness of short text feature selection.
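The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything specific here is an assumption: the correlation weight is taken as a sum of inverse positional distances between distinct terms in the same document, the class weight uses the standard per-class expected cross entropy term P(t) · P(c|t) · log(P(c|t) / P(c)) rather than the paper's improved variant, and the two weights are combined by a simple product. The function names are hypothetical.

```python
import math
from collections import defaultdict


def cooccurrence_weights(docs):
    """Correlation weight per term (assumed form): sum of 1/distance over
    all pairs of distinct terms that co-occur in the same document."""
    weights = defaultdict(float)
    for tokens in docs:
        for i, a in enumerate(tokens):
            for j in range(i + 1, len(tokens)):
                b = tokens[j]
                if a == b:
                    continue
                w = 1.0 / (j - i)  # closer terms contribute more
                weights[a] += w
                weights[b] += w
    return weights


def class_ece(docs, labels):
    """Standard per-class expected cross entropy term (not the paper's
    improved variant): P(t) * P(c|t) * log(P(c|t) / P(c))."""
    n = len(docs)
    class_count = defaultdict(int)
    term_count = defaultdict(int)
    term_class_count = defaultdict(int)
    for tokens, c in zip(docs, labels):
        class_count[c] += 1
        for t in set(tokens):  # document frequency, not raw frequency
            term_count[t] += 1
            term_class_count[(t, c)] += 1
    scores = {}
    for (t, c), n_tc in term_class_count.items():
        p_t = term_count[t] / n
        p_c_given_t = n_tc / term_count[t]
        p_c = class_count[c] / n
        scores[(t, c)] = p_t * p_c_given_t * math.log(p_c_given_t / p_c)
    return scores


def select_top_k(docs, labels, k):
    """Rank the terms of each class by (assumed) product of the two
    weights, in descending order, and keep the top-k terms."""
    corr = cooccurrence_weights(docs)
    ece = class_ece(docs, labels)
    per_class = defaultdict(list)
    for (t, c), score in ece.items():
        per_class[c].append((corr[t] * score, t))
    return {
        c: [t for _, t in sorted(items, key=lambda x: (-x[0], x[1]))[:k]]
        for c, items in per_class.items()
    }


# Toy corpus of pre-tokenized short texts (illustrative data only).
docs = [
    ["cheap", "flight", "deal"],
    ["flight", "delay", "news"],
    ["stock", "market", "news"],
    ["market", "deal", "stock"],
]
labels = ["travel", "travel", "finance", "finance"]
features = select_top_k(docs, labels, k=2)
```

In this toy corpus, terms that appear in both classes ("deal", "news") receive a class weight of zero, so they fall behind class-specific terms such as "flight" and "market" regardless of their co-occurrence weight.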

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 61363058), the Gansu Province College Students' Innovation and Entrepreneurship Training Program (201610736041), the Open Fund of the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (IIP2014-4), and the Natural Science Foundation of Gansu Province for Distinguished Young Scholars (1308RJDA007).

Author information

Authors and Affiliations

College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
Huifang Ma, Yuying Xing, Shuang Wang & Miao Li

Corresponding author

Correspondence to Huifang Ma.

Editor information

Editors and Affiliations

Deakin University, Burwood, Victoria, Australia
Gang Li
University of Arizona, Tucson, Arizona, USA
Yong Ge
Southwest University, Chongqing, China
Zili Zhang
Peking University, Beijing, China
Zhi Jin
University of Technology Sydney, Sydney, New South Wales, Australia
Michael Blumenstein

About this paper

Cite this paper

Ma, H., Xing, Y., Wang, S., Li, M. (2017). Leveraging Term Co-occurrence Distance and Strong Classification Features for Short Text Feature Selection. In: Li, G., Ge, Y., Zhang, Z., Jin, Z., Blumenstein, M. (eds) Knowledge Science, Engineering and Management. KSEM 2017. Lecture Notes in Computer Science, vol 10412. Springer, Cham. https://doi.org/10.1007/978-3-319-63558-3_6

DOI: https://doi.org/10.1007/978-3-319-63558-3_6
Published: 19 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63557-6
Online ISBN: 978-3-319-63558-3
eBook Packages: Computer Science, Computer Science (R0)
