
Leveraging Term Co-occurrence Distance and Strong Classification Features for Short Text Feature Selection

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10412)

Abstract

In this paper, a short text feature selection method based on term co-occurrence distance and strong classification features is presented. On the one hand, the co-occurrence distance between terms within each document is used to determine their co-occurrence distance correlation, from which a correlation weight is defined for each term. On the other hand, an improved expected cross entropy is defined to obtain the weight of a term within a particular class, reflecting how strongly the term indicates that class. The terms of each class are sorted in descending order of weight, and the top-k terms are selected as feature terms. Experiments show that our method can improve the effectiveness of short text feature selection.
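The abstract does not give the paper's formulas, so the Python sketch below only illustrates the general shape of the pipeline it describes, under loudly stated assumptions. The co-occurrence weight here is a hypothetical choice (each pair of distinct terms in a document adds 1/distance to both terms); the class score is the per-class summand of the standard expected cross entropy, P(t)·P(c|t)·log(P(c|t)/P(c)), computed from document frequencies, not the paper's improved variant; and the two weights are combined by simple multiplication, which is also an assumption. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_weights(docs):
    """Hypothetical co-occurrence distance weight (not the paper's formula):
    for every pair of distinct terms in the same document, add 1/distance to
    both terms, where distance is the gap between token positions."""
    weights = Counter()
    for tokens in docs:
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens)):
                if tokens[i] != tokens[j]:
                    w = 1.0 / (j - i)  # closer co-occurrences count more
                    weights[tokens[i]] += w
                    weights[tokens[j]] += w
    return weights


def class_ece(docs, labels):
    """Per-class term score using the standard expected cross entropy summand
    P(t) * P(c|t) * log(P(c|t) / P(c)), with document-frequency estimates.
    Only positive scores (terms over-represented in class c) are kept, as a
    stand-in for 'strong class indication'."""
    n = len(docs)
    class_count = Counter(labels)
    df = Counter()                # documents containing term t
    df_c = defaultdict(Counter)   # documents of class c containing term t
    for tokens, c in zip(docs, labels):
        for t in set(tokens):
            df[t] += 1
            df_c[c][t] += 1
    scores = defaultdict(dict)
    for c, counts in df_c.items():
        p_c = class_count[c] / n
        for t, k in counts.items():
            p_t = df[t] / n
            p_c_given_t = k / df[t]
            s = p_t * p_c_given_t * math.log(p_c_given_t / p_c)
            if s > 0:
                scores[c][t] = s
    return scores


def select_features(docs, labels, k):
    """Rank each class's terms by a combined weight and keep the top k."""
    co = cooccurrence_weights(docs)
    total = sum(co.values()) or 1.0
    selected = {}
    for c, term_scores in class_ece(docs, labels).items():
        # Assumed combination: class score times normalized co-occurrence weight.
        combined = {t: s * co[t] / total for t, s in term_scores.items()}
        # Descending by weight; ties broken alphabetically for determinism.
        ranked = sorted(combined, key=lambda t: (-combined[t], t))
        selected[c] = ranked[:k]
    return selected


if __name__ == "__main__":
    docs = [["cheap", "flight", "deal"], ["flight", "delay", "airport"],
            ["goal", "match", "score"], ["match", "team", "goal"]]
    labels = ["travel", "travel", "sport", "sport"]
    print(select_features(docs, labels, 2))
    # prints {'travel': ['flight', 'delay'], 'sport': ['match', 'goal']}
```

On this toy corpus, "flight" and "match" rank first because they appear in both documents of their class and also sit close to many other terms. Swapping in the paper's actual correlation and improved cross entropy definitions would change only the two scoring functions; the per-class sort and top-k cut stay the same.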



Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 61363058), the Gansu Province College Students' Innovation and Entrepreneurship Training Program (201610736041), the open fund of the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (IIP2014-4), and the Natural Science Foundation of Gansu Province for Distinguished Young Scholars (1308RJDA007).

Author information

Correspondence to Huifang Ma.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ma, H., Xing, Y., Wang, S., Li, M. (2017). Leveraging Term Co-occurrence Distance and Strong Classification Features for Short Text Feature Selection. In: Li, G., Ge, Y., Zhang, Z., Jin, Z., Blumenstein, M. (eds) Knowledge Science, Engineering and Management. KSEM 2017. Lecture Notes in Computer Science, vol 10412. Springer, Cham. https://doi.org/10.1007/978-3-319-63558-3_6


  • DOI: https://doi.org/10.1007/978-3-319-63558-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63557-6

  • Online ISBN: 978-3-319-63558-3

  • eBook Packages: Computer Science, Computer Science (R0)
