Improving Accuracy of Short Text Categorization Using Contextual Information

  • V. Vasantha Kumar
  • S. Sendhilkumar
  • G. S. Mahalakshmi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 713)

Abstract

Categorization plays a major role in information retrieval. The abstracts of research documents contain too few terms for existing categorization algorithms to produce accurate results, and this limitation leads to unsatisfactory categorization. This paper proposes a three-stage categorization scheme to improve the accuracy of categorizing the abstracts of research documents. In most cases, an abstract extends the context of the information that surrounds it. Initially, the system extracts the context from the environment in which the abstract is present; context gathering is performed as a continuous process. In the next stage, the short text is subjected to general NLP techniques. The system divides the terms in the abstract into hierarchical levels of context, and only the terms that contribute to the higher levels of context are carried forward to the later stages of categorization. Finally, the system applies a weighted-terms method to categorize the abstract. When the limited number of terms leaves the result uncertain, the context obtained in the initial stage is used to resolve the uncertainty. Relating the context to the content of the short text in this way provides better accuracy and leads to more effective content filtering in information retrieval. In experiments on the categorization of short texts, the proposed method achieved better accuracy than traditional feature-based categorization.
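
The flow described in the abstract (gather context, preprocess the short text, score weighted terms, and fall back on context when the scores are too close) might be sketched as below. This is only an illustrative reading of the abstract, not the authors' implementation: the category names, term weights, stopword list, `margin` threshold, and the function names `tokenize`, `score`, and `categorize` are all hypothetical, and the hierarchical term-level stage is reduced here to simple stopword filtering as a stand-in.

```python
import re
from collections import Counter

# Hypothetical category vocabularies and weights (assumed, not from the paper).
CATEGORY_TERMS = {
    "information_retrieval": {"retrieval": 2.0, "query": 1.5, "search": 1.0, "ranking": 1.0},
    "machine_learning": {"learning": 2.0, "classifier": 1.5, "training": 1.5, "ranking": 1.0},
}

# Small stopword list standing in for the "general NLP techniques" stage.
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "and", "is", "are", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def score(tokens):
    """Sum the weights of the matching terms for each category."""
    counts = Counter(tokens)
    return {
        cat: sum(weight * counts[term] for term, weight in terms.items())
        for cat, terms in CATEGORY_TERMS.items()
    }


def categorize(abstract, context, margin=0.5):
    """Weighted-term categorization; context is consulted only when uncertain."""
    scores = score(tokenize(abstract))
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if best[1] - runner_up[1] >= margin:
        return best[0]
    # Too close to call: add the scores of the surrounding context.
    context_scores = score(tokenize(context))
    combined = {cat: scores[cat] + context_scores[cat] for cat in scores}
    return max(combined, key=combined.get)


if __name__ == "__main__":
    venue = "Proceedings of a workshop on classifier training"
    print(categorize("A ranking method", venue))
    print(categorize("Query search and retrieval", venue))
```

In this sketch the first call is ambiguous on its own terms ("ranking" is weighted equally in both hypothetical categories), so the venue text decides the result; the second call is clear from the abstract alone and the context is never consulted.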

Keywords

Context, Short text, Categorization

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • V. Vasantha Kumar (1), Email author
  • S. Sendhilkumar (2)
  • G. S. Mahalakshmi (3)
  1. Department of Computer Science and Engineering, KCG College of Technology, Chennai, India
  2. Department of Information Science & Technology, Anna University, Chennai, India
  3. Department of Computer Science and Engineering, Anna University, Chennai, India
