Social-Correlation Based Mutual Reinforcement for Short Text Classification and User Interest Tagging

  • Rong Li
  • Ya Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8346)

Abstract

Short texts such as micro-blog messages are becoming increasingly prevalent in China. Because the features associated with short text are sparse, accurately classifying short text and tagging user interest are important and challenging tasks. Many recent studies use external data to address data sparsity but fail to leverage social correlation, which is expected to help improve the accuracy of short text classification. In this paper, we present a new method that uses a semi-supervised coupled mutual reinforcement framework based on social correlation to simultaneously classify short text and tag user interest. Our method requires relatively few labeled examples to initialize the training process. Experimental results demonstrate that it achieves 100% accuracy on certain categories and significantly improves classification accuracy on the others. The experiments also show that our model is effective in user interest tagging.
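The abstract does not give the update rules of the framework, so the following is only a minimal sketch of the general idea of semi-supervised mutual reinforcement between message labels and user interests, not the authors' algorithm. All names (`mutual_reinforcement`, `normalize`, `alpha`), the word-overlap text score, the linear mixing rule, and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def normalize(scores):
    """Scale non-negative scores to sum to 1 (uniform if all are zero)."""
    total = sum(scores.values())
    if total == 0:
        return {k: 1.0 / len(scores) for k in scores}
    return {k: v / total for k, v in scores.items()}


def mutual_reinforcement(messages, seeds, classes, alpha=0.5, iterations=10):
    """Hypothetical sketch: messages is a list of (user, text) pairs and
    seeds maps a message index to its known class. alpha (an assumed
    parameter) weights text evidence against the author's interests."""
    uniform = {c: 1.0 / len(classes) for c in classes}
    # Seed messages start one-hot; unlabeled messages start uniform.
    belief = [dict(uniform) for _ in messages]
    for i, label in seeds.items():
        belief[i] = {c: 1.0 if c == label else 0.0 for c in classes}

    user_interest = {}
    for _ in range(iterations):
        # Step 1: accumulate per-word class weights from current beliefs.
        word_class = defaultdict(Counter)
        for (_, text), b in zip(messages, belief):
            for word in set(text.split()):
                for c in classes:
                    word_class[word][c] += b[c]

        # Step 2: a user's interest is the normalized sum of the beliefs
        # of that user's messages.
        totals = defaultdict(lambda: {c: 0.0 for c in classes})
        for (user, _), b in zip(messages, belief):
            for c in classes:
                totals[user][c] += b[c]
        user_interest = {u: normalize(s) for u, s in totals.items()}

        # Step 3: re-score unlabeled messages from text and author interest.
        updated = []
        for i, (user, text) in enumerate(messages):
            if i in seeds:
                updated.append(belief[i])  # labeled seeds stay fixed
                continue
            text_score = {c: 0.0 for c in classes}
            for word in set(text.split()):
                per_word = normalize({c: word_class[word][c] for c in classes})
                for c in classes:
                    text_score[c] += per_word[c]
            text_score = normalize(text_score)
            updated.append({
                c: alpha * text_score[c] + (1 - alpha) * user_interest[user][c]
                for c in classes
            })
        belief = updated
    return belief, user_interest


# Toy data (invented for illustration): two users, one seed label each.
messages = [
    ("alice", "football match tonight"),
    ("alice", "great goal"),
    ("bob", "new phone release"),
    ("bob", "battery lasts long"),
]
seeds = {0: "sports", 2: "tech"}
belief, user_interest = mutual_reinforcement(messages, seeds, ["sports", "tech"])
predicted = [max(b, key=b.get) for b in belief]
```

In this toy run the unlabeled messages share no words with the seeds, so any label they receive comes from their author's other messages, which is the kind of social signal the abstract argues external-data methods leave unused.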

Keywords

mutual reinforcement; user interest tagging; short text classification

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Rong Li (1)
  • Ya Zhang (1)
  1. Shanghai Key Laboratory of Multimedia Processing and Transmissions, Shanghai Jiao Tong University, Shanghai, China
