Multimedia Tools and Applications, Volume 77, Issue 3, pp 3171–3187

Task-oriented keyphrase extraction from social media

  • Min Yang
  • Yuzhi Liang
  • Wei Zhao
  • Wei Xu
  • Jia Zhu
  • Qiang Qu

Abstract

Keyphrase extraction from social media is a crucial and challenging task. Previous studies usually focus on extracting keyphrases that summarize a corpus, but they do not take users’ specific needs into consideration. In this paper, we propose a novel three-stage model to learn a keyphrase set that represents or is related to a particular topic. First, a phrase mining algorithm is applied to segment the documents into human-interpretable phrases. Second, we propose a weakly supervised model to extract candidate keyphrases, which uses a few pre-specified seed keyphrases to guide the model. This guidance makes the extracted keyphrases more specific and more closely related to the seed keyphrases (which reflect the user’s needs). Finally, to further identify implicitly related phrases, the PMI-IR algorithm is employed to obtain synonyms of the extracted candidate keyphrases. We conducted experiments on two publicly available datasets from news and Twitter. The experimental results demonstrate that our approach outperforms the state-of-the-art baselines and has the potential to extract high-quality task-oriented keyphrases.
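
The final, synonym-finding stage can be illustrated with a small sketch. PMI-IR estimates how strongly two terms are associated from how often they occur together relative to how often they occur separately, with the counts originally taken from search-engine hits. The Python below is not the authors' implementation: it assumes the documents have already been segmented into phrases, it substitutes document co-occurrence counts in a local corpus for web hit counts, and the names `build_pmi`, `expand_keyphrases` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_pmi(docs):
    """Return a function pmi(a, b) estimated from document co-occurrence.

    docs is a list of phrase collections, one per document. Local
    document counts stand in for search-engine hit counts (an assumption
    of this sketch).
    """
    n_docs = len(docs)
    single = Counter()  # number of documents containing each phrase
    pair = Counter()    # number of documents containing both phrases
    for phrases in docs:
        unique = set(phrases)
        single.update(unique)
        pair.update(frozenset(p) for p in combinations(sorted(unique), 2))

    def pmi(a, b):
        joint = pair[frozenset((a, b))]
        if joint == 0:
            return float("-inf")  # never observed together
        # log2( p(a, b) / (p(a) * p(b)) ) with probabilities as doc frequencies
        return math.log2(joint * n_docs / (single[a] * single[b]))

    return pmi


def expand_keyphrases(candidates, docs, threshold=0.5):
    """Collect phrases whose PMI with any candidate keyphrase exceeds threshold."""
    pmi = build_pmi(docs)
    vocabulary = set().union(*docs)
    related = set()
    for candidate in candidates:
        for phrase in vocabulary:
            if phrase != candidate and pmi(candidate, phrase) > threshold:
                related.add(phrase)
    return related


if __name__ == "__main__":
    toy_docs = [
        ["flu", "fever"],
        ["flu", "fever"],
        ["stock", "market"],
        ["stock", "market"],
    ]
    print(expand_keyphrases(["flu"], toy_docs))
```

In the toy corpus, "fever" co-occurs with "flu" in every document that mentions either phrase, so it passes the threshold, while "stock" and "market" never co-occur with "flu" and are excluded. A real threshold would need tuning on held-out data.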

Keywords

Keyphrase extraction, Weakly supervised learning, Topic model

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. School of Computing Science, South China Normal University, Guangzhou, China
  2. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  3. Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong
  4. Tencent, Shenzhen, China
