Predicting Hashtag Popularity of Social Emergency by a Robust Feature Extraction Method

  • Conference paper
Knowledge and Systems Sciences (KSS 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 780)

Abstract

Information about a social emergency usually spreads on social media through a hot topic that is summarized succinctly by a hashtag. In China, hashtag prediction for social emergencies is increasingly useful for e-governance, and predicting the popularity of such hashtags has become an important task. However, previous research has mainly focused on hashtag prediction for commercial purposes, such as marketing and promotion. The core issue in hashtag popularity prediction is identifying the key features that improve prediction accuracy. To the best of our knowledge, little research has focused on feature extraction for social emergency hashtags. To avoid excessive crawling, we extract features for hashtag popularity prediction from “seed information”, defined as the microblogs posted under a hashtag during the 24-h period after the hashtag was published. From the “seed information”, we derive user-based and content-based features that facilitate the spread of social emergency information. Furthermore, recursive feature elimination (RFE) analysis is integrated with nine machine learning classification models to determine the optimal features among all possible feature combinations. The effectiveness and robustness of the proposed features are verified.
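The definition of “seed information” above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_seed_information`, the dictionary layout of a microblog, and the half-open treatment of the 24-h boundary are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Window length taken from the abstract (24 h after the hashtag is published).
SEED_WINDOW = timedelta(hours=24)


def select_seed_information(microblogs, hashtag_published_at, window=SEED_WINDOW):
    """Return the microblogs posted within `window` of the hashtag's publication.

    Hypothetical helper: each microblog is assumed to be a dict with a
    `posted_at` datetime. The interval is half-open, [published, published + window),
    which is an assumption; the abstract does not specify boundary handling.
    """
    cutoff = hashtag_published_at + window
    return [
        m for m in microblogs
        if hashtag_published_at <= m["posted_at"] < cutoff
    ]


# Made-up example data, for illustration only.
published = datetime(2017, 1, 1, 8, 0)
microblogs = [
    {"id": 1, "posted_at": datetime(2017, 1, 1, 9, 30)},
    {"id": 2, "posted_at": datetime(2017, 1, 2, 7, 59)},
    {"id": 3, "posted_at": datetime(2017, 1, 2, 8, 0)},   # exactly 24 h later
    {"id": 4, "posted_at": datetime(2017, 1, 3, 12, 0)},
]

seed = select_seed_information(microblogs, published)
seed_ids = [m["id"] for m in seed]
print(seed_ids)
```

Restricting feature extraction to this window is what lets the approach avoid crawling a hashtag's full history; user-based and content-based features would then be computed only over the returned microblogs.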

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant Nos. 71403262, 71774154, 71573247, 71503246).

Author information

Corresponding author

Correspondence to Ying Li.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, Q., Li, Y. (2017). Predicting Hashtag Popularity of Social Emergency by a Robust Feature Extraction Method. In: Chen, J., Theeramunkong, T., Supnithi, T., Tang, X. (eds) Knowledge and Systems Sciences. KSS 2017. Communications in Computer and Information Science, vol 780. Springer, Singapore. https://doi.org/10.1007/978-981-10-6989-5_12

  • DOI: https://doi.org/10.1007/978-981-10-6989-5_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6988-8

  • Online ISBN: 978-981-10-6989-5

  • eBook Packages: Computer Science, Computer Science (R0)
