A Relevant Content Filtering Based Framework for Data Stream Summarization

  • Conference paper
Social Informatics (SocInfo 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10047)

Abstract

Social media platforms are a rich source of information; however, only a small fraction of the available information is of interest to any given user. To help users catch up with the latest topics of interest amid the large amount of information available in social media, we present a relevant content filtering based framework for data stream summarization. More specifically, given a topic or event of interest, the framework dynamically discovers relevant information and separates it from irrelevant information in the stream of text provided by social media platforms. It then captures the most representative and up-to-date information to generate a sequential summary, or event storyline, that follows the evolution of the topic or event. The framework does not depend on any labeled data; instead, it uses weak supervision provided by the user, which matches the real scenario of users searching for information about an ongoing event. Experiments on two real events traced on Twitter verified the effectiveness of the proposed framework. Its robustness when using the most easy-to-obtain weak supervision, i.e., a trending topic or hashtag, indicates that the framework can be easily integrated into social media platforms such as Twitter to generate sequential summaries for events of interest. We also make the manually generated gold-standard sequential summaries of the two test events publicly available (https://drive.google.com/open?id=15jRw13i0xARUW3HqBn3BdR45IXk7P2Qj-HO__OFmMW0) for future use in the community.
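
The filter-then-summarize pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' method: the function names (`tokenize`, `filter_relevant`, `summarize`), the two-pass keyword expansion, the frequency-based scoring, and the parameters `top_k`, `min_overlap`, and `window` are all assumptions made for illustration. The sketch only shows the general idea of using a hashtag as weak supervision to select relevant posts and then picking one representative post per time window.

```python
from collections import Counter, defaultdict

PUNCT = ".,!?:;\"'()"


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def filter_relevant(stream, seed, top_k=2, min_overlap=2):
    """Keep posts that carry the seed hashtag or share enough frequent terms.

    Assumption: posts containing the seed hashtag act as weakly labeled
    positives; their most frequent terms form an expanded vocabulary.
    """
    seed = seed.lower()
    tokenized = [(ts, text, tokenize(text)) for ts, text in stream]
    counts = Counter(
        tok
        for _, _, toks in tokenized
        if seed in toks
        for tok in toks
        if tok != seed and len(tok) > 3  # skip the seed and very short words
    )
    expanded = {tok for tok, _ in counts.most_common(top_k)}
    return [
        (ts, text)
        for ts, text, toks in tokenized
        if seed in toks or len(expanded & set(toks)) >= min_overlap
    ]


def summarize(relevant, window=60):
    """Pick one representative post per time window (hypothetical scoring)."""
    buckets = defaultdict(list)
    for ts, text in relevant:
        buckets[ts // window].append((ts, text))

    summary = []
    for idx in sorted(buckets):
        posts = buckets[idx]
        # Term frequencies over all posts in this window.
        freq = Counter(tok for _, text in posts for tok in tokenize(text))

        def score(post):
            # Average window frequency of the post's own tokens.
            toks = tokenize(post[1])
            return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

        best = max(posts, key=score)
        summary.append((idx * window, best[1]))
    return summary


# Made-up example stream of (timestamp in seconds, text) pairs.
stream = [
    (0, "Protest begins in Cairo #jan25"),
    (1, "Thousands join protest in Cairo square #jan25"),
    (2, "I love pizza tonight"),
    (3, "Police clash with protest crowd in Cairo"),
    (65, "Curfew announced after protest #jan25"),
    (66, "New phone released today"),
]
relevant = filter_relevant(stream, "#jan25")
storyline = summarize(relevant, window=60)
```

In this toy stream, the post about the police clash has no hashtag but is still kept because it shares two frequent terms with the hashtagged posts, while the unrelated posts are dropped. A real system would update the vocabulary as the stream evolves rather than in two passes over a fixed list.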

Notes

  1. It is not explicitly labeled for the classification task; rather, it is obtained from the data itself.

  2. http://trec.nist.gov/data/tweets/.

  3. http://www.ark.cs.cmu.edu/TweetNLP/.

  4. http://en.wikipedia.org/wiki/Domodedovo_International_Airport_bombing.

  5. http://en.wikipedia.org/wiki/Timeline_of_the_Egyptian_Revolution_of_2011.

  6. http://www.aljazeera.com/news/middleeast/2011/01/201112515334871490.html.

  7. https://drive.google.com/open?id=15jRw13i0xARUW3HqBn3BdR45IXk7P2Qj-HO__OFmMW0.

Corresponding author

Correspondence to Cailing Dong.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Dong, C., Agarwal, A. (2016). A Relevant Content Filtering Based Framework for Data Stream Summarization. In: Spiro, E., Ahn, YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10047. Springer, Cham. https://doi.org/10.1007/978-3-319-47874-6_14

  • DOI: https://doi.org/10.1007/978-3-319-47874-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47873-9

  • Online ISBN: 978-3-319-47874-6

  • eBook Packages: Computer Science, Computer Science (R0)
