
Mining Event Temporal Boundaries from News Corpora through Evolution Phase Discovery

  • Conference paper
Web-Age Information Management (WAIM 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6897)


Abstract

A flood of news currently spreads across the web. Event Detection and Tracking techniques make it feasible to gather text and structure it into events that are constructed automatically online and updated over time. Users often want to browse an event's entire evolution, but given the huge number of documents, it is almost impossible for them to read every one. In this paper, we formally define the problem of event evolution phase discovery and introduce a novel, principled model (called EPD) that aims to outline the entire development of a news story over time. A news document is usually not atomic; it consists of independent news segments related to the same event. We therefore first employ a latent ingredients extraction method to extract event snippets. Unlike traditional clustering methods, we propose a novel metric that integrates content, temporal, distribution, and bursty features to measure the correlation between snippets along the timeline of a specific event. Building on the bursty feature, we also introduce a novel method for computing word weights. We employ hierarchical agglomerative clustering (HAC) to group the news snippets into distinct phases, and we formulate an optimization problem to decide the number of phases, which makes EPD applicable in practice. Using our novel evaluation method, empirical experiments on two real datasets show that EPD is effective and outperforms various related algorithms. Automatic event chronicle generation is presented as a typical application of EPD.
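The abstract names the ingredients of EPD but not their formulas, so the following Python sketch only illustrates how such a pipeline could fit together, under assumptions that are not taken from the paper. Snippets are assumed to be already extracted as `(tokens, day)` pairs. The bursty word weight is a hypothetical ratio of a word's count on a snippet's day to its mean count per active day. The correlation metric is a hypothetical weighted sum (`alpha`) of burst-weighted cosine similarity and an exponential temporal decay (`tau`); the distribution feature is omitted because the abstract does not define it. HAC uses average linkage, and the number of phases is chosen by minimizing a hypothetical penalized within-phase dissimilarity (`penalty`), a stand-in for the paper's optimization problem rather than a reproduction of it.

```python
import math
from collections import Counter, defaultdict

# Hypothetical representation: (tokens, publication day as an integer)
Snippet = tuple[list[str], int]


def burst_weights(snippets: list[Snippet]) -> dict[tuple[str, int], float]:
    """Hypothetical burst score: a word's count on a day / its mean count per active day."""
    daily: dict[int, Counter] = defaultdict(Counter)
    for tokens, day in snippets:
        daily[day].update(tokens)
    total: Counter = Counter()
    for counts in daily.values():
        total.update(counts)
    mean = {w: c / len(daily) for w, c in total.items()}
    return {(w, d): c / mean[w] for d, counts in daily.items() for w, c in counts.items()}


def vector(tokens: list[str], day: int, burst: dict) -> dict[str, float]:
    """Term frequency scaled by each word's burst score on the snippet's day."""
    return {w: tf * burst[(w, day)] for w, tf in Counter(tokens).items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(snippets: list[Snippet], alpha: float = 0.7, tau: float = 3.0):
    """Hypothetical metric: alpha * content cosine + (1 - alpha) * exp(-|day gap| / tau)."""
    burst = burst_weights(snippets)
    vecs = [vector(tokens, day, burst) for tokens, day in snippets]
    n = len(snippets)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            content = cosine(vecs[i], vecs[j])
            temporal = math.exp(-abs(snippets[i][1] - snippets[j][1]) / tau)
            sim[i][j] = sim[j][i] = alpha * content + (1 - alpha) * temporal
    return sim


def average_link_hac(sim: list[list[float]], k: int) -> list[list[int]]:
    """Merge the pair of clusters with the highest mean pairwise similarity until k remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best, pair = -math.inf, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if avg > best:
                    best, pair = avg, (a, b)
        a, b = pair  # b > a, so deleting index b leaves index a valid
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


def choose_phases(sim: list[list[float]], penalty: float = 1.5) -> list[list[int]]:
    """Stand-in objective: minimize within-phase dissimilarity + penalty * (number of phases)."""
    best_cost, best = math.inf, [list(range(len(sim)))]
    for k in range(1, len(sim) + 1):
        clusters = average_link_hac(sim, k)
        within = sum(1 - sim[i][j] for c in clusters for i in c for j in c if i < j)
        cost = within + penalty * k
        if cost < best_cost:
            best_cost, best = cost, clusters
    return best


# Toy event with two separated stages (invented illustration data, not from the paper)
snippets = [
    (["quake", "strikes", "city"], 1),
    (["quake", "death", "toll"], 1),
    (["quake", "strikes", "toll"], 2),
    (["rebuild", "funds", "aid"], 20),
    (["aid", "rebuild", "schools"], 21),
    (["rebuild", "aid", "funds"], 22),
]
phases = choose_phases(similarity_matrix(snippets))
print(phases)  # [[0, 1, 2], [3, 4, 5]]
```

Rebuilding the clustering for every candidate `k` keeps the sketch short but costs roughly O(n^4) for n snippets; a practical version would record the merges of a single HAC run and score the cut at each level. Ordering the resulting phases by their earliest snippet day would give the skeleton of an event chronicle.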

Supported by NSFC with Grant No. 61073081, National Key Technology R&D Pillar Program in the 11th Five-year Plan of China with Research No. 2009BAH47B05, and ZTE University Partnership Fund.




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kong, L., Yan, R., Jiang, H., Zhang, Y., Gao, Y., Fu, L. (2011). Mining Event Temporal Boundaries from News Corpora through Evolution Phase Discovery. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_47


  • DOI: https://doi.org/10.1007/978-3-642-23535-1_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23534-4

  • Online ISBN: 978-3-642-23535-1

  • eBook Packages: Computer Science, Computer Science (R0)
