Text Segmentation and Event Detection

Chapter in: Machine Learning for Text

Abstract

“To improve is to change; to be perfect is to change often.”—Winston Churchill

Notes

  1. DARPA stands for Defense Advanced Research Projects Agency, which is an agency of the United States Department of Defense. It is responsible for the development of emerging technologies for use by the military, and often funds academic research efforts.

Bibliography

  1. C. Aggarwal and K. Subbian. Event detection in social streams. SDM Conference, 2012.

  2. C. Aggarwal and P. Yu. On clustering massive text and categorical data streams. Knowledge and Information Systems, 24(2), pp. 171–196, 2010.

  3. J. Allan, J. Carbonell, G. Doddington, J. Yamron, and Y. Yang. Topic detection and tracking pilot study final report. CMU Technical Report, Paper 341, 1998.

  4. H. Becker, M. Naaman, and L. Gravano. Beyond Trending Topics: Real-World Event Identification on Twitter. ICWSM Conference, pp. 438–441, 2011.

  5. D. Beeferman, A. Berger, and J. Lafferty. Statistical models for text segmentation. Machine Learning, 34(1–3), pp. 177–210, 1999.

  6. D. Blei and P. Moreno. Topic segmentation with an aspect hidden Markov model. ACM SIGIR Conference, pp. 343–348, 2001.

  7. N. Chambers, S. Wang, and D. Jurafsky. Classifying temporal relations between events. Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pp. 173–176, 2007.

  8. F. Choi. Advances in domain independent linear text segmentation. North American Chapter of the Association for Computational Linguistics Conference, pp. 26–33, 2000.

  9. F. Choi, P. Wiemer-Hastings, and J. Moore. Latent semantic analysis for text segmentation. EMNLP, 2001.

  10. J. Eisenstein and R. Barzilay. Bayesian unsupervised topic segmentation. Conference on Empirical Methods in Natural Language Processing, pp. 334–343, 2008.

  11. E. Erosheva, S. Fienberg, and J. Lafferty. Mixed-membership models of scientific publications. Proceedings of the National Academy of Sciences, 101, pp. 5220–5227, 2004.

  12. M. Hearst. TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1), pp. 33–64, 1997.

  13. R. Kannan, H. Woo, C. Aggarwal, and H. Park. Outlier detection for text data. SDM Conference, 2017.

  14. J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML Conference, pp. 282–289, 2001.

  15. X. Ling and D. Weld. Temporal information extraction. AAAI, pp. 1385–1390, 2010.

  16. D. Litman and R. Passonneau. Combining multiple knowledge sources for discourse segmentation. Association for Computational Linguistics, pp. 108–115, 1995.

  17. I. Mani and G. Wilson. Robust temporal processing of news. ACL Conference, pp. 69–76, 2000.

  18. A. McCallum, D. Freitag, and F. Pereira. Maximum entropy Markov models for information extraction and segmentation. ICML Conference, pp. 591–598, 2000.

  19. D. McClosky, M. Surdeanu, and C. Manning. Event extraction as dependency parsing. Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pp. 1626–1635, 2011.

  20. J. Ponte and W. Croft. Text segmentation by topic. International Conference on Theory and Practice of Digital Libraries, pp. 113–125, 1997.

  21. J. Pustejovsky et al. The TimeBank corpus. Corpus Linguistics, p. 40, 2003.

  22. J. Pustejovsky et al. TimeML: Robust specification of event and temporal expressions in text. New Directions in Question Answering, 3, pp. 28–34, 2003.

  23. A. Ritter, Mausam, O. Etzioni, and S. Clark. Open domain event extraction from Twitter. ACM KDD Conference, pp. 1104–1112, 2012.

  24. A. Ritter, S. Clark, Mausam, and O. Etzioni. Named entity recognition in tweets: an experimental study. Conference on Empirical Methods in Natural Language Processing, pp. 1524–1534, 2011.

  25. T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. World Wide Web Conference, pp. 851–860, 2010.

  26. G. Salton and J. Allan. Selective text utilization and text traversal. Proceedings of ACM Hypertext, 1993.

  27. G. Salton, J. Allan, and C. Buckley. Approaches to passage retrieval in full text information systems. ACM SIGIR Conference, pp. 49–58, 1993.

  28. G. Salton, A. Singhal, M. Mitra, and C. Buckley. Automatic text structuring and summarization. Information Processing and Management, 33(2), pp. 193–207, 1997.

  29. R. Sauri, R. Knippen, M. Verhagen, and J. Pustejovsky. Evita: a robust event recognizer for QA systems. Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 700–707, 2005.

  30. H. Sayyadi, M. Hurst, and A. Maykov. Event detection and tracking in social streams. ICWSM Conference, 2009.

  31. J. Yamron, I. Carp, L. Gillick, S. Lowe, and P. van Mulbregt. A hidden Markov model approach to text segmentation and event tracking. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 333–336, 1998.

  32. Y. Yang, T. Pierce, and J. Carbonell. A study of retrospective and online event detection. ACM SIGIR Conference, pp. 28–36, 1998.

  33. J. Zhang, Z. Ghahramani, and Y. Yang. A probabilistic model for online document clustering with application to novelty detection. NIPS Conference, pp. 1617–1624, 2004.

  34. http://opennlp.apache.org/index.html

  35. http://nlp.stanford.edu/software/

  36. http://www.nltk.org/

  37. http://mallet.cs.umass.edu/

  38. http://www.nltk.org/api/nltk.tokenize.html#nltk.tokenize.texttiling.TextTilingTokenizer

  39. http://www.itl.nist.gov/iad/mig/tests/tdt/

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Aggarwal, C.C. (2018). Text Segmentation and Event Detection. In: Machine Learning for Text. Springer, Cham. https://doi.org/10.1007/978-3-319-73531-3_14

  • DOI: https://doi.org/10.1007/978-3-319-73531-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73530-6

  • Online ISBN: 978-3-319-73531-3

  • eBook Packages: Computer Science (R0)
