Statistical Models of Topical Content

Chapter in: Topic Detection and Tracking

Part of the book series: The Information Retrieval Series (INRE, volume 12)

Abstract

In this chapter we explore the behavior of two statistical models of story generation: one based on simple unigrams and another based on the beta-binomial distribution. We describe how these models can be incorporated into information extraction applications, particularly the Tracking and Detection engines built for the Topic Detection and Tracking (TDT) evaluations sponsored by DARPA. Tracking systems based on the two models have complementary strengths and weaknesses. A beta-binomial system yields high precision at a high decision threshold, but its performance degrades quickly as the threshold drops. A unigram system is not as strong at a high decision threshold, but is very good at suppressing false alarms at lower thresholds. We describe the features of these systems that give rise to this behavior and discuss ways that each system might be improved by borrowing from the other. We also discuss our Detection system and how improvements in Tracking should lead to improvements in Detection.
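The abstract does not give the chapter's formulas, so the following is only a minimal sketch of the two model families it names, not the authors' implementation. It assumes a unigram tracker that scores a story by its per-word log-likelihood ratio between a topic model and a background model, with additive smoothing chosen purely for illustration, and it uses the textbook beta-binomial distribution for the count of one word in an n-word story. All function names and parameter values are hypothetical.

```python
import math
from collections import Counter


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def beta_binomial_logpmf(k, n, alpha, beta):
    """Log-probability that a word occurs k times in an n-word story
    when its per-story rate is drawn from Beta(alpha, beta)."""
    return log_choose(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def binomial_logpmf(k, n, p):
    """Log-probability of k occurrences in n words at a fixed rate p."""
    return log_choose(n, k) + k * math.log(p) + (n - k) * math.log(1 - p)


def unigram_loglik(tokens, counts, vocab_size, smoothing=1.0):
    """Log-likelihood of tokens under an additively smoothed unigram model.
    `counts` must be a Counter so unseen words count as zero."""
    total = sum(counts.values())
    denom = total + smoothing * vocab_size
    return sum(math.log((counts[t] + smoothing) / denom) for t in tokens)


def unigram_tracking_score(tokens, topic_counts, background_counts, vocab_size):
    """Per-word log-likelihood ratio of topic vs. background (assumed scoring rule).
    A story is declared on-topic when the score exceeds a decision threshold."""
    n = max(len(tokens), 1)
    on_topic = unigram_loglik(tokens, topic_counts, vocab_size)
    off_topic = unigram_loglik(tokens, background_counts, vocab_size)
    return (on_topic - off_topic) / n


if __name__ == "__main__":
    # Hypothetical toy counts, for illustration only.
    topic = Counter({"quake": 5, "relief": 3})
    background = Counter({"the": 50, "quake": 1, "market": 10})
    print(unigram_tracking_score(["quake", "relief"], topic, background, vocab_size=4))
    print(unigram_tracking_score(["market", "the"], topic, background, vocab_size=4))

    # Same mean rate (0.05), but the beta-binomial allows bursty counts.
    print(beta_binomial_logpmf(20, 100, alpha=0.5, beta=9.5))
    print(binomial_logpmf(20, 100, p=0.05))
```

In this sketch, raising or lowering the threshold applied to `unigram_tracking_score` trades misses against false alarms, which is the trade-off the abstract refers to. The final two lines compare how a beta-binomial and a fixed-rate binomial with the same mean treat a word that appears far more often than expected in one story.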




Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Yamron, J.P., Gillick, L., van Mulbregt, P., Knecht, S. (2002). Statistical Models of Topical Content. In: Allan, J. (ed.) Topic Detection and Tracking. The Information Retrieval Series, vol 12. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0933-2_6


  • DOI: https://doi.org/10.1007/978-1-4615-0933-2_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5311-9

  • Online ISBN: 978-1-4615-0933-2

  • eBook Packages: Springer Book Archive
