Abstract
In this chapter we explore the behavior of two statistical models, one based on simple unigrams and the other on the beta-binomial distribution, as applied to the problem of modeling story generation. We describe how these models can be incorporated into information extraction applications, particularly the Tracking and Detection engines built for the Topic Detection and Tracking evaluations sponsored by DARPA. Tracking systems based on the two models have complementary strengths and weaknesses: a Beta-Binomial system yields high precision at a high decision threshold, but its performance degrades quickly as the threshold drops; a Unigram system is not as strong at a high decision threshold, but is very good at suppressing false alarms at lower thresholds. We describe the features of these systems that give rise to this behavior, and discuss ways that each system might be improved by borrowing from the other. We also discuss our Detection system, and how improvements in Tracking should lead to improvements in Detection.
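As a rough illustration of how the two model families differ, the sketch below scores a single word count in a story under a fixed-rate binomial (the per-word view of a unigram model) and under a beta-binomial with the same mean rate. This is a minimal sketch under stated assumptions, not the chapter's implementation: the abstract gives no parameterization, smoothing, or scoring details, so the function names and all numeric values (`n_tokens`, `alpha`, `beta`, `bursty_count`) are hypothetical.

```python
import math


def log_choose(n: int, k: int) -> float:
    """Log of the binomial coefficient C(n, k), computed via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_beta_fn(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def binomial_logpmf(k: int, n: int, p: float) -> float:
    """Log-probability of k occurrences in n tokens at a fixed word rate p."""
    return log_choose(n, k) + k * math.log(p) + (n - k) * math.log1p(-p)


def beta_binomial_logpmf(k: int, n: int, alpha: float, beta: float) -> float:
    """Log-probability of k occurrences in n tokens when the word rate is
    itself drawn from a Beta(alpha, beta) distribution for each story."""
    return (
        log_choose(n, k)
        + log_beta_fn(k + alpha, n - k + beta)
        - log_beta_fn(alpha, beta)
    )


# Hypothetical values chosen only for illustration (not from the chapter).
n_tokens = 200                      # story length in tokens
alpha, beta = 0.1, 9.9              # overdispersed prior on the word rate
mean_rate = alpha / (alpha + beta)  # both models share this mean rate

# A "bursty" count: the word appears far more often than the mean predicts.
bursty_count = 10
unigram_score = binomial_logpmf(bursty_count, n_tokens, mean_rate)
beta_binomial_score = beta_binomial_logpmf(bursty_count, n_tokens, alpha, beta)

print("fixed-rate log-prob:   ", unigram_score)
print("beta-binomial log-prob:", beta_binomial_score)
```

With these made-up settings the beta-binomial assigns the bursty count a higher log-probability than the fixed-rate model, because it lets the word rate vary from story to story. Whether this is the mechanism behind the precision and false-alarm behavior described above is not something the abstract states.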
Copyright information
© 2002 Springer Science+Business Media New York
About this chapter
Cite this chapter
Yamron, J.P., Gillick, L., van Mulbregt, P., Knecht, S. (2002). Statistical Models of Topical Content. In: Allan, J. (ed.) Topic Detection and Tracking. The Information Retrieval Series, vol 12. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0933-2_6
DOI: https://doi.org/10.1007/978-1-4615-0933-2_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5311-9
Online ISBN: 978-1-4615-0933-2
eBook Packages: Springer Book Archive