Statistical Models of Topical Content

Yamron, J. P.; Gillick, L.; van Mulbregt, P.; Knecht, S.

doi:10.1007/978-1-4615-0933-2_6

Part of the book series: The Information Retrieval Series (INRE, volume 12)

Abstract

In this chapter we explore the behavior of two statistical models, one based on simple unigrams and the other on the beta-binomial distribution, as applied to the problem of modeling story generation. We describe how these models can be incorporated into information extraction applications, particularly the Tracking and Detection engines built for the Topic Detection and Tracking evaluations sponsored by DARPA. Tracking systems based on the two models have complementary strengths and weaknesses: a Beta-Binomial system yields high precision at a high decision threshold, but its performance degrades quickly as the threshold drops; a Unigram system is not as strong at a high decision threshold, but is very good at suppressing false alarms at lower thresholds. We describe the features of these systems that give rise to this behavior and discuss ways that each system might be improved by borrowing from the other. We also discuss our Detection system and how improvements in Tracking should lead to improvements in Detection.
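
To make the two model families named in the abstract concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation: the add-alpha smoothing, the per-token normalization of the score, the default threshold, and every function and parameter name (unigram_log_prob, beta_binomial_log_pmf, track, alpha, a, b) are hypothetical choices made for illustration. The sketch shows (1) a smoothed unigram log-probability for a bag of words, (2) the standard beta-binomial log-probability of seeing a word k times in n tokens, and (3) a tracking decision that compares a topic-versus-background log-likelihood ratio with a decision threshold.

```python
import math
from collections import Counter


def unigram_log_prob(counts, model, vocab_size, alpha=1.0):
    """Log-probability of a bag of words under an add-alpha smoothed unigram model.

    `model` maps word -> training count. The smoothing scheme is an assumption.
    """
    total = sum(model.values())
    return sum(
        n * math.log((model.get(w, 0) + alpha) / (total + alpha * vocab_size))
        for w, n in counts.items()
    )


def _log_beta(x, y):
    """Log of the Beta function, computed via log-gamma for numerical stability."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def beta_binomial_log_pmf(k, n, a, b):
    """log P(k occurrences in n tokens) under a beta-binomial with parameters a, b."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + _log_beta(k + a, n - k + b) - _log_beta(a, b)


def track(story_tokens, topic_counts, background_counts, threshold=0.0):
    """Score a story against a topic and apply a decision threshold.

    The score is a per-token log-likelihood ratio (topic vs. background)
    under the unigram model; normalizing by length is an assumption.
    """
    counts = Counter(story_tokens)
    vocab_size = len(set(topic_counts) | set(background_counts) | set(counts))
    llr = unigram_log_prob(counts, topic_counts, vocab_size) - unigram_log_prob(
        counts, background_counts, vocab_size
    )
    score = llr / max(len(story_tokens), 1)
    return score, score >= threshold
```

With hypothetical toy counts such as topic = Counter({"earthquake": 5, "rescue": 3}) and background = Counter({"the": 10, "market": 4, "earthquake": 1}), calling track(["earthquake", "rescue", "earthquake"], topic, background) yields a positive score and an on-topic decision, while track(["market", "the"], topic, background) yields a negative score and an off-topic decision. Raising the threshold argument trades recall for precision, which is the axis along which the abstract compares the two systems.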

Author information

Authors and Affiliations

formerly of Dragon Systems/Lernout & Hauspie, 320 Nevada Street, Newton, MA 02460, USA
J. P. Yamron, L. Gillick & P. van Mulbregt
Dragon Systems/Lernout & Hauspie, 320 Nevada Street, Newton, MA 02460, USA
S. Knecht

Editor information

Editors and Affiliations

University of Massachusetts at Amherst, USA
James Allan

About this chapter

Cite this chapter

Yamron, J.P., Gillick, L., van Mulbregt, P., Knecht, S. (2002). Statistical Models of Topical Content. In: Allan, J. (ed.) Topic Detection and Tracking. The Information Retrieval Series, vol 12. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0933-2_6

DOI: https://doi.org/10.1007/978-1-4615-0933-2_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5311-9
Online ISBN: 978-1-4615-0933-2
eBook Packages: Springer Book Archive
