
Segmentation and Detection at IBM

Hybrid Statistical Models and Two-tiered Clustering
  • S. Dharanipragada
  • M. Franz
  • J. S. McCarley
  • T. Ward
  • W.-J. Zhu
Part of The Information Retrieval Series book series (INRE, volume 12)

Abstract

IBM’s story segmentation combines decision tree and maximum entropy models, which take a variety of lexical, prosodic, semantic, and structural features as input. Both types of model are source-specific, and combining them substantially lowers the segmentation cost C_seg. IBM’s topic detection system introduces a minimal hierarchy into the clustering: each cluster comprises one or more microclusters. We investigate the importance of merging microclusters and propose a merging strategy that improves our performance.
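The abstract does not say how the two segmentation models are combined or how C_seg is computed. The sketch below is a minimal illustration under stated assumptions, not IBM's method: it assumes each model emits a per-word boundary posterior, combines the two by linear interpolation with a hypothetical `weight`, thresholds the result at a hypothetical 0.5, and scores it with a window-based, TDT-style normalized segmentation cost. The window size `k` and the constants C_miss = C_FA = 1 and P_seg = 0.3 are assumed defaults; check them against the TDT-3 evaluation plan before relying on them. All function names are hypothetical.

```python
from typing import List, Sequence


def combine_boundary_probs(p_tree: Sequence[float], p_maxent: Sequence[float],
                           weight: float = 0.5) -> List[float]:
    """Linearly interpolate per-word boundary posteriors from two models.

    Hypothetical combination rule: the source does not say how the
    decision tree and maximum entropy outputs are merged.
    """
    if len(p_tree) != len(p_maxent):
        raise ValueError("both models must score the same word positions")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_tree, p_maxent)]


def segment_ids(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map boundary posteriors to a story index for every word.

    A word whose posterior reaches `threshold` starts a new story;
    word 0 always belongs to the first story.
    """
    ids: List[int] = []
    story = 0
    for i, p in enumerate(probs):
        if i > 0 and p >= threshold:
            story += 1
        ids.append(story)
    return ids


def normalized_c_seg(ref: Sequence[int], hyp: Sequence[int], k: int = 50,
                     c_miss: float = 1.0, c_fa: float = 1.0,
                     p_seg: float = 0.3) -> float:
    """Window-based segmentation cost (assumed TDT-style definition).

    For each probe pair (i, i + k): a miss is when the reference puts the
    two words in different stories but the hypothesis puts them in the
    same story; a false alarm is the reverse.
    """
    misses = false_alarms = n_diff = n_same = 0
    for i in range(len(ref) - k):
        ref_same = ref[i] == ref[i + k]
        hyp_same = hyp[i] == hyp[i + k]
        if ref_same:
            n_same += 1
            if not hyp_same:
                false_alarms += 1
        else:
            n_diff += 1
            if hyp_same:
                misses += 1
    p_miss = misses / n_diff if n_diff else 0.0
    p_fa = false_alarms / n_same if n_same else 0.0
    cost = c_miss * p_miss * p_seg + c_fa * p_fa * (1.0 - p_seg)
    # Normalize by the cheaper trivial system (never or always a boundary).
    return cost / min(c_miss * p_seg, c_fa * (1.0 - p_seg))


if __name__ == "__main__":
    ref = [0] * 4 + [1] * 4 + [2] * 4      # stories start at words 4 and 8
    p_tree = [0.1] * 12
    p_tree[4] = p_tree[8] = 0.9
    p_maxent = [0.2] * 12
    p_maxent[4], p_maxent[8] = 0.7, 0.6
    hyp = segment_ids(combine_boundary_probs(p_tree, p_maxent))
    print(normalized_c_seg(ref, hyp, k=2))        # 0.0
    print(normalized_c_seg(ref, [0] * 12, k=2))   # 1.0 (never-boundary system)
```

With these defaults a hypothesis that never places a boundary scores exactly 1.0, so a normalized cost below 1 means the combined segmenter beats that trivial baseline.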

Keywords

Content Word, Decision Tree Model, Broadcast News, Automatic Speech Recognition System, Maximum Entropy Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer Science+Business Media New York 2002

Authors and Affiliations

  • S. Dharanipragada (1)
  • M. Franz (1)
  • J. S. McCarley (1)
  • T. Ward (1)
  • W.-J. Zhu (1)

  1. IBM T.J. Watson Research Center, Yorktown Heights, USA
