Segmentation and Detection at IBM

Hybrid Statistical Models and Two-tiered Clustering
  • S. Dharanipragada
  • M. Franz
  • J. S. McCarley
  • T. Ward
  • W.-J. Zhu
Part of The Information Retrieval Series book series (INRE, volume 12)

Abstract

IBM’s story segmentation uses a combination of decision tree and maximum entropy models. These models take a variety of lexical, prosodic, semantic, and structural features as their inputs. Both types of models are source-specific, and we substantially lower the segmentation cost C_seg by combining them. IBM’s topic detection system introduces a minimal hierarchy into the clustering: each cluster comprises one or more microclusters. We investigate the importance of merging microclusters together, and propose a merging strategy which improves our performance.

Copyright information

© Springer Science+Business Media New York 2002

Authors and Affiliations

  • S. Dharanipragada (1)
  • M. Franz (1)
  • J. S. McCarley (1)
  • T. Ward (1)
  • W.-J. Zhu (1)

  1. IBM T.J. Watson Research Center, Yorktown Heights, USA
