Segmentation and Detection at IBM
IBM’s story segmentation system combines decision tree and maximum entropy models, both of which take a variety of lexical, prosodic, semantic, and structural features as input. Both types of models are source-specific, and combining them substantially lowers the segmentation cost C_seg. IBM’s topic detection system introduces a minimal hierarchy into the clustering: each cluster is composed of one or more microclusters. We investigate the importance of merging microclusters and propose a merging strategy that improves our performance.
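The abstract does not say how the decision tree and maximum entropy models are combined. As a rough illustration only, the sketch below assumes each model outputs a story-boundary probability for every candidate position and combines them by linear interpolation, a common choice that may differ from the method used in the paper. The function names, the interpolation weight, and the decision threshold are all hypothetical.

```python
from typing import List


def combine_boundary_scores(
    p_tree: List[float], p_maxent: List[float], weight: float = 0.5
) -> List[float]:
    """Linearly interpolate two per-position boundary probabilities.

    Assumption: `p_tree` and `p_maxent` are boundary probabilities from a
    decision tree and a maximum entropy model for the same candidate
    positions. `weight` is a hypothetical interpolation weight.
    """
    if len(p_tree) != len(p_maxent):
        raise ValueError("both models must score the same positions")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_tree, p_maxent)]


def pick_boundaries(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Return the indices whose combined score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy scores for three candidate positions (made-up values).
combined = combine_boundary_scores([0.9, 0.2, 0.4], [0.7, 0.1, 0.8])
boundaries = pick_boundaries(combined)
```

In practice the weight and threshold would be tuned on held-out data for each source, since the abstract describes the models as source-specific.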
Keywords: Content Word, Decision Tree Model, Broadcast News, Automatic Speech Recognition System, Maximum Entropy Model