Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System
In this paper we describe the ICSI-SRI entry in the Rich Transcription 2005 Spring Meeting Recognition Evaluation. The current system is based on the ICSI-SRI clustering system for Broadcast News (BN), with extra modules to process the different meeting tasks in which we participated. Our base system uses agglomerative clustering with a modified Bayesian Information Criterion (BIC) measure, both to decide which pair of clusters to merge and to determine when to stop merging. This approach does not require any pre-trained models, which increases robustness and simplifies the port from BN to the meetings domain. For the meetings domain, we have added several features to our baseline clustering system, including a “purification” module that tries to keep the clusters acoustically homogeneous throughout the clustering process, and a delay-and-sum beamforming algorithm that enhances signal quality for the multiple distant microphones (MDM) sub-task. In post-evaluation work we further improved the delay-and-sum algorithm, experimented with a new speech/non-speech detector, and proposed a new system for the lecture room environment.
Keywords: Bayesian Information Criterion; Cluster System; Agglomerative Cluster; Conference Room; Broadcast News
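The abstract describes agglomerative clustering in which a BIC-style score both selects the pair of clusters to merge and decides when to stop. A minimal sketch of that control loop follows. It is not the ICSI-SRI implementation: the abstract does not spell out the modified BIC measure, so this sketch swaps in the classical penalized BIC with one diagonal-covariance Gaussian per cluster. The names `gauss_loglik`, `delta_bic`, `agglomerate`, `toy_segments`, and the parameter `penalty_weight` are hypothetical and exist only for illustration.

```python
import math
import random


def gauss_loglik(frames):
    """Log-likelihood of frames under their own ML diagonal Gaussian."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for d in range(dim):
        col = [f[d] for f in frames]
        mean = sum(col) / n
        # Variance floor avoids log(0) on degenerate data.
        var = max(sum((x - mean) ** 2 for x in col) / n, 1e-6)
        total += -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    return total


def delta_bic(a, b, penalty_weight=1.0):
    """BIC(one model for a+b) minus BIC(separate models); > 0 favors merging.

    Assumption: classical BIC penalty, not the paper's modified measure.
    """
    dim = len(a[0])
    n = len(a) + len(b)
    gain = gauss_loglik(a + b) - gauss_loglik(a) - gauss_loglik(b)
    # Merging removes one Gaussian: dim means plus dim variances.
    return gain + 0.5 * penalty_weight * (2 * dim) * math.log(n)


def agglomerate(clusters, penalty_weight=1.0):
    """Greedily merge the best-scoring pair until no pair scores above 0."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best_score, (i, j) = max(
            (delta_bic(clusters[i], clusters[j], penalty_weight), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best_score <= 0:
            break  # stopping criterion: no merge improves BIC
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def toy_segments(seed=0, frames_per_segment=200):
    """Four synthetic 2-D segments alternating between two 'speakers'."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 0.0), (5.0, 5.0)]
    return [
        [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
         for _ in range(frames_per_segment)]
        for cx, cy in centers
    ]


if __name__ == "__main__":
    result = agglomerate(toy_segments())
    print([len(c) for c in result])
```

In this sketch the same score serves both purposes named in the abstract: the largest value picks the merge, and the sign of that value ends the loop. No pre-trained speaker models are involved, since every Gaussian is estimated from the data being clustered.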