Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System

  • Xavier Anguera
  • Chuck Wooters
  • Barbara Peskin
  • Mateu Aguiló
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3869)

Abstract

In this paper we describe the ICSI-SRI entry in the Rich Transcription 2005 Spring Meeting Recognition Evaluation. The system is based on the ICSI-SRI clustering system for Broadcast News (BN), with additional modules to handle the meeting tasks in which we participated. The base system uses agglomerative clustering with a modified Bayesian Information Criterion (BIC) measure, both to decide which pair of clusters to merge and to determine when to stop merging. This approach requires no pre-trained models, which increases robustness and simplifies the port from BN to the meetings domain. For the meetings domain, we added several features to the baseline clustering system, including a “purification” module that tries to keep the clusters acoustically homogeneous throughout the clustering process, and a delay-and-sum beamforming algorithm that enhances signal quality for the multiple distant microphones (MDM) sub-task. In post-evaluation work we further improved the delay-and-sum algorithm, experimented with a new speech/non-speech detector, and proposed a new system for the lecture room environment.
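
As a rough illustration of how a BIC-style score can drive both the choice of which clusters to merge and the decision to stop, the following minimal sketch runs agglomerative clustering on toy one-dimensional data. It is not the system described here: it assumes a single Gaussian per cluster and the textbook delta-BIC with a penalty weight, whereas the paper uses a modified BIC measure whose details are not given in this abstract. The names `gaussian_loglik`, `delta_bic`, `agglomerate`, and the parameter `lam` are hypothetical and chosen for illustration only.

```python
import math
from itertools import combinations


def gaussian_loglik(xs, var_floor=1e-6):
    """Log-likelihood of xs under one Gaussian fitted to xs itself."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    # Maximum-likelihood variance, floored to avoid log(0) on tiny clusters.
    var = max(ss / n, var_floor)
    return -0.5 * n * math.log(2.0 * math.pi * var) - 0.5 * ss / var


def delta_bic(a, b, lam=1.0):
    """Textbook delta-BIC for merging a and b (assumption: not the paper's
    modified measure). Positive values favour merging."""
    n = len(a) + len(b)
    gain = gaussian_loglik(a + b) - gaussian_loglik(a) - gaussian_loglik(b)
    # Two separate Gaussians have 2 more free parameters than one merged model.
    penalty = 0.5 * lam * 2 * math.log(n)
    return gain + penalty


def agglomerate(clusters, lam=1.0):
    """Merge the best-scoring pair until no pair has a positive delta-BIC."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        score, i, j = max(
            (delta_bic(clusters[i], clusters[j], lam), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if score <= 0.0:
            break  # stopping criterion: no merge improves the score
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Toy data: two "speakers" around 0 and 10, each split into two initial clusters.
initial = [
    [0.0, 0.1, -0.1, 0.05],
    [0.02, -0.05, 0.08, -0.02],
    [10.0, 10.1, 9.9, 10.05],
    [10.02, 9.95, 10.08, 9.98],
]
final = agglomerate(initial)
print(len(final), [len(c) for c in final])
```

On this toy input the loop is expected to merge the two clusters near 0 and the two near 10, then stop because merging the remaining pair gives a negative score. A real diarization system would operate on multi-dimensional acoustic features with richer cluster models; the scalar values here only stand in for them.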

Keywords

Bayesian Information Criterion; Cluster System; Agglomerative Cluster; Conference Room; Broadcast News
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Xavier Anguera (1, 2)
  • Chuck Wooters (1)
  • Barbara Peskin (1)
  • Mateu Aguiló (1, 2)
  1. International Computer Science Institute, Berkeley, USA
  2. Technical University of Catalonia, Barcelona, Spain
