Searching through a Speech Memory for Text-Independent Speaker Verification

  • Dijana Petrovska-Delacrétaz
  • Asmaa El Hannani
  • Gérard Chollet
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2688)


Current state-of-the-art speaker verification algorithms use Gaussian Mixture Models (GMMs) to estimate the probability density function of the acoustic feature vectors. Previous studies have shown that phonemes differ in their discriminative power for the speaker verification task. To better exploit these differences, it seems reasonable to segment the speech into distinct speech classes and to build the speaker models separately for each class.
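
The GMM scoring referred to above can be sketched as follows. This is a minimal illustration, not the authors' system: it assumes diagonal covariances and a separate background model used to normalize the speaker score, and the function names (`gmm_log_likelihood`, `llr_score`) and all model parameters are hypothetical toy values.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical container: one mixture component = (weight, mean, diagonal variances).
Component = Tuple[float, List[float], List[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    log_det = sum(math.log(v) for v in var)
    mahal = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (len(x) * math.log(2 * math.pi) + log_det + mahal)


def gmm_log_likelihood(x: Sequence[float], gmm: Sequence[Component]) -> float:
    """log sum_k w_k N(x; mu_k, diag(var_k)), evaluated with log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def llr_score(frames, speaker_gmm, background_gmm) -> float:
    """Average per-frame log-likelihood ratio of the speaker model against an
    (assumed) background model; larger values favour the claimed speaker."""
    total = sum(
        gmm_log_likelihood(x, speaker_gmm) - gmm_log_likelihood(x, background_gmm)
        for x in frames
    )
    return total / len(frames)


# Toy 2-D models; all parameter values are made up for illustration.
speaker = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
background = [(1.0, [1.5, 1.5], [4.0, 4.0])]
frames = [[0.1, -0.2], [2.9, 3.1]]
print(llr_score(frames, speaker, background))  # positive: frames fit the speaker model better
```

Under the class-based idea described above, one could keep one such speaker/background model pair per speech class and score each segment with the models of its own class; that extension is not shown here.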

Because transcribing databases is a tedious task, we prefer to use data-driven segmentation methods. If the number of automatically derived classes is comparable to the number of phonetic units, we can hypothesize that these classes correspond roughly to phonetic units. We have decided to use the well-known Dynamic Time Warping (DTW) method to evaluate the distance between two speech segments, i.e., two sequences of speech feature vectors. If the two speech segments belong to the same speech class, we can expect the DTW distortion measure to capture speaker-specific characteristics. The novelty of the proposed method is the combination of the DTW distortion measure with data-driven segmentation tools. The first experimental results of the proposed method, in terms of Detection Error Tradeoff (DET) curves, are comparable to current state-of-the-art speaker verification results, as obtained in NIST speaker recognition evaluations.
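
A DTW distortion between two speech segments can be sketched as below. It assumes Euclidean local distances between feature vectors and the symmetric step pattern (diagonal steps weighted twice), under which every alignment path has total weight n + m, so dividing by n + m normalizes for segment length. The helper `nearest_in_memory` is hypothetical: it only illustrates matching a test segment against stored segments of the same class and is not presented as the authors' exact procedure.

```python
import math
from typing import List, Sequence

Segment = Sequence[Sequence[float]]  # one speech segment: a sequence of feature vectors


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Local distance between two feature vectors (assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distortion(seg_a: Segment, seg_b: Segment) -> float:
    """Length-normalized DTW distortion with the symmetric step pattern."""
    n, m = len(seg_a), len(seg_b)
    inf = float("inf")
    # acc[i][j]: minimal weighted cost of aligning seg_a[:i] with seg_b[:j]
    acc: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seg_a[i - 1], seg_b[j - 1])
            acc[i][j] = min(
                acc[i - 1][j] + d,            # vertical step, weight 1
                acc[i][j - 1] + d,            # horizontal step, weight 1
                acc[i - 1][j - 1] + 2.0 * d,  # diagonal step, weight 2
            )
    # Every path's step weights sum to n + m, so this division normalizes for length.
    return acc[n][m] / (n + m)


def nearest_in_memory(segment: Segment, memory: Sequence[Segment]) -> float:
    """Hypothetical helper: smallest DTW distortion between a test segment
    and the stored reference segments of the same speech class."""
    return min(dtw_distortion(segment, ref) for ref in memory)


test_seg = [[0.0], [1.0], [2.0]]
memory = [[[5.0], [5.0]], [[0.0], [0.0], [1.0], [2.0]]]
print(nearest_in_memory(test_seg, memory))  # 0.0: the second reference is a time-stretched copy
```

A lower distortion means the test segment is closer to the stored material; turning such distortions into an accept/reject decision would still require a score normalization and a threshold, which this sketch does not model.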


Keywords (machine-generated): Gaussian Mixture Model, Dynamic Time Warping, Speech Data, Speaker Verification, Speech Segment

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Dijana Petrovska-Delacrétaz (1)
  • Asmaa El Hannani (1)
  • Gérard Chollet (2)

  1. Informatics Dept., DIVA Group, University of Fribourg, Switzerland
  2. ENST, TSI, France