Durations of Context-Dependent Phonemes: A New Feature in Speaker Verification

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4441)

Abstract

We present a text-dependent speaker verification system based on Hidden Markov Models (HMMs). A set of features based on the temporal duration of context-dependent phonemes is used to distinguish among speakers. We tested our approach on the YOHO corpus: the HMM-based system achieved an equal error rate (EER) of 0.68% using conventional (acoustic) features alone, and an EER of 0.32% when the time features were combined with the acoustic features. This compares well with state-of-the-art results on the same test and shows the value of the temporal features for speaker verification. These features may also be useful for other purposes, such as detecting replay attacks or improving the robustness of speaker-verification systems to channel or speaker variations. Our results confirm earlier findings on text-independent speaker recognition [1] and text-dependent speaker verification [2] tasks, and we offer a number of suggestions for further improvement.
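
The equal error rate quoted in the abstract is the operating point at which the false-acceptance rate equals the false-rejection rate. The following is a minimal sketch of how such a figure can be estimated from verification scores. The function name `equal_error_rate` and the toy scores are illustrative assumptions; they are not taken from the chapter or from the YOHO experiments, and the chapter's own scoring procedure may differ.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from two lists of verification scores.

    Assumption: higher scores mean "more likely the claimed speaker",
    and a trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the threshold where the false-acceptance
    rate (FAR) and false-rejection rate (FRR) are closest.
    """
    best = None
    # Every observed score is a candidate threshold.
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # true speakers rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With only a handful of trials the FAR and FRR curves move in coarse steps, so the estimate is the midpoint at the closest crossing rather than an exact intersection; on a large trial set the two rates usually meet much more closely.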

References

  1. Ferrer, L., Bratt, H., Gadde, V.R.R., Kajarekar, S., Shriberg, E., Sönmez, K., Stolcke, A., Venkataraman, A.: Modeling duration patterns for speaker recognition. In: Proceedings of Eurospeech, pp. 784–787 (September 2003)

  2. van Heerden, C.J., Barnard, E.: Using timing information in speaker verification. In: Proceedings of the Symposium of the Pattern Recognition Association of South Africa, pp. 53–57 (December 2005)

  3. Campbell, J.P.: Speaker recognition: A tutorial. Proceedings of the IEEE 85, 1437–1462 (1997)

  4. Martin, A.: Evaluations of Automatic Speaker Classification Systems. In: Müller, C. (ed.) Speaker Classification. Lecture Notes in Computer Science / Artificial Intelligence, vol. 4343, Springer, Heidelberg (2007) (this issue)

  5. Koreman, J., Wu, D., Morris, A.C.: Enhancing Speaker Discrimination at the Feature Level. In: Müller, C. (ed.) Speaker Classification. Lecture Notes in Computer Science / Artificial Intelligence, vol. 4343, Springer, Heidelberg (2007) (this issue)

  6. Campbell, J.: Testing with the YOHO CD-ROM voice verification corpus. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 341–344 (May 1995)

  7. Campbell, J.P., Reynolds, D.A.: Corpora for the evaluation of speaker recognition systems. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, pp. 829–832 (March 1999)

  8. Higgins, A., Bahler, L., Porter, J.: Speaker verification using randomized phrase prompting. Digital Signal Processing 1(2), 89–106 (1991)

  9. Reynolds, D.A.: Speaker identification and verification using Gaussian mixture speaker models. Speech Communication 17, 91–108 (1995)

  10. Liou, H.-S., Mammone, R.J.: A subword neural tree network approach to text-dependent speaker verification. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 357–360 (May 1995)

  11. Rosenberg, A.E., DeLong, J., Lee, C.-H., Juang, B.-H., Soong, F.K.: The use of cohort normalized scores for speaker verification. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP), vol. 1, pp. 599–602 (October 1992)

  12. Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Veltchev, V., Woodland, P.: The HTK Book, Cambridge University Engineering Department (2005), http://htk.eng.cam.ac.uk/

  13. Huckvale, M.: How is individuality expressed in voice? An introduction to speech production & description for speaker classification. In: Müller, C. (ed.) Speaker Classification. LNCS/LNAI, vol. 4343, Springer, Heidelberg (2007)

Editor information

Christian Müller

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

van Heerden, C.J., Barnard, E. (2007). Durations of Context-Dependent Phonemes: A New Feature in Speaker Verification. In: Müller, C. (ed.) Speaker Classification II. Lecture Notes in Computer Science, vol 4441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74122-0_9

  • DOI: https://doi.org/10.1007/978-3-540-74122-0_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74121-3

  • Online ISBN: 978-3-540-74122-0

  • eBook Packages: Computer Science, Computer Science (R0)
