Durations of Context-Dependent Phonemes: A New Feature in Speaker Verification

van Heerden, Charl Johannes; Barnard, Etienne

doi:10.1007/978-3-540-74122-0_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4441)

Abstract

We present a text-dependent speaker verification system based on Hidden Markov Models (HMMs). A set of features based on the temporal duration of context-dependent phonemes is used to distinguish among speakers. Our approach was tested on the YOHO corpus: the HMM-based system achieved an equal error rate (EER) of 0.68% using conventional (acoustic) features and an EER of 0.32% when the duration features were combined with the acoustic features. This compares well with state-of-the-art results on the same test and shows the value of temporal features for speaker verification. These features may also be useful for other purposes, such as detecting replay attacks or improving the robustness of speaker-verification systems to channel or speaker variations. Our results confirm earlier findings on text-independent speaker recognition [1] and text-dependent speaker verification [2] tasks, and we offer a number of suggestions for further improvements.
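
For readers unfamiliar with the metric, the following is a minimal sketch of how an equal error rate can be estimated from verification scores, and of the relative reduction implied by the two EERs quoted in the abstract. It is not the authors' evaluation code: the function names, the toy scores, and the convention that a higher score means "more likely the claimed speaker" are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Estimate the EER from two lists of scores (hypothetical helper).

    Assumes higher scores favour accepting the claimed identity.
    Returns (eer, threshold) at the candidate threshold where the
    false-accept and false-reject rates are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Every observed score is tried as a decision threshold.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    # False-reject rate: genuine trials scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False-accept rate: impostor trials scoring at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]


def relative_reduction(baseline, improved):
    """Relative error reduction, e.g. between two EER values."""
    return (baseline - improved) / baseline


# Toy scores, invented purely for illustration (not YOHO data).
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])

# EERs reported in the abstract: 0.68% (acoustic) and 0.32% (combined).
reduction = relative_reduction(0.68, 0.32)
```

With the figures from the abstract, `reduction` is roughly 0.53, that is, adding the duration features cuts the EER by about half relative to the acoustic-only system.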

References

[1] Ferrer, L., Bratt, H., Gadde, V.R.R., Kajarekar, S., Shriberg, E., Sönmez, K., Stolcke, A., Venkataraman, A.: Modeling duration patterns for speaker recognition. In: Proceedings of Eurospeech, pp. 784–787 (September 2003)
[2] van Heerden, C.J., Barnard, E.: Using timing information in speaker verification. In: Proceedings of the Symposium of the Pattern Recognition Association of South Africa, pp. 53–57 (December 2005)
[3] Campbell, J.P.: Speaker recognition: A tutorial. Proceedings of the IEEE 85, 1437–1462 (1997)
[4] Martin, A.: Evaluations of Automatic Speaker Classification Systems. In: Müller, C. (ed.) Speaker Classification. Lecture Notes in Computer Science / Artificial Intelligence, vol. 4343, Springer, Heidelberg (2007) (this issue)
[5] Koreman, J., Wu, D., Morris, A.C.: Enhancing Speaker Discrimination at the Feature Level. In: Müller, C. (ed.) Speaker Classification. Lecture Notes in Computer Science / Artificial Intelligence, vol. 4343, Springer, Heidelberg (2007) (this issue)
[6] Campbell, J.: Testing with the YOHO CD-ROM voice verification corpus. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 341–344 (May 1995)
[7] Campbell, J.P., Reynolds, D.A.: Corpora for the evaluation of speaker recognition systems. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, pp. 829–832 (March 1999)
[8] Higgins, A., Bahler, L., Porter, J.: Speaker verification using randomized phrase prompting. Digital Signal Processing 1(2), 89–106 (1991)
[9] Reynolds, D.A.: Speaker identification and verification using Gaussian mixture speaker models. Speech Communication 17, 91–108 (1995)
[10] Liou, H.-S., Mammone, R.J.: A subword neural tree network approach to text-dependent speaker verification. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 357–360 (May 1995)
[11] Rosenberg, A.E., DeLong, J., Lee, C.-H., Juang, B.-H., Soong, F.K.: The use of cohort normalized scores for speaker verification. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP), vol. 1, pp. 599–602 (October 1992)
[12] Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Veltchev, V., Woodland, P.: The HTK Book, Cambridge University Engineering Department (2005), http://htk.eng.cam.ac.uk/
[13] Huckvale, M.: How is individuality expressed in voice? An introduction to speech production & description for speaker classification. In: Müller, C. (ed.) Speaker Classification. LNCS/LNAI, vol. 4343, Springer, Heidelberg (2007)

Author information

Authors and Affiliations

University of Pretoria, Pretoria, Gauteng, South Africa
Charl Johannes van Heerden & Etienne Barnard
Human Language Technology Group, Meraka Institute, CSIR, Meiring Naude Rd, Brummeria, Pretoria, Gauteng, South Africa
Charl Johannes van Heerden & Etienne Barnard

Editor information

Christian Müller

Cite this chapter

van Heerden, C.J., Barnard, E. (2007). Durations of Context-Dependent Phonemes: A New Feature in Speaker Verification. In: Müller, C. (ed.) Speaker Classification II. Lecture Notes in Computer Science, vol. 4441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74122-0_9

DOI: https://doi.org/10.1007/978-3-540-74122-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74121-3
Online ISBN: 978-3-540-74122-0
eBook Packages: Computer Science; Computer Science (R0)
