Improving Speaker Recognition by Training on Emotion-Added Models

  • Conference paper
Affective Computing and Intelligent Interaction (ACII 2005)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3784)

Abstract

In speaker recognition applications, changes in a speaker's emotional state are a major cause of errors. The ongoing work described in this contribution attempts to enhance the performance of automatic speaker recognition (ASR) systems on emotional speech. Two procedures that require only a small quantity of affective training data are applied to the ASR task, which makes them practical in real-world situations. The method consists of classifying emotional states by acoustic features and generating an emotion-added model based on the resulting emotion grouping. Experiments performed on the Emotional Prosody Speech (EPS) corpus show significant improvements in equal error rates (EERs) and identification rates (IRs) compared with the baseline and comparative experiments.
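
The abstract reports results as equal error rates (EERs) and identification rates (IRs) but gives no implementation details. The following is a minimal, illustrative sketch, not taken from the paper, of how these two metrics are conventionally computed from verification scores and closed-set identification decisions; all function and variable names (equal_error_rate, identification_rate, genuine, impostor) are assumptions introduced here for illustration.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """EER: the operating point where false acceptance and false rejection
    rates coincide, found here by sweeping a threshold over all observed scores."""
    thresholds = np.sort(np.unique(np.concatenate([genuine_scores, impostor_scores])))
    best_gap, eer = np.inf, None
    for t in thresholds:
        far = np.mean(impostor_scores >= t)  # false acceptance rate
        frr = np.mean(genuine_scores < t)    # false rejection rate
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer

def identification_rate(predicted_speakers, true_speakers):
    """IR: fraction of closed-set identification trials assigned to the correct speaker."""
    return np.mean(np.asarray(predicted_speakers) == np.asarray(true_speakers))

# Toy usage with synthetic scores (illustrative only).
rng = np.random.default_rng(0)
genuine = rng.normal(1.0, 0.5, 200)   # scores for target (same-speaker) trials
impostor = rng.normal(0.0, 0.5, 200)  # scores for non-target trials
print("EER:", equal_error_rate(genuine, impostor))
print("IR:", identification_rate([3, 1, 2, 2], [3, 1, 2, 0]))
```

The threshold sweep locates the point where the false acceptance and false rejection rates are closest, which is the standard way EER is reported for speaker verification systems.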

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, T., Yang, Y., Wu, Z. (2005). Improving Speaker Recognition by Training on Emotion-Added Models. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_49

  • DOI: https://doi.org/10.1007/11573548_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29621-8

  • Online ISBN: 978-3-540-32273-3

  • eBook Packages: Computer Science (R0)
