Abstract
Speaker-emotion variability is one of the major factors that degrade the performance of speaker recognition systems. The difficulty arises mainly from a shift in the acoustic space, so an emotional model cannot be generated from neutral utterances alone. This paper presents a translated learning method that uses both the neutral and emotional speech in the development data as translators to build “bridges” between the neutral model space and the emotional model space. With the help of these translators, a speaker's GMM emotional model can be derived from that speaker's neutral model. Experiments carried out on MASC show an identification rate (IR) increase of 2.81% over the GMM-UBM system.
This work was funded by the 973 Program (2013CB329504), the Fundamental Research Funds for the Central Universities (2013), and the National Natural Science Foundation of China (NSFC60970080).
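The abstract does not spell out how a translator is applied, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes diagonal-covariance GMMs, that each translator is a paired (neutral, emotional) Gaussian component taken from a development speaker, and that a target speaker's neutral component is matched to its nearest neutral translator by KL divergence. The function names `kl_diag_gauss` and `migrate_means` and the parameter `k` are hypothetical.

```python
import numpy as np

def kl_diag_gauss(mu0, var0, mu1, var1):
    """Closed-form KL(N0 || N1) for Gaussians with diagonal covariances."""
    return 0.5 * np.sum(var0 / var1 + (mu1 - mu0) ** 2 / var1
                        - 1.0 + np.log(var1 / var0))

def migrate_means(spk_mu, spk_var, dev_neu_mu, dev_neu_var, dev_emo_mu, k=1):
    """Assumed migration rule: for each neutral component of the target
    speaker, find the k closest development neutral components (by KL)
    and average the emotional means paired with them."""
    spk_mu = np.asarray(spk_mu, dtype=float)
    spk_var = np.asarray(spk_var, dtype=float)
    dev_neu_mu = np.asarray(dev_neu_mu, dtype=float)
    dev_neu_var = np.asarray(dev_neu_var, dtype=float)
    dev_emo_mu = np.asarray(dev_emo_mu, dtype=float)

    emo_mu = np.empty_like(spk_mu)
    for i in range(spk_mu.shape[0]):
        # Distance from this neutral component to every neutral translator.
        dists = np.array([
            kl_diag_gauss(spk_mu[i], spk_var[i], m, v)
            for m, v in zip(dev_neu_mu, dev_neu_var)
        ])
        nearest = np.argsort(dists)[:k]
        emo_mu[i] = dev_emo_mu[nearest].mean(axis=0)
    return emo_mu

# Toy data (made up for illustration): 2 target components, 3 translators.
spk_mu = [[0.0, 0.0], [5.0, 5.0]]
spk_var = np.ones((2, 2))
dev_neu_mu = [[0.1, 0.0], [4.9, 5.0], [10.0, 10.0]]
dev_neu_var = np.ones((3, 2))
dev_emo_mu = [[1.0, 1.0], [6.0, 7.0], [12.0, 12.0]]

emotional_means = migrate_means(spk_mu, spk_var, dev_neu_mu, dev_neu_var, dev_emo_mu)
```

In this toy case each target component lands on the emotional mean paired with its closest neutral translator. A fuller system would also need to handle covariances and mixture weights, which this sketch leaves untouched.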
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, L., Yang, Y. (2013). Emotional Speaker Recognition Based on Model Space Migration through Translated Learning. In: Sun, Z., Shan, S., Yang, G., Zhou, J., Wang, Y., Yin, Y. (eds) Biometric Recognition. CCBR 2013. Lecture Notes in Computer Science, vol 8232. Springer, Cham. https://doi.org/10.1007/978-3-319-02961-0_49
DOI: https://doi.org/10.1007/978-3-319-02961-0_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02960-3
Online ISBN: 978-3-319-02961-0
eBook Packages: Computer Science, Computer Science (R0)