Abstract
This paper describes an experiment that uses Gaussian mixture model (GMM)-based speaker gender and age classification to automatically evaluate how successfully a text-to-speech (TTS) system has been personified. The proposed two-level GMM classifier detects four age categories (child, young, adult, senior) and also discriminates gender for adult voices. The classifier is applied to gender and age estimation of synthetic speech in Czech and Slovak produced by different TTS systems with several voices, using different speech inventories and speech modelling methods. The obtained results confirm the hypothesis that this type of classifier can be used as an alternative to the conventional listening test in speech evaluation.
This work was carried out within the framework of COST Action IC 1206 (A. Přibilová) and was supported by the Czech Science Foundation, grant GA16-04420S (J. Matoušek, J. Přibil), and by the Grant Agency of the Slovak Academy of Sciences, grant VEGA 2/0013/14 (J. Přibil).
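The two-level decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which decision is made at which level, so the ordering here (age category first, gender only for voices classified as adult) is an assumption, as are the function names, the diagonal-covariance form of the GMMs, and the toy one-dimensional model parameters, which are made up purely for illustration.

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(frames, gmm):
    """Total log-likelihood of all frames under one GMM.

    gmm is a list of (weight, mean, var) components.
    """
    total = 0.0
    for x in frames:
        comps = [math.log(w) + diag_gauss_logpdf(x, m, v) for w, m, v in gmm]
        mx = max(comps)  # log-sum-exp for numerical stability
        total += mx + math.log(sum(math.exp(c - mx) for c in comps))
    return total

def best_class(frames, models):
    """Return the class whose GMM gives the highest log-likelihood."""
    return max(models, key=lambda name: gmm_loglik(frames, models[name]))

def classify_two_level(frames, age_models, adult_gender_models):
    """Assumed ordering: age first, then gender for adult voices only."""
    age = best_class(frames, age_models)
    gender = best_class(frames, adult_gender_models) if age == "adult" else None
    return age, gender

# Hypothetical toy models on a 1-D feature (not values from the paper).
AGE_MODELS = {
    "child":  [(1.0, [6.0], [1.0])],
    "young":  [(1.0, [4.0], [1.0])],
    "adult":  [(0.5, [0.0], [0.5]), (0.5, [2.0], [0.5])],
    "senior": [(1.0, [-2.0], [1.0])],
}
ADULT_GENDER_MODELS = {
    "male":   [(1.0, [0.0], [0.5])],
    "female": [(1.0, [2.0], [0.5])],
}

if __name__ == "__main__":
    print(classify_two_level([[0.1], [-0.2], [0.3]], AGE_MODELS, ADULT_GENDER_MODELS))
    print(classify_two_level([[6.2], [5.9]], AGE_MODELS, ADULT_GENDER_MODELS))
```

In a real system the frames would be feature vectors extracted from the synthetic speech and the models would be trained on labelled recordings; here both are placeholders that only show how per-class log-likelihoods drive the two decisions.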
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Přibil, J., Přibilová, A., Matoušek, J. (2016). Evaluation of TTS Personification by GMM-Based Speaker Gender and Age Classifier. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_35
DOI: https://doi.org/10.1007/978-3-319-45510-5_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45509-9
Online ISBN: 978-3-319-45510-5
eBook Packages: Computer Science (R0)