Evaluation of TTS Personification by GMM-Based Speaker Gender and Age Classifier

Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich

doi:10.1007/978-3-319-45510-5_35

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9924)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

This paper describes an experiment that uses Gaussian mixture model (GMM)-based speaker gender and age classification to automatically evaluate how successfully a text-to-speech (TTS) system has been personified. The proposed two-level GMM classifier detects four age categories (child, young, adult, senior) and also discriminates gender for adult voices. The classifier is applied to gender/age estimation of synthetic speech in Czech and Slovak produced by different TTS systems with several voices, using different speech inventories and speech modelling methods. The obtained results confirm the hypothesis that this type of classifier can be used as an alternative to the conventional listening test in speech evaluation.
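
The two-level decision described in the abstract can be illustrated with a minimal sketch. It assumes one reading of the abstract: the first level picks the most likely of the four age categories, and the second level picks a gender only when the first level returns "adult". The feature vectors, the diagonal-covariance GMM parameters, and the names `gmm_log_likelihood`, `best_class`, and `classify_two_level` are all hypothetical and hand-set for illustration. They are not taken from the paper, which may use different features, model sizes, and training.

```python
import math


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples, one per component.
    """
    total = 0.0
    for x in frames:
        comp_logs = []
        for weight, means, variances in gmm:
            log_p = math.log(weight)
            for xi, mu, var in zip(x, means, variances):
                log_p -= 0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
            comp_logs.append(log_p)
        # Log-sum-exp over the mixture components for numerical stability.
        peak = max(comp_logs)
        total += peak + math.log(sum(math.exp(c - peak) for c in comp_logs))
    return total / len(frames)


def best_class(frames, models):
    """Return the label whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))


def classify_two_level(frames, age_models, gender_models):
    """Level 1: age category. Level 2: gender, only for adult voices."""
    age = best_class(frames, age_models)
    gender = best_class(frames, gender_models) if age == "adult" else None
    return age, gender


# Hypothetical, hand-set models over a toy 2-D feature space (not trained).
AGE_MODELS = {
    "child": [(1.0, (4.0, 4.0), (1.0, 1.0))],
    "young": [(1.0, (2.0, 2.0), (1.0, 1.0))],
    "adult": [(0.5, (-1.0, 0.0), (1.0, 1.0)), (0.5, (1.0, 0.0), (1.0, 1.0))],
    "senior": [(1.0, (0.0, -3.0), (1.0, 1.0))],
}
GENDER_MODELS = {
    "male": [(1.0, (-1.0, 0.0), (0.5, 0.5))],
    "female": [(1.0, (1.0, 0.0), (0.5, 0.5))],
}

# Toy "utterance": three feature frames close to the adult/female region.
utterance = [(0.9, 0.1), (1.2, -0.2), (0.8, 0.0)]
print(classify_two_level(utterance, AGE_MODELS, GENDER_MODELS))
```

In a real system the per-class models would be trained on labelled natural speech (for example with expectation-maximization) and then applied to the synthetic voices. The sketch covers only the scoring and decision stage.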

The work has been done in the framework of the COST Action IC 1206 (A. Přibilová), and was supported by the Czech Science Foundation GA16-04420S (J. Matoušek, J. Přibil), and the Grant Agency of the Slovak Academy of Sciences VEGA 2/0013/14 (J. Přibil).

Author information

Authors and Affiliations

Faculty of Applied Sciences, Department of Cybernetics, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Jiří Přibil & Jindřich Matoušek
SAS, Institute of Measurement Science, Dúbravská cesta 9, 841 04, Bratislava, Slovakia
Jiří Přibil
Faculty of Electrical Engineering and Information Technology, Slovak University of Technology in Bratislava, Ilkovičova 3, 812 19, Bratislava, Slovakia
Anna Přibilová

Corresponding author

Correspondence to Jiří Přibil.

Editor information

Editors and Affiliations

Masaryk University, Brno, Czech Republic
Petr Sojka
Masaryk University, Brno, Czech Republic
Aleš Horák
Masaryk University, Brno, Czech Republic
Ivan Kopeček
Masaryk University, Brno, Czech Republic
Karel Pala

Cite this paper

Přibil, J., Přibilová, A., Matoušek, J. (2016). Evaluation of TTS Personification by GMM-Based Speaker Gender and Age Classifier. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_35

DOI: https://doi.org/10.1007/978-3-319-45510-5_35
Published: 03 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45509-9
Online ISBN: 978-3-319-45510-5
eBook Packages: Computer Science, Computer Science (R0)
