Evaluation of TTS Personification by GMM-Based Speaker Gender and Age Classifier

  • Conference paper
  • Book: Text, Speech, and Dialogue (TSD 2016)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9924)

Abstract

This paper describes an experiment that uses Gaussian mixture model (GMM)-based speaker gender and age classification to automatically evaluate how successfully a text-to-speech (TTS) system has been personified. The proposed two-level GMM classifier detects four age categories (child, young, adult, senior) and also discriminates gender for adult voices. The classifier is applied to gender/age estimation of synthetic speech in the Czech and Slovak languages produced by different TTS systems with several voices, using different speech inventories and speech modelling methods. The results confirm the hypothesis that this type of classifier can serve as an alternative to the conventional listening test in speech evaluation.
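
The two-level decision described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two levels are arranged, which features are used, or how the models are trained, so everything below is an assumption made for illustration: level one picks the age category with the highest average GMM log-likelihood, and level two picks the gender only when level one returns "adult". The function names, the one-dimensional toy feature, and all model parameters are hypothetical and are not taken from the paper.

```python
import math


def diag_gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(ll)
        # log-sum-exp over mixture components, for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total / len(frames)


def best_class(frames, models):
    """Return the class whose GMM gives the highest average log-likelihood."""
    scores = {name: diag_gmm_loglik(frames, *params)
              for name, params in models.items()}
    return max(scores, key=scores.get)


def classify_two_level(frames, age_models, adult_gender_models):
    """Assumed arrangement: age category first, gender only for adults."""
    age = best_class(frames, age_models)
    gender = best_class(frames, adult_gender_models) if age == "adult" else None
    return age, gender


# Hypothetical toy models over a single made-up feature.
# Each entry is (weights, means, variances); values are NOT from the paper.
AGE_MODELS = {
    "child": ([1.0], [[300.0]], [[100.0]]),
    "young": ([1.0], [[240.0]], [[100.0]]),
    "adult": ([0.5, 0.5], [[120.0], [200.0]], [[100.0], [100.0]]),
    "senior": ([1.0], [[160.0]], [[100.0]]),
}
GENDER_MODELS = {
    "male": ([1.0], [[120.0]], [[100.0]]),
    "female": ([1.0], [[200.0]], [[100.0]]),
}

if __name__ == "__main__":
    utterance = [[118.0], [122.0], [125.0]]  # three toy feature frames
    print(classify_two_level(utterance, AGE_MODELS, GENDER_MODELS))
```

In a real evaluation the models would be trained on multi-dimensional speech features from natural voices and then applied to synthetic utterances; the sketch only shows the scoring and decision logic under the stated assumptions.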

The work has been done in the framework of the COST Action IC 1206 (A. Přibilová), and was supported by the Czech Science Foundation GA16-04420S (J. Matoušek, J. Přibil), and the Grant Agency of the Slovak Academy of Sciences VEGA 2/0013/14 (J. Přibil).

Author information

Correspondence to Jiří Přibil.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Přibil, J., Přibilová, A., Matoušek, J. (2016). Evaluation of TTS Personification by GMM-Based Speaker Gender and Age Classifier. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_35

  • DOI: https://doi.org/10.1007/978-3-319-45510-5_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45509-9

  • Online ISBN: 978-3-319-45510-5

  • eBook Packages: Computer Science, Computer Science (R0)
