Online Evaluation of Text to Speech Systems for Three Social Robots

  • Fernando Alonso-Martín (email author)
  • María Malfaz
  • Álvaro Castro-González
  • José Carlos Castillo
  • Miguel A. Salichs
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11876)


The success of social robots depends largely on their capacity to interact with people. Verbal and non-verbal communication skills are therefore essential for natural human-robot interaction. This paper focuses on verbal communication, since the majority of social robots implement a Text to Speech system. We present a comparative study of 8 off-the-shelf Text to Speech systems used in social robots, in which 125 participants evaluated the performance of the systems. The results show that, in general, the participants perceive differences between the Text to Speech systems and can determine which are the more intelligible, expressive, and artificial ones. The participants also judged some systems to be more suitable than others depending on the physical appearance of the robot.



The research leading to these results has received funding from the projects: Development of social robots to help seniors with cognitive impairment (ROBSEN), funded by the Ministerio de Economia y Competitividad; and RoboCity2030-DIH-CM, funded by Comunidad de Madrid and co-funded by Structural Funds of the EU.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Fernando Alonso-Martín (1, email author)
  • María Malfaz (1)
  • Álvaro Castro-González (1)
  • José Carlos Castillo (1)
  • Miguel A. Salichs (1)

  1. Department of Systems Engineering and Automation, Universidad Carlos III de Madrid, Leganés, Spain
