Abstract
This paper explores a unit-selection-based concatenative approach to emotional speech synthesis in Hindi. The emotions explored are sad and neutral. The Festival framework is used as the underlying Text-To-Speech (TTS) system, and the steps followed to create a new voice in Festival are described. The developed TTS systems are assessed with subjective evaluation tests, which indicate a significant improvement in synthesis quality after the necessary prosody modifications. Finally, possible improvements to the systems are put forward.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bhakat, R.K., Narendra, N.P., Sreenivasa Rao, K. (2013). Corpus Based Emotional Speech Synthesis in Hindi. In: Maji, P., Ghosh, A., Murty, M.N., Ghosh, K., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2013. Lecture Notes in Computer Science, vol 8251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45062-4_53
DOI: https://doi.org/10.1007/978-3-642-45062-4_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45061-7
Online ISBN: 978-3-642-45062-4
eBook Packages: Computer Science, Computer Science (R0)