Corpus Based Emotional Speech Synthesis in Hindi

  • Ravi Kalyan Bhakat
  • N. P. Narendra
  • Krothapalli Sreenivasa Rao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8251)

Abstract

This paper explores a unit selection based concatenative approach to emotional speech synthesis in Hindi. Two emotions are explored: sad and neutral. The Festival framework is used as the underlying Text-To-Speech (TTS) system, and the steps followed to create a new voice in Festival are described. The developed TTS systems are evaluated with subjective evaluation tests, which indicate a significant improvement in synthesis quality after the necessary prosody modifications. Finally, possible improvements to the systems are put forward.
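The abstract names unit selection but does not spell out how units are chosen. As a rough, generic sketch (not the paper's method), the Python below picks one recorded candidate per target position by minimising the sum of target costs (prosodic mismatch with the desired unit) and join costs (discontinuity at each concatenation point) with a Viterbi-style dynamic programme. The `Unit` and `Target` fields, the weights `w_f0` and `w_dur`, and both cost functions are hypothetical illustrations; they are not the cost functions used in the paper, nor those of Festival.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Unit:
    """A candidate unit from the recorded corpus (hypothetical fields)."""
    label: str        # unit name, e.g. a syllable
    utt_id: int       # recorded utterance the unit was cut from
    pos: int          # position of the unit inside that utterance
    f0: float         # mean pitch in Hz
    dur: float        # duration in seconds
    f0_start: float   # pitch at the left boundary (Hz)
    f0_end: float     # pitch at the right boundary (Hz)


@dataclass(frozen=True)
class Target:
    """Desired unit and prosody for one position (hypothetical fields)."""
    label: str
    f0: float
    dur: float


def target_cost(t: Target, u: Unit, w_f0: float = 1.0, w_dur: float = 100.0) -> float:
    # Weighted prosodic mismatch between the target and a candidate unit.
    return w_f0 * abs(t.f0 - u.f0) + w_dur * abs(t.dur - u.dur)


def join_cost(a: Unit, b: Unit) -> float:
    # Units that were adjacent in the same recording join at no cost.
    if a.utt_id == b.utt_id and b.pos == a.pos + 1:
        return 0.0
    # Otherwise penalise the pitch jump across the concatenation point.
    return abs(a.f0_end - b.f0_start)


def select_units(targets: List[Target],
                 candidates: List[List[Unit]]) -> Tuple[List[Unit], float]:
    """Viterbi search for the unit sequence with minimum total cost."""
    # best[i][j] = (cost of cheapest path ending in candidates[i][j], backpointer)
    best: List[List[Tuple[float, int]]] = [
        [(target_cost(targets[0], u), -1) for u in candidates[0]]
    ]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            # Cheapest predecessor, including the join cost into u.
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], u), back))
        best.append(row)
    # Trace back from the cheapest unit at the final position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


targets = [Target("na", 180.0, 0.20), Target("ma", 170.0, 0.22)]
candidates = [
    [Unit("na", 0, 3, 182.0, 0.21, 185.0, 178.0), Unit("na", 1, 0, 150.0, 0.25, 155.0, 148.0)],
    [Unit("ma", 0, 4, 165.0, 0.22, 177.0, 160.0), Unit("ma", 2, 5, 171.0, 0.22, 150.0, 168.0)],
]
path, total = select_units(targets, candidates)
print([(u.utt_id, u.pos) for u in path], round(total, 6))  # [(0, 3), (0, 4)] 8.0
```

In this toy example the second "ma" candidate matches the target pitch more closely, but the search still prefers the unit that was contiguous with the chosen "na" in the original recording, because its zero join cost outweighs the small target-cost difference. This trade-off between target fit and smooth concatenation is the general idea behind unit selection; real systems use richer features and tuned weights.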

Keywords

Emotional Speech Synthesis; Festival; Text to Speech; Unit Selection; Corpus based synthesis; Prosody modification

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ravi Kalyan Bhakat (1)
  • N. P. Narendra (1)
  • Krothapalli Sreenivasa Rao (1)
  1. Indian Institute of Technology Kharagpur, Kharagpur, India