Unit Selection Using Linguistic, Prosodic and Spectral Distance for Developing Text-to-Speech System in Hindi

  • K. Sreenivasa Rao
  • Sudhamay Maity
  • Amol Taru
  • Shashidhar G. Koolagudi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5909)


In this paper we propose a new method for unit selection in developing a text-to-speech (TTS) system for Hindi. In the proposed method, syllables are used as the basic units for concatenation. Linguistic, positional and contextual features derived from the input text are used at the first level of the unit selection process. The selection is then refined by incorporating prosodic and spectral characteristics at the utterance and syllable levels. The speech corpus considered for this task is broadcast Hindi news read by a male speaker. Speech synthesized by the developed TTS system using the multi-level unit selection criterion is evaluated using listening tests. The evaluation results show that refining the unit selection process with spectral and prosodic features improves the quality of the synthesized speech.
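
The two-level selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names (`position`, `prev_syl`, `next_syl`, `pitch`, `energy`, `spectrum`), the equal weights, the distance definitions and the greedy left-to-right search are all assumptions made for illustration. The first level shortlists corpus syllables by how well their positional and contextual features match the target; the second level picks the shortlisted unit whose prosody and spectrum are closest to the previously selected unit.

```python
import math
from dataclasses import dataclass


@dataclass
class Target:
    """Specification of one syllable derived from the input text (hypothetical fields)."""
    syllable: str
    position: str   # position in the word: "initial", "middle" or "final"
    prev_syl: str   # left context ("#" marks a word boundary)
    next_syl: str   # right context


@dataclass
class Unit(Target):
    """A recorded corpus syllable with illustrative acoustic features."""
    pitch: float     # mean F0 in Hz
    energy: float    # mean energy, arbitrary units
    spectrum: tuple  # hypothetical spectral feature vector


def linguistic_score(target, cand):
    """Number of matching positional and contextual features (0 to 3)."""
    return ((cand.position == target.position)
            + (cand.prev_syl == target.prev_syl)
            + (cand.next_syl == target.next_syl))


def first_level(target, corpus, keep=2):
    """Level 1: shortlist units of the same syllable by linguistic match."""
    same = [u for u in corpus if u.syllable == target.syllable]
    same.sort(key=lambda u: linguistic_score(target, u), reverse=True)
    return same[:keep]


def prosodic_distance(a, b):
    """Sum of relative pitch and energy differences (assumed definition)."""
    return (abs(a.pitch - b.pitch) / max(a.pitch, b.pitch)
            + abs(a.energy - b.energy) / max(a.energy, b.energy))


def spectral_distance(a, b):
    """Euclidean distance between the spectral vectors (assumed definition)."""
    return math.dist(a.spectrum, b.spectrum)


def select_units(targets, corpus, keep=2, w_pros=1.0, w_spec=1.0):
    """Level 2: greedily refine the shortlist using prosodic and spectral distance."""
    chosen = []
    for target in targets:
        shortlist = first_level(target, corpus, keep)
        if not shortlist:
            raise ValueError(f"no corpus unit for syllable {target.syllable!r}")
        if not chosen:
            best = shortlist[0]  # no left neighbour yet: take the best linguistic match
        else:
            prev = chosen[-1]
            best = min(shortlist,
                       key=lambda u: w_pros * prosodic_distance(prev, u)
                       + w_spec * spectral_distance(prev, u))
        chosen.append(best)
    return chosen


# Toy corpus with made-up values, for illustration only.
corpus = [
    Unit("na", "initial", "#", "ma", 120.0, 0.5, (1.0, 0.0)),
    Unit("ma", "final", "na", "#", 180.0, 0.5, (0.0, 1.0)),
    Unit("ma", "final", "na", "#", 122.0, 0.5, (0.9, 0.1)),
    Unit("ma", "initial", "#", "ta", 120.0, 0.5, (1.0, 0.0)),
]
targets = [Target("na", "initial", "#", "ma"), Target("ma", "final", "na", "#")]
chosen = select_units(targets, corpus)
```

In this toy corpus two "ma" units match the target's linguistic features equally well, so the first level cannot separate them; the second level prefers the one whose pitch and spectrum are closer to the preceding "na" unit. A full system would more likely search over all candidate sequences rather than choose greedily.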


Text-to-speech, unit selection, linguistic features, prosodic features, spectral features



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • K. Sreenivasa Rao (1)
  • Sudhamay Maity (1)
  • Amol Taru (1)
  • Shashidhar G. Koolagudi (1)
  1. School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India
