Duration Modeling Using Multi-model Based on Positional Information

  • Vempada Ramu Reddy
  • Krothapalli Sreenivasa Rao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8251)

Abstract

This paper proposes predicting syllable durations with a multi-model built on positional information. The proposed multi-model consists of four models: one predicts the durations of syllables in monosyllabic words, and the other three predict the durations of syllables at the initial, middle and final positions of polysyllabic words. In this study, (i) linguistic constraints represented by positional, contextual and phonological features and (ii) production constraints represented by articulatory features are used to predict the duration patterns. Feed-forward neural networks (FFNN) are used to develop the duration models from these features. The multi-model was found to improve prediction accuracy compared with a single duration model.
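
The routing idea behind the multi-model can be illustrated with a minimal sketch. It is not the authors' implementation: the names `position_class`, `predict_durations` and `dummy_models` are hypothetical, each syllable is represented by a plain string rather than a feature vector, and the per-position predictors are constant stand-ins with made-up values where the paper would use a trained FFNN.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical labels for the four sub-models described in the abstract.
MONO, INITIAL, MIDDLE, FINAL = "mono", "initial", "middle", "final"


def position_class(index: int, word_length: int) -> str:
    """Map a syllable's 0-based index within its word to one of four classes."""
    if word_length < 1 or not 0 <= index < word_length:
        raise ValueError("index must lie inside the word")
    if word_length == 1:
        return MONO      # the only syllable of a monosyllabic word
    if index == 0:
        return INITIAL   # first syllable of a polysyllabic word
    if index == word_length - 1:
        return FINAL     # last syllable of a polysyllabic word
    return MIDDLE        # any syllable in between


def predict_durations(
    words: Sequence[Sequence[str]],
    models: Dict[str, Callable[[str], float]],
) -> List[float]:
    """Route every syllable to the model that matches its position class."""
    durations: List[float] = []
    for word in words:
        for i, syllable in enumerate(word):
            model = models[position_class(i, len(word))]
            durations.append(model(syllable))
    return durations


# Stand-in predictors returning placeholder constants (assumed to be in ms).
# These values are invented for illustration and are not results from the paper.
dummy_models: Dict[str, Callable[[str], float]] = {
    MONO: lambda syllable: 200.0,
    INITIAL: lambda syllable: 150.0,
    MIDDLE: lambda syllable: 120.0,
    FINAL: lambda syllable: 180.0,
}

if __name__ == "__main__":
    # One monosyllabic word followed by one three-syllable word.
    print(predict_durations([["ka"], ["ra", "ma", "ni"]], dummy_models))
```

Note that a two-syllable word has only an initial and a final syllable under this scheme, so the middle-position model is used only for words of three or more syllables.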

Keywords

Multi-models; Duration prediction; Prediction accuracy; Feed-forward neural networks; Linguistic and production constraints

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Vempada Ramu Reddy (1)
  • Krothapalli Sreenivasa Rao (1)
  1. School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India
