Intensity Modeling for Syllable Based Text-to-Speech Synthesis

  • Conference paper
Contemporary Computing (IC3 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 306)

Abstract

The quality of text-to-speech (TTS) synthesis systems can be improved by controlling the intensities of speech segments in addition to their durations and intonation. This paper proposes linguistic and production constraints for modeling the intensity patterns of sequences of syllables. Linguistic constraints are represented by positional, contextual and phonological features, and production constraints are represented by articulatory features associated with syllables. In this work, a feedforward neural network (FFNN) is proposed to model the intensities of syllables. The proposed FFNN model is evaluated by means of objective measures: average prediction error (μ), standard deviation (σ), correlation coefficient (γ_{X,Y}) and the percentage of syllables predicted within different deviations from their actual intensities. The prediction performance of the proposed model is compared with that of other statistical models, namely Linear Regression (LR) and Classification and Regression Tree (CART) models. The models are also evaluated by means of subjective listening tests on speech synthesized by incorporating the predicted syllable intensities into a Bengali TTS system. From the evaluation studies, it is observed that the prediction accuracy of the FFNN models is better than that of the LR and CART models.
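
The objective measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions: μ is taken as the mean absolute error between actual and predicted syllable intensities, σ as the standard deviation of those absolute errors, γ_{X,Y} as the Pearson correlation coefficient, and the deviation of a syllable as its absolute error expressed as a percentage of its actual intensity. The function name, the threshold values and the sample intensities are hypothetical and are not taken from the paper.

```python
import math


def intensity_prediction_metrics(actual, predicted, thresholds=(2, 5, 10)):
    """Objective measures for predicted syllable intensities.

    Assumed definitions (not stated in the abstract):
      mu     - mean absolute prediction error
      sigma  - standard deviation of the absolute errors
      gamma  - Pearson correlation between actual and predicted values
      within - percentage of syllables whose absolute error is at most
               t percent of the actual intensity, for each threshold t
    Actual intensities are assumed to be nonzero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(actual)
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]

    # Average prediction error and the spread of the errors around it.
    mu = sum(abs_err) / n
    sigma = math.sqrt(sum((e - mu) ** 2 for e in abs_err) / n)

    # Pearson correlation coefficient between actual and predicted values.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    # The correlation is undefined when either series is constant.
    gamma = cov / (std_a * std_p) if std_a > 0 and std_p > 0 else float("nan")

    # Percentage of syllables predicted within each relative deviation.
    pct_dev = [100.0 * e / abs(a) for e, a in zip(abs_err, actual)]
    within = {t: 100.0 * sum(d <= t for d in pct_dev) / n for t in thresholds}

    return {"mu": mu, "sigma": sigma, "gamma": gamma, "within": within}


# Hypothetical intensities for four syllables (illustrative values only).
actual = [60.0, 62.0, 58.0, 65.0]
predicted = [61.0, 60.0, 58.0, 66.0]
metrics = intensity_prediction_metrics(actual, predicted)
```

Under these assumptions, the same function could be applied to the outputs of the FFNN, LR and CART models on a common test set, so that the models are compared on identical measures.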

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ramu Reddy, V., Sreenivasa Rao, K. (2012). Intensity Modeling for Syllable Based Text-to-Speech Synthesis. In: Parashar, M., Kaushik, D., Rana, O.F., Samtaney, R., Yang, Y., Zomaya, A. (eds) Contemporary Computing. IC3 2012. Communications in Computer and Information Science, vol 306. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32129-0_16

  • DOI: https://doi.org/10.1007/978-3-642-32129-0_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32128-3

  • Online ISBN: 978-3-642-32129-0

  • eBook Packages: Computer Science (R0)
