Intonation Rules for Text Reading

Epoch Synchronous Overlap Add (ESOLA)

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Intonation is the cognitive aspect of the ensemble of pitch variations in the course of an utterance. This perceptual impression of speech melody correlates, to a first approximation, with changes in the fundamental frequency (F0) of the signal. This chapter presents a study of intonation patterns for text reading in Standard Colloquial Bengali, aimed at developing rules and appropriate methods for using them in a text-to-speech (TTS) synthesis system. In the model presented here, the pitch movements at the syllabic level are considered to be basic. Syllabic stylization uses the closest linear fit obtained by linear regression, and the pitch movements are expressed in semitones per second. The sentence-level intonation pattern is the sequence of the word-level patterns constituting the sentence. This chapter also presents a statistical method for implementing the resulting rules in a TTS system. The model is tested by synthesizing several sentences, and the perceptual results are satisfactory.
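
The syllabic stylization step described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names, the 100 Hz reference frequency, and the sample values are assumptions chosen for illustration. The sketch converts F0 samples within one syllable to semitones and fits a least-squares line, so that the slope comes out in semitones per second.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert a frequency in Hz to semitones relative to ref_hz.

    ref_hz is an assumed reference; it shifts the intercept only,
    not the slope of the fitted line.
    """
    return 12.0 * math.log2(f0_hz / ref_hz)


def syllable_slope(times_s, f0_hz, ref_hz=100.0):
    """Fit a least-squares line to one syllable's pitch contour.

    times_s: sample times in seconds
    f0_hz:   voiced F0 values in Hz (all must be > 0)
    Returns (slope in semitones/second, intercept in semitones).
    """
    if len(times_s) != len(f0_hz) or len(times_s) < 2:
        raise ValueError("need at least two paired samples")

    semitones = [hz_to_semitones(f, ref_hz) for f in f0_hz]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_st = sum(semitones) / n

    # Ordinary least squares: slope = cov(t, st) / var(t)
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("sample times must not all be identical")
    sxy = sum((t - mean_t) * (s - mean_st)
              for t, s in zip(times_s, semitones))

    slope = sxy / sxx
    intercept = mean_st - slope * mean_t
    return slope, intercept


# Hypothetical contour: F0 rises one octave (100 Hz to 200 Hz) in 1 s.
rise_slope, rise_intercept = syllable_slope(
    [0.0, 0.5, 1.0], [100.0, 100.0 * 2 ** 0.5, 200.0]
)
```

Working in semitones rather than Hz makes a given slope represent the same musical interval per second regardless of the speaker's pitch range, which is presumably why the model expresses pitch movements in that unit.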

Copyright information

© 2018 Springer Nature Singapore Pte Ltd

About this chapter

Cite this chapter

Datta, A.K. (2018). Intonation Rules for Text Reading. In: Epoch Synchronous Overlap Add (ESOLA). Signals and Communication Technology. Springer, Singapore. https://doi.org/10.1007/978-981-10-7016-7_5

  • DOI: https://doi.org/10.1007/978-981-10-7016-7_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7015-0

  • Online ISBN: 978-981-10-7016-7

  • eBook Packages: Engineering, Engineering (R0)
