Abstract
Intonation is the cognitive aspect of the ensemble of pitch variations in the course of an utterance. This perceptual impression of speech melody correlates, to a first approximation, with changes in the fundamental frequency (F0) of the signal. This chapter presents a study of intonation patterns for text reading in Standard Colloquial Bengali, aimed at developing rules and appropriate methods for using them in a text-to-speech (TTS) synthesis system. In the model presented here, the pitch movements at the syllabic level are considered to be basic. Syllabic stylization takes the closest linear fit, obtained by linear regression, and the pitch movements are expressed in semitones per second. The sentence-level intonation pattern is the sequence of the word-level patterns that constitute the sentence. This chapter also presents a statistical method for implementing the resulting rules in a TTS system. The model is tested by synthesizing several sentences, and the perceptual results are satisfactory.
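The syllabic stylization mentioned in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation. It assumes that F0 samples (in Hz) and their time stamps (in seconds) are already available for one syllable, converts them to semitones relative to a hypothetical 100 Hz reference, and fits a least-squares line whose slope is the pitch movement in semitones per second. The function names and the reference frequency are illustrative choices, not taken from the source.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert a frequency in Hz to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def syllable_slope(times_s, f0_hz, ref_hz=100.0):
    """Fit a least-squares line to the F0 track of one syllable.

    Returns (slope, intercept): slope in semitones per second,
    intercept in semitones relative to ref_hz at time zero.
    """
    if len(times_s) != len(f0_hz) or len(times_s) < 2:
        raise ValueError("need at least two matching (time, F0) samples")

    semitones = [hz_to_semitones(f, ref_hz) for f in f0_hz]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_st = sum(semitones) / n

    # Ordinary least squares for a single predictor (time).
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    if sxx == 0.0:
        raise ValueError("all time stamps are identical")
    sxy = sum((t - mean_t) * (s - mean_st) for t, s in zip(times_s, semitones))

    slope = sxy / sxx
    intercept = mean_st - slope * mean_t
    return slope, intercept


if __name__ == "__main__":
    # Synthetic syllable: F0 glides one octave (100 Hz to 200 Hz) in 0.2 s.
    times = [0.0, 0.05, 0.10, 0.15, 0.20]
    f0 = [100.0 * 2.0 ** (t / 0.2) for t in times]
    slope, intercept = syllable_slope(times, f0)
    print(f"pitch movement: {slope:.1f} semitones/s")
```

For the synthetic glide in the example, one octave (12 semitones) over 0.2 s gives a slope of about 60 semitones per second. Working in semitones rather than Hz makes the slope comparable across speakers with different pitch ranges, which is one plausible reason for the unit chosen in the abstract.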
© 2018 Springer Nature Singapore Pte Ltd
About this chapter
Cite this chapter
Datta, A.K. (2018). Intonation Rules for Text Reading. In: Epoch Synchronous Overlap Add (ESOLA). Signals and Communication Technology. Springer, Singapore. https://doi.org/10.1007/978-981-10-7016-7_5
DOI: https://doi.org/10.1007/978-981-10-7016-7_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7015-0
Online ISBN: 978-981-10-7016-7
eBook Packages: Engineering, Engineering (R0)