Abstract
Prosodic constituent structure, or the perceived grouping of words in speech, plays a role in human speech communication in virtually every language. Speakers use prosodic phrasing to contribute meaning to, and sometimes disambiguate, the sequence of words that makes up an utterance by highlighting its information structure. From a speech analysis perspective, prosodic phrase structure provides the link that seems to most effectively explain continuously varying acoustic correlates (pauses, F0 patterns, duration lengthening, etc.) in terms of the word sequence of an utterance (its syntactic, semantic and discourse structure). Just as both speakers and listeners use prosodic phrases in human speech communication, computational models of prosodic phrase structure can be useful both for communicating meaning in synthesized speech and for extracting meaning in automatic speech understanding. In fact, prosodic phrase structure is probably even more important for computer speech processing than for human listeners: computers have a much less detailed semantic representation and less extensive knowledge of the world, so word sequences are more often ambiguous in computer language processing than they are for humans.
© 2000 Springer Science+Business Media Dordrecht
Ostendorf, M. (2000). Prosodic Boundary Detection. In: Horne, M. (ed.) Prosody: Theory and Experiment. Text, Speech and Language Technology, vol 14. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-9413-4_10
DOI: https://doi.org/10.1007/978-94-015-9413-4_10
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5562-0
Online ISBN: 978-94-015-9413-4
eBook Packages: Springer Book Archive