Prosodic Boundary Detection

Prosody: Theory and Experiment

Part of the book series: Text, Speech and Language Technology (TLTB, volume 14)

Abstract

Prosodic constituent structure, the perceived grouping of words in speech, plays a role in human speech communication in virtually every language. Speakers use prosodic phrasing to contribute meaning to, and sometimes disambiguate, the sequence of words that makes up an utterance by highlighting its information structure. From a speech analysis perspective, prosodic phrase structure provides the link that seems to most effectively explain continuously varying acoustic correlates (pauses, F0 patterns, duration lengthening, etc.) in terms of the word sequence of an utterance and its syntactic, semantic and discourse structure. Just as both speakers and listeners use prosodic phrases in human speech communication, computational models of prosodic phrase structure can be useful both for communicating meaning in synthesized speech and for extracting meaning in automatic speech understanding. In fact, prosodic phrase structure is probably even more important for computer speech processing than for human listeners: computers have a much less detailed semantic representation and less extensive knowledge of the world, so word sequences are more often ambiguous in computer language processing than they are for human listeners.

Copyright information

© 2000 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Ostendorf, M. (2000). Prosodic Boundary Detection. In: Horne, M. (ed.) Prosody: Theory and Experiment. Text, Speech and Language Technology, vol 14. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-9413-4_10

  • DOI: https://doi.org/10.1007/978-94-015-9413-4_10

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-5562-0

  • Online ISBN: 978-94-015-9413-4

  • eBook Packages: Springer Book Archive
