Prosody Modeling for Automatic Speech Recognition and Understanding

  • Conference paper
Mathematical Foundations of Speech and Language Processing

Part of the book series: The IMA Volumes in Mathematics and its Applications ((IMA,volume 138))

Abstract

This paper summarizes statistical modeling approaches for the use of prosody (the rhythm and melody of speech) in automatic recognition and understanding of speech. We outline effective prosodic feature extraction, model architectures, and techniques to combine prosodic with lexical (word-based) information. We then survey a number of applications of the framework, and give results for automatic sentence segmentation and disfluency detection, topic segmentation, dialog act labeling, and word recognition.

The research was supported by NSF Grants IRI-9314967, IRI-9618926, and IRI-9619921, by DARPA contract no. N66001-97-C-8544, and by NASA contract no. NCC 2-1256. Additional support came from the sponsors of the 1997 CLSP Workshop [7], [11] and from the DARPA Communicator project at UW and ICSI [8]. The views herein are those of the authors and should not be interpreted as representing the policies of the funding agencies.

We thank our many colleagues at SRI, ICSI, University of Washington (formerly at Boston University), and the 1997 Johns Hopkins CLSP Summer Workshop, who were instrumental in much of the work reported here.

References

  1. D. Baron, E. Shriberg, and A. Stolcke, Automatic punctuation and disfluency detection in multi-party meetings using prosodic and lexical cues, in Proceedings of the International Conference on Spoken Language Processing, Denver, Sept. 2002.

  2. A. Batliner, B. Möbius, G. Möhler, A. Schweitzer, and E. Nöth, Prosodic models, automatic speech understanding, and speech synthesis: toward the common ground, in Proceedings of the 7th European Conference on Speech Communication and Technology, P. Dalsgaard, B. Lindberg, H. Benner, and Z. Tan, eds., Vol. 4, Aalborg, Denmark, Sept. 2001, pp. 2285–2288.

  3. G. Doddington, The Topic Detection and Tracking Phase 2 (TDT2) evaluation plan, in Proceedings DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, Feb. 1998, Morgan Kaufmann, pp. 223–229. Revised version available from http://www.nist.gov/speech/tests/tdt/tdt98/.

  4. P. Heeman and J. Allen, Intonational boundaries, speech repairs, and discourse markers: Modeling spoken dialog, in Proceedings of the 35th Annual Meeting and 8th Conference of the European Chapter, Madrid, July 1997, Association for Computational Linguistics.

  5. J. Hirschberg and C. Nakatani, Acoustic indicators of topic segmentation, in Proceedings of the International Conference on Spoken Language Processing, R.H. Mannell and J. Robert-Ribes, eds., Sydney, Dec. 1998, Australian Speech Science and Technology Association, pp. 976–979.

  6. M. Mast, R. Kompe, S. Harbeck, A. Kiessling, H. Niemann, E. Nöth, E.G. Schukat-Talamazzini, and V. Warnke, Dialog act classification with the help of prosody, in Proceedings of the International Conference on Spoken Language Processing, H.T. Bunnell and W. Idsardi, eds., Vol. 3, Philadelphia, Oct. 1996, pp. 1732–1735.

  7. E. Shriberg, R. Bates, A. Stolcke, P. Taylor, D. Jurafsky, K. Ries, N. Coccaro, R. Martin, M. Meteer, and C. Van Ess-Dykema, Can prosody aid the automatic classification of dialog acts in conversational speech?, Language and Speech, 41 (1998), pp. 439–487.

  8. E. Shriberg, A. Stolcke, and D. Baron, Can prosody aid the automatic processing of multi-party meetings? Evidence from predicting punctuation, disfluencies, and overlapping speech, in Proceedings ISCA Tutorial and Research Workshop on Prosody in Speech Recognition and Understanding, M. Bacchiani, J. Hirschberg, D. Litman, and M. Ostendorf, eds., Red Bank, NJ, Oct. 2001, pp. 139–146.

  9. E. Shriberg, A. Stolcke, D. Hakkani-Tür, and G. Tür, Prosody-based automatic segmentation of speech into sentences and topics, Speech Communication, 32 (2000), pp. 127–154. Special Issue on Accessing Information in Spoken Audio.

  10. K. Sönmez, E. Shriberg, L. Heck, and M. Weintraub, Modeling dynamic prosodic variation for speaker verification, in Proceedings of the International Conference on Spoken Language Processing, R.H. Mannell and J. Robert-Ribes, eds., Vol. 7, Sydney, Dec. 1998, Australian Speech Science and Technology Association, pp. 3189–3192.

  11. A. Stolcke, K. Ries, N. Coccaro, E. Shriberg, D. Jurafsky, P. Taylor, R. Martin, C. Van Ess-Dykema, and M. Meteer, Dialogue act modeling for automatic tagging and recognition of conversational speech, Computational Linguistics, 26 (2000), pp. 339–373.

  12. A. Stolcke, E. Shriberg, R. Bates, M. Ostendorf, D. Hakkani, M. Plauché, G. Tür, and Y. Lu, Automatic detection of sentence boundaries and disfluencies based on recognized words, in Proceedings of the International Conference on Spoken Language Processing, R.H. Mannell and J. Robert-Ribes, eds., Vol. 5, Sydney, Dec. 1998, Australian Speech Science and Technology Association, pp. 2247–2250.

  13. A. Stolcke, E. Shriberg, D. Hakkani-Tür, and G. Tür, Modeling the prosody of hidden events for improved word recognition, in Proceedings of the 6th European Conference on Speech Communication and Technology, Vol. 1, Budapest, Sept. 1999, pp. 307–310.

  14. P. Taylor, S. King, S. Isard, and H. Wright, Intonation and dialog context as constraints for speech recognition, Language and Speech, 41 (1998), pp. 489–508.

  15. G. Tür, D. Hakkani-Tür, A. Stolcke, and E. Shriberg, Integrating prosodic and lexical cues for automatic topic segmentation, Computational Linguistics, 27 (2001), pp. 31–57.

  16. N.M. Veilleux and M. Ostendorf, Prosody/parse scoring and its applications in ATIS, in Proceedings of the ARPA Workshop on Human Language Technology, Plainsboro, NJ, Mar. 1993, pp. 335–340.

  17. J. Yamron, I. Carp, L. Gillick, S. Lowe, and P. Van Mulbregt, A hidden Markov model approach to text segmentation and event tracking, in Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing, Vol. I, Seattle, WA, May 1998, pp. 333–336.

Copyright information

© 2004 Springer Science+Business Media New York

About this paper

Cite this paper

Shriberg, E., Stolcke, A. (2004). Prosody Modeling for Automatic Speech Recognition and Understanding. In: Johnson, M., Khudanpur, S.P., Ostendorf, M., Rosenfeld, R. (eds) Mathematical Foundations of Speech and Language Processing. The IMA Volumes in Mathematics and its Applications, vol 138. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9017-4_5

  • DOI: https://doi.org/10.1007/978-1-4419-9017-4_5

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4612-6484-2

  • Online ISBN: 978-1-4419-9017-4

  • eBook Packages: Springer Book Archive
