Prosody Modeling for Automatic Speech Recognition and Understanding

Shriberg, Elizabeth; Stolcke, Andreas

doi:10.1007/978-1-4419-9017-4_5

Part of the book series: The IMA Volumes in Mathematics and its Applications (IMA, volume 138)

Abstract

This paper summarizes statistical modeling approaches for the use of prosody (the rhythm and melody of speech) in automatic recognition and understanding of speech. We outline effective prosodic feature extraction, model architectures, and techniques to combine prosodic with lexical (word-based) information. We then survey a number of applications of the framework, and give results for automatic sentence segmentation and disfluency detection, topic segmentation, dialog act labeling, and word recognition.
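To illustrate what combining prosodic with lexical information can look like, here is a minimal sketch. It is not the authors' implementation. It assumes two hypothetical models that each output, for every inter-word boundary, a posterior probability that a sentence boundary occurs there. One is a lexical model scoring the word sequence. The other is a prosodic classifier scoring features such as pause duration and pitch. The two estimates are combined by simple linear interpolation with an assumed weight `lam`, and a boundary is hypothesized wherever the combined posterior exceeds 0.5. The function names, the weight, and all probabilities are illustrative assumptions.

```python
from typing import List, Sequence


def interpolate_posteriors(p_lexical: Sequence[float],
                           p_prosodic: Sequence[float],
                           lam: float = 0.5) -> List[float]:
    """Combine two per-boundary posteriors by linear interpolation.

    p_lexical[i]:  P(boundary after word i | words), from a hypothetical lexical model
    p_prosodic[i]: P(boundary after word i | prosodic features), from a hypothetical
                   prosodic classifier
    lam:           weight on the lexical model (assumed; would be tuned on held-out data)
    """
    if len(p_lexical) != len(p_prosodic):
        raise ValueError("both models must score the same boundaries")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * pl + (1.0 - lam) * pp for pl, pp in zip(p_lexical, p_prosodic)]


def label_boundaries(posteriors: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Hypothesize a sentence boundary wherever the posterior exceeds the threshold."""
    return [p > threshold for p in posteriors]


if __name__ == "__main__":
    # Illustrative values only: four boundaries between the five words
    # "yeah i know so what" (boundary i follows word i).
    p_lex = [0.6, 0.1, 0.4, 0.2]
    p_pros = [0.8, 0.05, 0.7, 0.1]
    combined = interpolate_posteriors(p_lex, p_pros, lam=0.5)
    print(label_boundaries(combined))  # prints [True, False, True, False]
```

With `lam=0.5`, the boundary after "know" is labeled only because the prosodic posterior (0.7) raises the combined estimate above the threshold. The lexical posterior alone (0.4) would not. Linear interpolation is used here only for its simplicity. Other combination schemes, such as treating classifier outputs as likelihoods inside a hidden Markov model, are equally plausible, and the chapter's own architectures may differ.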

The research was supported by NSF Grants IRI-9314967, IRI-9618926, and IRI-9619921, by DARPA contract no. N66001-97-C-8544, and by NASA contract no. NCC 2-1256. Additional support came from the sponsors of the 1997 CLSP Workshop [7], [11] and from the DARPA Communicator project at UW and ICSI [8]. The views herein are those of the authors and should not be interpreted as representing the policies of the funding agencies.

We thank our many colleagues at SRI, ICSI, University of Washington (formerly at Boston University), and the 1997 Johns Hopkins CLSP Summer Workshop, who were instrumental in much of the work reported here.

References

[1] D. Baron, E. Shriberg, and A. Stolcke, Automatic punctuation and disfluency detection in multi-party meetings using prosodic and lexical cues, in Proceedings of the International Conference on Spoken Language Processing, Denver, Sept. 2002.
[2] A. Batliner, B. Möbius, G. Möhler, A. Schweitzer, and E. Nöth, Prosodic models, automatic speech understanding, and speech synthesis: toward the common ground, in Proceedings of the 7th European Conference on Speech Communication and Technology, P. Dalsgaard, B. Lindberg, H. Benner, and Z. Tan, eds., Vol. 4, Aalborg, Denmark, Sept. 2001, pp. 2285–2288.
[3] G. Doddington, The Topic Detection and Tracking Phase 2 (TDT2) evaluation plan, in Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, Feb. 1998, Morgan Kaufmann, pp. 223–229. Revised version available from http://www.nist.gov/speech/tests/tdt/tdt98/.
[4] P. Heeman and J. Allen, Intonational boundaries, speech repairs, and discourse markers: Modeling spoken dialog, in Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter, Madrid, July 1997.
[5] J. Hirschberg and C. Nakatani, Acoustic indicators of topic segmentation, in Proceedings of the International Conference on Spoken Language Processing, R.H. Mannell and J. Robert-Ribes, eds., Sydney, Dec. 1998, Australian Speech Science and Technology Association, pp. 976–979.
[6] M. Mast, R. Kompe, S. Harbeck, A. Kiessling, H. Niemann, E. Nöth, E.G. Schukat-Talamazzini, and V. Warnke, Dialog act classification with the help of prosody, in Proceedings of the International Conference on Spoken Language Processing, H.T. Bunnell and W. Idsardi, eds., Vol. 3, Philadelphia, Oct. 1996, pp. 1732–1735.
[7] E. Shriberg, R. Bates, A. Stolcke, P. Taylor, D. Jurafsky, K. Ries, N. Coccaro, R. Martin, M. Meteer, and C. Van Ess-Dykema, Can prosody aid the automatic classification of dialog acts in conversational speech?, Language and Speech, 41 (1998), pp. 439–487.
[8] E. Shriberg, A. Stolcke, and D. Baron, Can prosody aid the automatic processing of multi-party meetings? Evidence from predicting punctuation, disfluencies, and overlapping speech, in Proceedings of the ISCA Tutorial and Research Workshop on Prosody in Speech Recognition and Understanding, M. Bacchiani, J. Hirschberg, D. Litman, and M. Ostendorf, eds., Red Bank, NJ, Oct. 2001, pp. 139–146.
[9] E. Shriberg, A. Stolcke, D. Hakkani-Tür, and G. Tür, Prosody-based automatic segmentation of speech into sentences and topics, Speech Communication, 32 (2000), pp. 127–154. Special Issue on Accessing Information in Spoken Audio.
[10] K. Sönmez, E. Shriberg, L. Heck, and M. Weintraub, Modeling dynamic prosodic variation for speaker verification, in Proceedings of the International Conference on Spoken Language Processing, R.H. Mannell and J. Robert-Ribes, eds., Vol. 7, Sydney, Dec. 1998, Australian Speech Science and Technology Association, pp. 3189–3192.
[11] A. Stolcke, K. Ries, N. Coccaro, E. Shriberg, D. Jurafsky, P. Taylor, R. Martin, C. Van Ess-Dykema, and M. Meteer, Dialogue act modeling for automatic tagging and recognition of conversational speech, Computational Linguistics, 26 (2000), pp. 339–373.
[12] A. Stolcke, E. Shriberg, R. Bates, M. Ostendorf, D. Hakkani, M. Plauché, G. Tür, and Y. Lu, Automatic detection of sentence boundaries and disfluencies based on recognized words, in Proceedings of the International Conference on Spoken Language Processing, R.H. Mannell and J. Robert-Ribes, eds., Vol. 5, Sydney, Dec. 1998, Australian Speech Science and Technology Association, pp. 2247–2250.
[13] A. Stolcke, E. Shriberg, D. Hakkani-Tür, and G. Tür, Modeling the prosody of hidden events for improved word recognition, in Proceedings of the 6th European Conference on Speech Communication and Technology, Vol. 1, Budapest, Sept. 1999, pp. 307–310.
[14] P. Taylor, S. King, S. Isard, and H. Wright, Intonation and dialog context as constraints for speech recognition, Language and Speech, 41 (1998), pp. 489–508.
[15] G. Tür, D. Hakkani-Tür, A. Stolcke, and E. Shriberg, Integrating prosodic and lexical cues for automatic topic segmentation, Computational Linguistics, 27 (2001), pp. 31–57.
[16] N.M. Veilleux and M. Ostendorf, Prosody/parse scoring and its applications in ATIS, in Proceedings of the ARPA Workshop on Human Language Technology, Plainsboro, NJ, Mar. 1993, pp. 335–340.
[17] J. Yamron, I. Carp, L. Gillick, S. Lowe, and P. Van Mulbregt, A hidden Markov model approach to text segmentation and event tracking, in Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing, Vol. I, Seattle, WA, May 1998, pp. 333–336.

Author information

Authors and Affiliations

SRI International, 333 Ravenswood Ave., Menlo Park, CA, 94025, USA
Elizabeth Shriberg & Andreas Stolcke

Editor information

Editors and Affiliations

Dept. of Cognitive and Linguistic Studies, Brown University, Providence, RI, 02912, USA
Mark Johnson
Dept. of ECE and Dept. of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
Sanjeev P. Khudanpur
Dept. of Electrical Engineering, University of Washington, Seattle, WA, 98195, USA
Mari Ostendorf
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Roni Rosenfeld

About this paper

Cite this paper

Shriberg, E., Stolcke, A. (2004). Prosody Modeling for Automatic Speech Recognition and Understanding. In: Johnson, M., Khudanpur, S.P., Ostendorf, M., Rosenfeld, R. (eds) Mathematical Foundations of Speech and Language Processing. The IMA Volumes in Mathematics and its Applications, vol 138. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9017-4_5

DOI: https://doi.org/10.1007/978-1-4419-9017-4_5
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-6484-2
Online ISBN: 978-1-4419-9017-4
eBook Packages: Springer Book Archive
