Spontaneous Speech

Junqua, Jean-Claude; Haton, Jean-Paul

doi:10.1007/978-1-4613-1297-0_11

Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 341)

Summary

We discuss the problems that spontaneous speech poses for automatic speech recognition (ASR) and automatic speech understanding. After describing databases representative of spontaneous speech corpora, we stress the need for natural language understanding in addressing these problems. To this end, we describe possible interfaces between ASR and natural language processing and devote a section to language modeling. We conclude the chapter by presenting current trends in robust parsing and interpretation of speech.

Author information

Authors and Affiliations

Speech Technology Laboratory, USA
Jean-Claude Junqua
CRIN - INRIA, France
Jean-Paul Haton

About this chapter

Cite this chapter

Junqua, JC., Haton, JP. (1996). Spontaneous Speech. In: Robustness in Automatic Speech Recognition. The Kluwer International Series in Engineering and Computer Science, vol 341. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1297-0_11

DOI: https://doi.org/10.1007/978-1-4613-1297-0_11
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8555-7
Online ISBN: 978-1-4613-1297-0
eBook Packages: Springer Book Archive
