Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 341)

Summary

We discuss the problems that spontaneous speech introduces in automatic speech recognition (ASR) and automatic speech understanding. After describing representative spontaneous speech corpora, we stress the need for natural language understanding in addressing these problems. We illustrate this need by describing possible interfaces between ASR and natural language processing, followed by a section on language modeling. We conclude the chapter by presenting current trends in robust parsing and interpretation of speech.

Copyright information

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Junqua, JC., Haton, JP. (1996). Spontaneous Speech. In: Robustness in Automatic Speech Recognition. The Kluwer International Series in Engineering and Computer Science, vol 341. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1297-0_11

  • DOI: https://doi.org/10.1007/978-1-4613-1297-0_11

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8555-7

  • Online ISBN: 978-1-4613-1297-0

  • eBook Packages: Springer Book Archive
