Summary
We discuss the problems that spontaneous speech introduces in automatic speech recognition (ASR) and automatic speech understanding. After describing representative spontaneous speech databases, we stress the need for natural language understanding in addressing these problems. To this end, we describe possible interfaces between ASR and natural language processing and devote a section to language modeling. We conclude the chapter by presenting current trends in robust parsing and interpretation of speech.
© 1996 Kluwer Academic Publishers
About this chapter
Cite this chapter
Junqua, JC., Haton, JP. (1996). Spontaneous Speech. In: Robustness in Automatic Speech Recognition. The Kluwer International Series in Engineering and Computer Science, vol 341. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1297-0_11
DOI: https://doi.org/10.1007/978-1-4613-1297-0_11
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8555-7
Online ISBN: 978-1-4613-1297-0
eBook Packages: Springer Book Archive