Spontaneous Speech

  • Jean-Claude Junqua
  • Jean-Paul Haton
Part of The Kluwer International Series in Engineering and Computer Science book series (SECS, volume 341)

Summary

We discuss the problems that spontaneous speech introduces in automatic speech recognition (ASR) and automatic speech understanding. After describing databases representative of spontaneous speech corpora, we stress the need for natural language understanding to address the spontaneous speech problem. To that end, we describe possible interfaces between ASR and natural language processing and devote a section to language modeling. We conclude the chapter by presenting current trends in robust parsing and interpretation of speech.

Keywords

Speech Recognition, Language Model, Automatic Speech Recognition, Parse Tree, Spontaneous Speech
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Jean-Claude Junqua (1)
  • Jean-Paul Haton (2)
  1. Speech Technology Laboratory, USA
  2. CRIN - INRIA, France
