Summary
After considering important properties required by speech recognition applications, this chapter concentrates on the use of application domain knowledge, the user interface, and dialogue to improve the robustness of an application using speech. At the dialogue level, the emphasis is placed on multimodality, error correction, naturalness, and the need for different dialogue strategies depending on the application considered. To build application prototypes and to reduce development costs, it is important to develop a flexible technology that can be adapted to various applications belonging to the same class. As important steps towards this goal, vocabulary-independent recognition, application-independent dialogue architectures, and the notion of a global speech interface are considered. Once an application prototype exists, it needs to be evaluated. This leads us to a description of various recently developed assessment methodologies. We then present a robust real-world application, stressing the factors that contribute to its robustness. Finally, we briefly provide some perspectives on the most promising application domains.
Copyright information
© 1996 Kluwer Academic Publishers
About this chapter
Cite this chapter
Junqua, JC., Haton, JP. (1996). Application Domain, Human Factors, and Dialogue. In: Robustness in Automatic Speech Recognition. The Kluwer International Series in Engineering and Computer Science, vol 341. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1297-0_13
DOI: https://doi.org/10.1007/978-1-4613-1297-0_13
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8555-7
Online ISBN: 978-1-4613-1297-0
eBook Packages: Springer Book Archive