Application Domain, Human Factors, and Dialogue

Junqua, Jean-Claude; Haton, Jean-Paul

doi:10.1007/978-1-4613-1297-0_13

Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 341)

Summary

After considering the important properties that speech recognition applications require, this chapter concentrates on how application domain knowledge, the user interface, and dialogue can improve the robustness of a speech-based application. At the dialogue level, the emphasis is on multimodality, error correction, naturalness, and the need for different dialogue strategies depending on the application. To build application prototypes and reduce development costs, it is important to develop a flexible technology that can be adapted to various applications belonging to the same class. As important steps towards this goal, vocabulary-independent recognition, application-independent dialogue architectures, and the notion of a global speech interface are considered. Once an application prototype exists, it needs to be evaluated, which leads us to describe various recently developed assessment methodologies. We then present a robust real-world application, stressing the factors that contribute to its robustness. Finally, we briefly provide some perspectives on the most promising application domains.

Author information

Authors and Affiliations

Speech Technology Laboratory, USA
Jean-Claude Junqua
CRIN - INRIA, France
Jean-Paul Haton

About this chapter

Cite this chapter

Junqua, JC., Haton, JP. (1996). Application Domain, Human Factors, and Dialogue. In: Robustness in Automatic Speech Recognition. The Kluwer International Series in Engineering and Computer Science, vol 341. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1297-0_13

DOI: https://doi.org/10.1007/978-1-4613-1297-0_13
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8555-7
Online ISBN: 978-1-4613-1297-0
eBook Packages: Springer Book Archive
