Summary
When humans communicate, they draw on a rich spectrum of cues. Some are verbal and acoustic; others are non-verbal and non-acoustic. Signal processing technology has devoted much attention to the recognition of speech as a single human communication signal, but most complementary communication cues remain unexplored and unused in human-computer interaction. In this paper we show that adding non-acoustic or non-verbal cues can significantly enhance the robustness, flexibility, naturalness and performance of human-computer interaction. We demonstrate computer agents that use speech, gesture, handwriting, pointing, and spelling jointly for more robust, natural and flexible human-computer interaction across the tasks of an information worker: information creation, access, manipulation and dissemination.
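The joint use of several input modalities described above can be illustrated with a minimal, hypothetical sketch of late fusion: each modality's recognizer produces scored hypotheses, and the scores are combined so that one noisy channel can be corrected by another. The function name, the weights, and the example commands below are illustrative assumptions, not details from this chapter.

```python
# Hypothetical late-fusion sketch: combine per-modality hypothesis
# scores (e.g., from a speech recognizer and a pointing-gesture
# recognizer) into a joint ranking. Weights are illustrative.

def fuse_hypotheses(speech_scores, gesture_scores,
                    w_speech=0.6, w_gesture=0.4):
    """Weighted late fusion of per-command recognizer scores."""
    commands = set(speech_scores) | set(gesture_scores)
    fused = {}
    for cmd in commands:
        fused[cmd] = (w_speech * speech_scores.get(cmd, 0.0)
                      + w_gesture * gesture_scores.get(cmd, 0.0))
    best = max(fused, key=fused.get)
    return best, fused

best, scores = fuse_hypotheses(
    {"delete": 0.5, "select": 0.4},   # speech hypotheses (noisy)
    {"select": 0.8, "move": 0.2},     # gesture hypotheses
)
print(best)  # "select": gesture evidence overrides the speech top choice
```

The point of the sketch is only that evidence from a second modality can flip the decision a single modality would make, which is one simple way multimodality buys robustness.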
© 1999 Springer-Verlag Berlin Heidelberg
Waibel, A., Suhm, B., Vo, M.T., Yang, J. (1999). Multimodal Interfaces for Multimedia Information Agents. In: Ponting, K. (eds) Computational Models of Speech Pattern Processing. NATO ASI Series, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60087-6_35
Print ISBN: 978-3-642-64250-0
Online ISBN: 978-3-642-60087-6