Summary
When humans communicate, they draw on a rich spectrum of cues. Some are verbal and acoustic; others are non-verbal and non-acoustic. Signal processing technology has devoted much attention to the recognition of speech as a single human communication signal, but most complementary communication cues remain unexplored and unused in human-computer interaction. In this paper we show that adding non-acoustic or non-verbal cues can significantly enhance the robustness, flexibility, naturalness and performance of human-computer interaction. We demonstrate computer agents that use speech, gesture, handwriting, pointing, and spelling jointly for more robust, natural and flexible human-computer interaction across the tasks of an information worker: information creation, access, manipulation and dissemination.
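The joint use of several input modalities described above can be illustrated with a minimal, hypothetical sketch of late fusion: each modality's recognizer produces scored hypotheses, and the scores are combined so that one noisy channel can be corrected by another. The function name, the weights, and the example commands below are illustrative assumptions, not details from this chapter.

```python
# Hypothetical late-fusion sketch: combine per-modality hypothesis
# scores (e.g., from a speech recognizer and a pointing-gesture
# recognizer) into a joint ranking. Weights are illustrative.

def fuse_hypotheses(speech_scores, gesture_scores,
                    w_speech=0.6, w_gesture=0.4):
    """Weighted late fusion of per-command recognizer scores."""
    commands = set(speech_scores) | set(gesture_scores)
    fused = {}
    for cmd in commands:
        fused[cmd] = (w_speech * speech_scores.get(cmd, 0.0)
                      + w_gesture * gesture_scores.get(cmd, 0.0))
    best = max(fused, key=fused.get)
    return best, fused

best, scores = fuse_hypotheses(
    {"delete": 0.5, "select": 0.4},   # speech hypotheses (noisy)
    {"select": 0.8, "move": 0.2},     # gesture hypotheses
)
print(best)  # "select": gesture evidence overrides the speech top choice
```

The point of the sketch is only that evidence from a second modality can flip the decision a single modality would make, which is one simple way multimodality buys robustness.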
© 1999 Springer-Verlag Berlin Heidelberg
Waibel, A., Suhm, B., Vo, M.T., Yang, J. (1999). Multimodal Interfaces for Multimedia Information Agents. In: Ponting, K. (eds) Computational Models of Speech Pattern Processing. NATO ASI Series, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60087-6_35
Print ISBN: 978-3-642-64250-0
Online ISBN: 978-3-642-60087-6