
Multimodal Interfaces for Multimedia Information Agents

  • Chapter in: Computational Models of Speech Pattern Processing

Part of the book series: NATO ASI Series (NATO ASI F, volume 169)


Summary

When humans communicate they take advantage of a rich spectrum of cues. Some are verbal and acoustic; others are non-verbal and non-acoustic. Signal processing technology has devoted much attention to the recognition of speech as a single human communication signal. Most other complementary communication cues, however, remain unexplored and unused in human-computer interaction. In this paper we show that the addition of non-acoustic or non-verbal cues can significantly enhance the robustness, flexibility, naturalness and performance of human-computer interaction. We demonstrate computer agents that use speech, gesture, handwriting, pointing and spelling jointly for more robust, natural and flexible human-computer interaction across the various tasks of an information worker: information creation, access, manipulation and dissemination.




Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Waibel, A., Suhm, B., Vo, M.T., Yang, J. (1999). Multimodal Interfaces for Multimedia Information Agents. In: Ponting, K. (ed.) Computational Models of Speech Pattern Processing. NATO ASI Series, vol. 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60087-6_35


  • DOI: https://doi.org/10.1007/978-3-642-60087-6_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-64250-0

  • Online ISBN: 978-3-642-60087-6

