More Than Words: Designing Multimodal Systems

  • Chapter
Usability of Speech Dialog Systems

Part of the book series: Signals and Communication Technologies ((SCT))

Abstract

Most multimodal systems, at both the research and the product level, involve speech input. With speech as an eyes-free, hands-free input modality, these systems enable users to interact more effectively across a wide range of tasks and environments. However, the advantages of multimodal user interfaces emerge only when the interfaces are designed to support the abilities and characteristics of their human users. It is therefore necessary to integrate research results from the cognitive sciences into the development process. This paper discusses several experimental findings that demonstrate this necessity. User-centered design methods and user testing further improve the usability of multimodal systems. Compared to voice-only interfaces, however, the design, development, and usability testing of multimodal systems are far more complicated. A process model shows how the interplay between the development of system components, user-centered evaluation, and the integration of knowledge from the cognitive sciences can be organized.




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

Cite this chapter

Vilimek, R. (2008). More Than Words: Designing Multimodal Systems. In: Usability of Speech Dialog Systems. Signals and Communication Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78343-5_6

  • DOI: https://doi.org/10.1007/978-3-540-78343-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78342-8

  • Online ISBN: 978-3-540-78343-5

  • eBook Packages: Engineering (R0)
