Abstract
Most multimodal systems, at both the research and the product level, involve speech input. With speech as an eyes-free, hands-free input modality, these systems enable users to interact more effectively across a wide range of tasks and environments. However, the advantages of multimodal user interfaces emerge only if the interfaces are designed to match the abilities and characteristics of their human users. It is therefore necessary to integrate research results from the cognitive sciences into the development process. This chapter discusses several experimental findings that demonstrate this necessity. User-centered design methods and user testing further improve the usability of multimodal systems. Yet compared with voice-only interfaces, the design, development, and usability testing of multimodal systems are far more complicated. A process model shows how the interplay between the development of system components, user-centered evaluation, and the integration of knowledge from the cognitive sciences can be organized.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this chapter
Vilimek, R. (2008). More Than Words: Designing Multimodal Systems. In: Usability of Speech Dialog Systems. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78343-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78342-8
Online ISBN: 978-3-540-78343-5