Abstract
Most multimodal systems, at both the research and the product level, involve speech input. With speech as an eyes-free, hands-free input modality, these systems enable users to interact more effectively across a wide range of tasks and environments. However, the advantages of multimodal user interfaces emerge only if the interfaces are designed to match the abilities and characteristics of their human users. It is therefore necessary to integrate research results from the cognitive sciences into the development process. This chapter discusses several experimental findings that demonstrate this necessity. User-centered design methods and user testing further improve the usability of multimodal systems. Yet compared with voice-only interfaces, the design, development, and usability testing of multimodal systems are far more complicated. A process model shows how the interplay between the development of system components, user-centered evaluation, and the integration of knowledge from the cognitive sciences can be organized.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this chapter
Vilimek, R. (2008). More Than Words: Designing Multimodal Systems. In: Usability of Speech Dialog Systems. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78343-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78342-8
Online ISBN: 978-3-540-78343-5