
1 Introduction

Multimodal interaction has been a topic of research for quite some time. There has been considerable progress in modeling and processing multimodal inputs. Still, little is known about the generic principles that apply, e.g. the choice of modalities, the temporal relations of multimodal inputs, and, perhaps most importantly, the contextual parameters that influence multimodal interaction. To tackle these questions, we have designed an abstract yet generic experimental paradigm that allows these questions to be explored in a flexible but controlled manner. Based on this paradigm, individual applications are generated and applied in different experimental setups.

2 Related Work

There have been a number of approaches on how to design multimodal interfaces and on how to model multimodal inputs from a system's perspective (see [5] for an overview). A more generic perspective on multimodal interaction is taken by Turk [13]. Two of the open challenges stated therein are a thorough understanding of the issues relating to users' cognitive load, and the development of better guidance and best practices for the design and evaluation of multimodal systems (ibid.). Tackling these challenges requires empirical evidence. Accordingly, there has been a great deal of empirical research on multimodal interaction, mostly specific to a certain domain, including map interactions [4, 7–9], augmented reality [6], image manipulation [2], and music players [3].

Comparing the results of these studies reveals considerable differences. Although the domains and tasks in the work of Oviatt et al. [7–9] and Haas et al. [4] are quite similar, their results are partly contradictory: while the former report that users predominantly used modalities simultaneously, the latter report that no users did so. Similarly, the dependency on task difficulty remains ambiguous. The findings of [2, 6] are even more specific to their respective domains. Although these studies provide some insights, their generalizability and transferability to other applications seem doubtful. Dumas et al. take a broader perspective and present a test bed for the evaluation of fusion engines, using a music player as an example [3]. They conclude that more work is necessary on fusion engines' adaptation to context (i.e. environment and applications), as well as on usage patterns and repetitive errors. This shows that basic research on universal principles governing common tasks found in many applications is still rare.

One contextual aspect is the influence of time pressure and pressure to succeed on a user's interaction behavior. Buying the right ticket at a vending machine in the train station moments before the train leaves is an example of such a situation. Including game elements in the study enables the simulation of such pressures in laboratory settings. Respective gamification methods include feedback [1] on success and time pressure as well as a reward system [12]. These raise both the intrinsic and extrinsic motivation of the user to complete the given tasks, as argued by [10] on the basis of self-determination theory.

3 A Visual Search Task for Empirical Research

In search of a task that is common to many different applications, we identify operations on objects as a shared characteristic. Figure 1 shows different application examples.

Fig. 1. Different applications allowing operations on presented objects.

These kinds of tasks are found throughout many applications and are thus chosen for our research. Empirical research poses additional requirements as well, e.g. tasks must be performed repeatedly without becoming routine or dull, and participants' motivation must be kept high throughout the course of an experiment. To address these issues, we chose a gamified version of the task. Regarding the domain in which the tasks take place, we decided to use abstract representations of objects and operations.

Our solution is a visual search task in which the user has to identify the visually unique object and then specify its location and color (as a stand-in for an arbitrary operation). Figure 2 shows a screenshot of the game. In the central area of Fig. 2, objects with differing shapes and colors are presented. In the given example, the green rectangle at position 3 is the unique object to be spotted by the user. The expected input can be provided using touch only, speech only, mouse only, or a combination of these modalities (e.g. touching the object and naming its color, or vice versa).
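
To illustrate the task structure, the following Python sketch generates one trial in which every distractor shape/color combination appears twice, leaving exactly one visually unique target. The shape and color pools, the object count, and the pairing scheme are illustrative assumptions; the paper does not specify its stimulus-generation algorithm.

```python
import random

SHAPES = ["circle", "square", "triangle", "rectangle"]  # illustrative pool
COLORS = ["red", "green", "blue", "yellow"]             # illustrative pool

def generate_trial(n_objects=9, rng=random):
    """Build one visual-search trial with a single visually unique target.

    Every distractor combination is drawn twice, so only the target's
    shape/color combination occurs exactly once on the board.
    """
    assert n_objects % 2 == 1, "one unique target plus paired distractors"
    target = (rng.choice(SHAPES), rng.choice(COLORS))
    pool = [(s, c) for s in SHAPES for c in COLORS if (s, c) != target]
    distractors = rng.sample(pool, (n_objects - 1) // 2) * 2
    objects = distractors + [target]
    rng.shuffle(objects)
    return {"objects": objects,
            "target_position": objects.index(target),
            "target_color": target[1]}
```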

Fig. 2. Screenshot of the game that serves as an abstract replacement for operations on objects found in many applications. The user has to spot the single unique object and designate its location and color. In the above screenshot, the unique object is the red triangle. (Color figure online)

4 Planned Research

The generic design of our setup enables the investigation of isolated factors such as a user's previous experience, contextual parameters, and cognitive demand. The following sections provide further details on how the presented experimental paradigm can easily be adjusted to facilitate the respective research. Although they are based on the same paradigm, different setups are used for each research focus. Where applicable, first results are presented as well.

4.1 A User’s Previous Experience: Individual Interaction Histories

In order to investigate the influence of individual user-centered interaction histories, the experimental paradigm is applied as shown in Fig. 3. The applied modalities are speech and touch inputs in any possible multimodal combination, as described in Sect. 3. The inclusion of an induction phase, which requires users to solve the tasks using only one of the four possible modality combinations, enables the investigation of the influence of individual interaction histories in the subsequent free interaction phase. We are particularly interested in the modality preferences in the free interaction phase, depending on the induced modality combination. Is there a favorite modality combination (with regard to error rates), and how long does it take users to adopt it when induced otherwise? This could provide insights into how multimodal input behavior is learned.
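
As a hypothetical illustration of the induction phase, the sketch below assigns each subject one of the four modality combinations in a counterbalanced, round-robin fashion. The condition labels and the assignment scheme are assumptions; the paper does not report its assignment procedure.

```python
import itertools

# Assumed labels for the four combinations from Sect. 3: exclusive speech,
# exclusive touch, and the two multimodal variants (touch + speak, or vice versa).
CONDITIONS = ["speech_only", "touch_only",
              "touch_location_speech_color", "speech_location_touch_color"]

def assign_induction_conditions(subject_ids):
    """Cycle through the conditions so each is induced about equally often."""
    return dict(zip(subject_ids, itertools.cycle(CONDITIONS)))
```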

Fig. 3. The experimental procedure to investigate individual interaction histories. The induction phase induces a certain modality combination for each subject. In the free interaction phase, users can perform inputs in any modality combination.

Additionally, this experimental setup allows for an in-depth analysis of the temporal relations of multimodal inputs, particularly with regard to the contradictory findings in the related work concerning the predominance of simultaneous and sequential interaction patterns. Results of a user study with this setup are reported in [11]. It is shown that a classification into simultaneous and sequential users may not be feasible in general. Instead, a more differentiated inspection of individual behavior is proposed and possible uses are discussed (cf. [11]).
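
A common operationalization of this distinction, shown here as a sketch rather than the exact criterion used in [11], labels a trial as simultaneous when the touch and speech input intervals overlap in time:

```python
def classify_trial(touch, speech):
    """Label one multimodal input from its (start, end) timestamps.

    'simultaneous' if the two modality intervals overlap, 'sequential'
    otherwise. A per-user label would then aggregate these trial labels,
    which [11] argues may be too coarse in general.
    """
    (t0, t1), (s0, s1) = touch, speech
    return "simultaneous" if t0 < s1 and s0 < t1 else "sequential"
```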

4.2 Contextual Parameters: Pressure of Success and Time

Regarding contextual parameters, we investigate the influence of pressure to succeed in the task and of time pressure by varying the reward system and the time available to complete a task. These factors are expected to have a significant influence not only on the error rate, but also on the way people interact with a system. To this end, an experiment was conducted in which we compared two groups of subjects that differed in the amount of auditory and visual feedback given by the system as well as in the monetary reward given for participation. In contrast to the Feedback group, the No-Feedback group received no auditory or visual feedback on whether their input was correct, no timer was presented, and consequently no performance-dependent monetary rewards were given. Both groups underwent the same experimental procedure (see Fig. 4). Preliminary results indicate that users in the Feedback condition try to increase their success by interacting significantly faster than the No-Feedback group, at the expense of significantly higher error rates. Furthermore, users in the Feedback condition chose multimodal interaction more often (33 % of trials) than the No-Feedback group (29.7 % of trials). Given that the Feedback group earned significantly higher monetary rewards, this change in interaction behavior appears to be an effective way to increase success under pressure.

Fig. 4. The experimental procedure to investigate the effects of time pressure and pressure to succeed. One group is put under pressure (Feedback), while the other is not (No-Feedback). In the induction phase, each subject is restricted to specific modalities. In the free interaction phase, users can perform inputs using any modalities. (Color figure online)

Regarding the in-depth analysis of temporal interaction patterns, temporal parameters such as modality overlaps and individual modality durations are measured under very different contextual conditions while the task is held fixed. We hypothesize that temporal interaction patterns become shorter when users are under pressure. This could have implications for the fusion of user inputs and its adaptation to context within the same application. Preliminary results suggest that users do indeed act faster in the pressure condition: the temporal overlap of the modalities decreases, while the duration of each modality itself remains almost the same.
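
The following sketch shows how such temporal parameters could be computed per trial from logged (start, end) timestamps; the interval representation in seconds is an assumption.

```python
def temporal_parameters(touch, speech):
    """Per-trial modality durations and their (possibly zero) overlap."""
    (t0, t1), (s0, s1) = touch, speech
    return {
        "touch_duration": t1 - t0,
        "speech_duration": s1 - s0,
        "overlap": max(0.0, min(t1, s1) - max(t0, s0)),
    }
```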

4.3 Cognitive Load: Induction of Overload and Underload

Given that users can be overwhelmed by the options and corresponding operations presented to them, which diminishes their satisfaction with the system in general and thus affects the user-system interaction, we intend to investigate the effects of cognitive load. Based on the present paradigm, an experiment was conducted to induce cognitive overload and underload in the subjects and to investigate their effects by analyzing the users' individual reactions and subjective feedback.

Cognitive overload and underload are induced by varying the number of objects and their colors within a task as well as the time available to solve that task. These variables influence the difficulty of a given task and also affect the user's interaction with the system. Cognitive overload is induced by increasing the number of objects and colors in the task field and decreasing the available time, while cognitive underload is induced by decreasing the number of objects and colors and increasing the available time. Figure 5 depicts the two variants and the overall experimental procedure.
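
As a minimal sketch of this manipulation, the parameter levels below pair more objects and colors with less time (overload) and fewer objects and colors with more time (underload). All numeric values are hypothetical; only the direction of the manipulation follows the paper.

```python
# Hypothetical difficulty levels; more objects/colors plus less time
# raises cognitive load, and vice versa.
LOAD_LEVELS = {
    "underload": {"n_objects": 4,  "n_colors": 2, "time_limit_s": 20.0},
    "overload":  {"n_objects": 16, "n_colors": 6, "time_limit_s": 5.0},
}

def task_parameters(condition):
    """Return the stimulus and timing parameters for a load condition."""
    return LOAD_LEVELS[condition]
```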

Fig. 5. The experimental procedure to induce and investigate the effects of cognitive load. Both groups (Overload and Underload) undergo the same procedure; cognitive load is increased by increasing the number of objects and colors in the task field and decreasing the available time. The modality used during the induction phase is set at the beginning of the experiment. (Color figure online)

The interaction modality used during the induction phase can be either speech or mouse and is defined at the beginning of the experiment. Standardized questionnaires are filled in by the subjects prior to starting the experiment. Furthermore, various kinds of subjective feedback, including free speech, emotional ratings, and direct questions, as well as baseline breathing phases, are implemented.

In order to enable an easy-to-handle workflow, the course of events within the experiment as well as the modalities used can be managed entirely through an external task set. Within a task set, the workflow settings of the sequences can be defined individually for every task and every subject, allowing for high flexibility and a generalizable course setup.
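
One plausible realization of such an external task set is a plain JSON file with one entry per task, as sketched below. The file format and field names are assumptions for illustration; the paper only states that sequences and modalities are configured externally per task and per subject.

```python
import json

def load_task_set(path):
    """Read an external task set that drives the experiment workflow."""
    with open(path) as f:
        return json.load(f)

# Example content: each entry fixes phase, permitted modality, and load
# condition for one task of one subject.
EXAMPLE_TASK_SET = [
    {"subject": 1, "phase": "induction", "modality": "speech", "condition": "overload"},
    {"subject": 1, "phase": "free", "modality": "any", "condition": "underload"},
]
```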

5 Conclusion

The presented experimental paradigm enables a controlled investigation of the general laws and principles associated with multimodal interaction and users' cognitive load, while the results remain generalizable to a vast number of different implementations of multimodal interaction. Based on the paradigm, we presented three implemented setups covering a broad range of research topics. These include an investigation of the role of users' previous multimodal interaction experience (their so-called interaction history). The resulting insights into the individuality of multimodal temporal relations will help to improve the fusion of inputs in future systems [11]. The second implementation shows that the influence of contextual parameters such as pressure of success and time pressure can be examined by slightly varying the provided feedback. Using fine-grained variations of the task's difficulty, one gains control over the amount of cognitive demand imposed on the users, ranging from underchallenged to clearly overstrained.

In addition to such flexibility, the presented paradigm has several other advantages over using a specific real-world application for research, such as its easy implementation, the possibility to deploy it on different hardware setups with different modalities, and its suitability for lengthy laboratory studies with many repetitions owing to its gamified design. Thus, it allows researchers to gain knowledge of multimodal interaction that is diverse and in-depth, yet still generalizable.