
1 Introduction

Multimodal interaction has been a topic of research for quite some time. There has been considerable progress in modeling and processing multimodal inputs. Still, little is known about the generic principles that apply, e.g. the choice of modalities, the temporal relations of multimodal inputs, and, perhaps most importantly, the contextual parameters that influence multimodal interaction. To tackle these questions, we have designed an abstract yet generic experimental paradigm that allows these questions to be explored in a flexible but controlled manner. Based on this paradigm, individual applications are generated and applied in different experimental setups.

2 Related Work

There have been a number of approaches on how to design multimodal interfaces and on how to model multimodal inputs from a system's perspective (see [5] for an overview). A more generic perspective on multimodal interaction is taken by Turk [13]. Two of the open challenges stated therein are a thorough understanding of the issues relating to users' cognitive load, and the development of better guidance and best practices for the design and evaluation of multimodal systems (ibid.). Tackling these challenges requires empirical evidence. Accordingly, there has been a great deal of empirical research on multimodal interaction, mostly specific to a certain domain, including map interactions [4, 7–9], augmented reality [6], image manipulation [2], and music players [3].

Comparing the results of these studies reveals considerable differences. Although the domains and tasks in the work of Oviatt et al. [7–9] and Haas et al. [4] are quite similar, their results are partly contradictory: while the former report that users predominantly used modalities simultaneously, the latter report that no users did so. Similarly, the dependency on task difficulty remains ambiguous. The findings of [2, 6] are even more specific to their respective domains. Although these studies provide some insights, their generalizability and transferability to other applications seem doubtful. Dumas et al. take a broader perspective and present a test bed for the evaluation of fusion engines, using a music player as an example [3]. They conclude that more work is necessary on fusion engines' adaptation to context (i.e. environment and applications), as well as on usage patterns and repetitive errors. This shows that basic research on universal principles governing common tasks found in many applications is still rare.

One contextual aspect is the influence of time pressure and pressure to succeed on a user's interaction behavior. Buying the right ticket at a vending machine in the train station moments before the train leaves is an example of such a situation. Including game elements in the study enables the simulation of such pressures in laboratory settings. Respective gamification methods include feedback [1] on success and time pressure as well as a reward system [12]. These raise both the intrinsic and extrinsic motivation of the user to complete the given tasks, as argued by [10] on the basis of self-determination theory.

3 A Visual Search Task for Empirical Research

In search of a task that is common to many different applications, we identify operations on objects as a shared characteristic. Figure 1 shows different application examples.

Fig. 1. Different applications allowing operations on presented objects.

These kinds of tasks are found throughout many applications and are thus chosen for our research. Empirical research poses additional requirements as well, e.g. tasks must be performed repeatedly without becoming routine or dull, and participants' motivation must be kept high throughout the course of an experiment. To address these issues, we chose a gamified version of the task. Regarding the domain in which the tasks take place, we decided to use abstract representations of objects and operations.

Our solution is a visual search task in which the user has to identify the visually unique object and then specify its location and color (as a stand-in for an arbitrary operation). Figure 2 shows a screenshot of the game. In the central area of Fig. 2, objects with differing shapes and colors are presented. In the given example, the green rectangle at position 3 is the unique object to be spotted by the user. The expected input can be provided using touch only, speech only, mouse only, or a combination of these modalities (e.g. touching the object and naming its color, or vice versa).
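
To illustrate the task structure, the following Python sketch generates one trial in which every distractor shape/color combination appears twice, leaving exactly one visually unique target. The shape and color pools, the object count, and the pairing scheme are illustrative assumptions; the paper does not specify its stimulus-generation algorithm.

```python
import random

SHAPES = ["circle", "square", "triangle", "rectangle"]  # illustrative pool
COLORS = ["red", "green", "blue", "yellow"]             # illustrative pool

def generate_trial(n_objects=9, rng=random):
    """Build one visual-search trial with a single visually unique target.

    Every distractor combination is drawn twice, so only the target's
    shape/color combination occurs exactly once on the board.
    """
    assert n_objects % 2 == 1, "one unique target plus paired distractors"
    target = (rng.choice(SHAPES), rng.choice(COLORS))
    pool = [(s, c) for s in SHAPES for c in COLORS if (s, c) != target]
    distractors = rng.sample(pool, (n_objects - 1) // 2) * 2
    objects = distractors + [target]
    rng.shuffle(objects)
    return {"objects": objects,
            "target_position": objects.index(target),
            "target_color": target[1]}
```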

Fig. 2. Screenshot of the game that serves as an abstract replacement for operations on objects found in many applications. The user has to spot the single unique object and designate its location and color. In the above screenshot, the unique object is the red triangle. (Color figure online)

4 Planned Research

The generic design of our setup enables the investigation of isolated factors such as a user's previous experience, contextual parameters, and cognitive demand. The following sections provide further details on how the presented experimental paradigm can easily be adjusted to facilitate the respective research. Although they are based on the same paradigm, different setups are used for each research focus. Where applicable, first results are presented as well.

4.1 A User’s Previous Experience: Individual Interaction Histories

In order to investigate the influence of individual user-centered interaction histories, the experimental paradigm is applied as shown in Fig. 3. The applied modalities are speech and touch inputs in any possible multimodal combination, as described in Sect. 3. The inclusion of an induction phase, which requires users to solve the tasks using only one of the four possible modality combinations, enables the investigation of the influence of individual interaction histories in the subsequent free interaction phase. We are particularly interested in the modality preferences in the free interaction phase, depending on the induced modality combination. Is there a favorite modality combination (with regard to error rates), and how long does it take users to adopt it when induced otherwise? This could provide insights into how multimodal input behavior is learned.
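
As a hypothetical illustration of the induction phase, the sketch below assigns each subject one of the four modality combinations in a counterbalanced, round-robin fashion. The condition labels and the assignment scheme are assumptions; the paper does not report its assignment procedure.

```python
import itertools

# Assumed labels for the four combinations from Sect. 3: exclusive speech,
# exclusive touch, and the two multimodal variants (touch + speak, or vice versa).
CONDITIONS = ["speech_only", "touch_only",
              "touch_location_speech_color", "speech_location_touch_color"]

def assign_induction_conditions(subject_ids):
    """Cycle through the conditions so each is induced about equally often."""
    return dict(zip(subject_ids, itertools.cycle(CONDITIONS)))
```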

Fig. 3. The experimental procedure to investigate individual interaction histories. The induction phase induces a certain modality combination for each subject. In the free interaction phase, users can perform inputs in any modality combination.

Additionally, this experimental setup allows for an in-depth analysis of the temporal relations of multimodal inputs, particularly with regard to the contradictory findings in the related work concerning the predominance of simultaneous and sequential interaction patterns. Results of a user study with this setup are reported in [11]. It is shown that a classification into simultaneous and sequential users may not be feasible in general. Instead, a more differentiated inspection of individual behavior is proposed and possible uses are discussed (cf. [11]).
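
A common operationalization of this distinction, shown here as a sketch rather than the exact criterion used in [11], labels a trial as simultaneous when the touch and speech input intervals overlap in time:

```python
def classify_trial(touch, speech):
    """Label one multimodal input from its (start, end) timestamps.

    'simultaneous' if the two modality intervals overlap, 'sequential'
    otherwise. A per-user label would then aggregate these trial labels,
    which [11] argues may be too coarse in general.
    """
    (t0, t1), (s0, s1) = touch, speech
    return "simultaneous" if t0 < s1 and s0 < t1 else "sequential"
```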

4.2 Contextual Parameters: Pressure of Success and Time

Regarding contextual parameters, we investigate the influence of pressure to succeed in the task and of time pressure by varying the reward system and the time available to complete a task. These factors are expected to have a significant influence not only on the error rate, but also on the way people interact with a system. To this end, an experiment was conducted in which we compared two groups of subjects that differed in the amount of auditory and visual feedback given by the system as well as in the monetary reward given for participation. In contrast to the Feedback group, the No-Feedback group received no auditory or visual feedback on whether their input was correct, no timer was presented, and consequently no performance-dependent monetary rewards were given. Both groups underwent the same experimental procedure (see Fig. 4). Preliminary results indicate that users in the Feedback condition try to increase their success by interacting significantly faster than the No-Feedback group, at the expense of significantly higher error rates. Furthermore, users in the Feedback condition chose multimodal interaction more often (33 % of trials) than the No-Feedback group (29.7 % of trials). Given that the Feedback group earned significantly higher monetary rewards, this change in interaction behavior appears to be an effective way to increase success under pressure.

Fig. 4. The experimental procedure to investigate the effects of time pressure and pressure to succeed. One group is put under pressure (Feedback), while the other is not (No-Feedback). In the induction phase, each subject is restricted to specific modalities. In the free interaction phase, users can perform inputs using any modalities. (Color figure online)

Regarding the in-depth analysis of temporal interaction patterns, temporal parameters such as modality overlaps and individual modality durations are measured under very different contextual conditions while the task is held fixed. We hypothesize that temporal interaction patterns become shorter when users are under pressure. This could have implications for the fusion of user inputs and its adaptation to context within the same application. Preliminary results suggest that users do indeed act faster in the pressure condition: the temporal overlap of the modalities decreases, while the duration of each modality itself remains almost the same.
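
The following sketch shows how such temporal parameters could be computed per trial from logged (start, end) timestamps; the interval representation in seconds is an assumption.

```python
def temporal_parameters(touch, speech):
    """Per-trial modality durations and their (possibly zero) overlap."""
    (t0, t1), (s0, s1) = touch, speech
    return {
        "touch_duration": t1 - t0,
        "speech_duration": s1 - s0,
        "overlap": max(0.0, min(t1, s1) - max(t0, s0)),
    }
```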

4.3 Cognitive Load: Induction of Overload and Underload

Given that users can be overwhelmed by the options and corresponding operations presented to them, which diminishes their satisfaction with the system in general and thus affects the user-system interaction, we intend to investigate the effects of cognitive load. Based on the present paradigm, an experiment was conducted to induce cognitive overload and underload in the subjects and to investigate their effects by analyzing the users' individual reactions and subjective feedback.

Cognitive overload and underload are induced by varying the number of objects and their colors within a task as well as the time available to solve that task. These variables influence the difficulty of a given task and also affect the user's interaction with the system. Cognitive overload is induced by increasing the number of objects and colors in the task field and decreasing the available time, while cognitive underload is induced by decreasing the number of objects and colors and increasing the available time. Figure 5 depicts the two variants and the overall experimental procedure.
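
As a minimal sketch of this manipulation, the parameter levels below pair more objects and colors with less time (overload) and fewer objects and colors with more time (underload). All numeric values are hypothetical; only the direction of the manipulation follows the paper.

```python
# Hypothetical difficulty levels; more objects/colors plus less time
# raises cognitive load, and vice versa.
LOAD_LEVELS = {
    "underload": {"n_objects": 4,  "n_colors": 2, "time_limit_s": 20.0},
    "overload":  {"n_objects": 16, "n_colors": 6, "time_limit_s": 5.0},
}

def task_parameters(condition):
    """Return the stimulus and timing parameters for a load condition."""
    return LOAD_LEVELS[condition]
```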

Fig. 5. The experimental procedure to induce and investigate the effects of cognitive load. Both groups (Overload and Underload) undergo the same procedure; cognitive load is increased by increasing the number of objects and colors in the task field and decreasing the available time. The modality used during the induction phase is set at the beginning of the experiment. (Color figure online)

The interaction modality used during the induction phase can be either speech or mouse and is defined at the beginning of the experiment. Standardized questionnaires are filled in by the subjects prior to starting the experiment. Furthermore, various kinds of subjective feedback, including free speech, emotional ratings, and direct questions, as well as baseline breathing phases, are implemented.

In order to enable an easy-to-handle workflow, the course of events within the experiment as well as the modalities used can be managed entirely through an external task set. Within a task set, the workflow settings of the sequences can be defined individually for every task and every subject, allowing for high flexibility and a generalizable course setup.
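
One plausible realization of such an external task set is a plain JSON file with one entry per task, as sketched below. The file format and field names are assumptions for illustration; the paper only states that sequences and modalities are configured externally per task and per subject.

```python
import json

def load_task_set(path):
    """Read an external task set that drives the experiment workflow."""
    with open(path) as f:
        return json.load(f)

# Example content: each entry fixes phase, permitted modality, and load
# condition for one task of one subject.
EXAMPLE_TASK_SET = [
    {"subject": 1, "phase": "induction", "modality": "speech", "condition": "overload"},
    {"subject": 1, "phase": "free", "modality": "any", "condition": "underload"},
]
```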

5 Conclusion

The presented experimental paradigm enables a controlled investigation of the general laws and principles associated with multimodal interaction and users' cognitive load, while the results remain generalizable to a vast number of different implementations of multimodal interaction. Based on the paradigm, we presented three implemented setups covering a broad range of research topics. These include an investigation of the role of users' previous multimodal interaction experience (their so-called interaction history). The resulting insights into the individuality of multimodal temporal relations will help to improve the fusion of inputs in future systems [11]. The second implementation shows that the influence of contextual parameters such as pressure of success and time pressure can be examined by slightly varying the provided feedback. Using fine-grained variations of the task's difficulty, one gains control over the amount of cognitive demand imposed on the users, ranging from underchallenged to clearly overstrained.

In addition to such flexibility, the presented paradigm has several other advantages over using a specific real-world application for research, such as its easy implementation, the possibility to deploy it on different hardware setups with different modalities, and its suitability for lengthy laboratory studies with many repetitions owing to its gamified design. Thus, it allows researchers to gain knowledge of multimodal interaction that is diverse and in-depth, yet still generalizable.