1 Introduction

Interaction designers often strive to design realistic and natural interactions when developing VR applications because naturalness has been associated with increased usability and improved user performance [1]. However, when working with VR systems with limited capabilities, designers often resort to creating or using semi-natural interaction techniques. While many designers may intuitively believe that these techniques should afford better user performances than non-natural techniques, we present evidence in this paper that suggests otherwise.

First, we must clearly distinguish the differences among natural, semi-natural, and non-natural interaction techniques. To do so, we use the concept of interaction fidelity—the objective degree of exactness with which real world actions are reproduced in an interactive system [2]. Additionally, we present an updated version of the Framework for Interaction Fidelity Analysis (FIFA) [3], which can be used to assess the degree of interaction fidelity provided by a technique compared to a real-world action. FIFA consists of three broad categories of fidelity aspects. First, biomechanical symmetry is used to analyze the anthropometrics, kinematics, and kinetics of an interaction technique. Second, the input veracity of a technique concerns the accuracy, precision, and latency of the input devices used to implement it. Finally, control symmetry represents the transfer functions that translate input data into meaningful system functions and simulation outcomes.

In this paper, we present a series of case studies in which we used FIFA to analyze the degrees of interaction fidelity provided by different techniques in prior research studies. Across the varying case studies, we found two common results. First, high-fidelity interaction techniques usually outperform mid-fidelity techniques. This was not surprising given the intuitive notion that usability improves with more naturalness. However, the second common result was that low-fidelity interaction techniques also usually outperform mid-fidelity techniques. This result directly contradicts the intuition that more naturalness is good. Additionally, it indicates that increasing interaction fidelity may produce a U-shaped curve in terms of user performance.

Considering these results, we propose that interaction fidelity is similar to the uncanny valley phenomenon regarding aesthetics in robots [4]. In general, we hypothesize that increasing interaction fidelity from traditional, non-natural techniques will initially result in worse user performances. As interaction fidelity continues to increase, and the overall degree of fidelity becomes relatively high, user performances will rebound and be comparable to, if not better than, those afforded by the low-fidelity techniques. We discuss potential reasons for this phenomenon and conclude that familiarity is the most likely cause. We also present a new guideline concerning the design of mid-fidelity and semi-natural interaction techniques.

2 Framework for Interaction Fidelity Analysis (FIFA)

FIFA was originally introduced by McMahan [3]. Since then, it has been used by researchers to analyze the interaction fidelity of various techniques (e.g., [5]). Recently, we have updated the framework to make it easier to use and to account for nuances that the original version missed. We present that updated version here.

FIFA consists of three major categories of components—biomechanical symmetry, input veracity, and control symmetry. Each corresponds to one of the aspects of the User-System Loop, as seen in Fig. 1. In addition to these three categories, the new version also emphasizes the importance of three phases of interaction. Below, we define the three categories, their components, and the three phases of interaction.

Fig. 1. The User-System Loop and the three categories of FIFA

2.1 Biomechanical Symmetry

Biomechanical symmetry is the objective degree of exactness with which real-world body movements for a task are reproduced during interaction. It consists of three subcomponents. First, anthropometric symmetry is the objective degree of exactness with which body segments involved in a real-world task are required by an interaction technique. Second, kinematic symmetry is the objective degree of exactness with which a body motion for a real-world task is reproduced during an interaction technique. Third, kinetic symmetry is the objective degree of exactness with which the forces involved in a real-world action are reproduced during an interaction technique.

For an example, consider the March-and-Reach technique [6], in which the user uses both hands and both feet to climb a virtual ladder. March-and-Reach has a high degree of anthropometric symmetry to real-world ladder climbing since both involve using the hands, forearms, upper arms, thighs, legs, and feet. Additionally, the interaction technique has a high level of kinematic symmetry because it reproduces most of the arm and leg motions involved with climbing a real ladder. However, it has a low level of kinetic symmetry, as it lacks the tactile and force feedback stimuli involved with grabbing and stepping on actual ladder rungs. Considering all three subcomponents, March-and-Reach has a moderately high degree of biomechanical symmetry.

2.2 Input Veracity

Input veracity is the objective degree of exactness with which the input devices capture and measure the user’s actions. It also consists of three subcomponents. First, accuracy refers to how close an input device’s readings are to the “true” values that it attempts to measure. Second, precision concerns a device’s ability to reproduce the same results when repeated measures are taken in the same conditions. Finally, latency is defined as the temporal delay between user input and the sensory feedback generated by the system in response to it.

Input veracity depends solely on the quality of the input devices and is independent of the user’s actions. Consider a Vicon motion capture system as an example. Most Vicon systems offer sub-millimeter accuracy, sub-millimeter precision, and latencies of a few milliseconds. Hence, these systems provide a high degree of input veracity. On the other hand, some tracking devices, such as the Microsoft Kinect, do not offer the same quality of input data. Therefore, such devices provide less input veracity.
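The distinction between accuracy and precision can be illustrated with a small numerical sketch. The code below assumes repeated one-dimensional readings of a stationary marker whose true position is known. The function name, the sample readings, and the choice of mean bias and sample standard deviation as summary statistics are our own assumptions for illustration; they are not part of FIFA, and latency is not modeled here.

```python
import statistics

def bias_and_spread(readings, true_value):
    """Summarize repeated readings of a known quantity.

    bias   : distance of the average reading from the true value
             (a stand-in for accuracy; smaller is more accurate)
    spread : sample standard deviation of the readings
             (a stand-in for precision; smaller is more precise)
    """
    bias = abs(statistics.mean(readings) - true_value)
    spread = statistics.stdev(readings)
    return bias, spread

# Hypothetical readings in millimeters; the true position is 10.0 mm.
true_mm = 10.0
tight_but_biased = [10.9, 11.0, 11.1, 11.0]    # precise, not accurate
noisy_but_centered = [8.0, 12.0, 9.0, 11.0]    # accurate on average, not precise

print(bias_and_spread(tight_but_biased, true_mm))
print(bias_and_spread(noisy_but_centered, true_mm))
```

In this made-up data, the first device repeats itself closely but is consistently off by about a millimeter, while the second is centered on the true value but scatters widely. Under FIFA, either shortcoming would lower input veracity.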

2.3 Control Symmetry

Control symmetry is the objective degree of exactness with which control in a real-world task is provided by an interaction. Control primarily concerns how the user’s actions translate into system effects. It depends on only one component—transfer function symmetry, which is the objective degree of exactness with which a real-world transfer function is reproduced through interaction. Because most real-world actions do not actually involve a transfer function, we treat the effects of the user’s actions on the real world as output properties of theoretical transfer functions that take the user’s actions as input properties.

As an example of control symmetry, consider the simple virtual hand technique, which directly maps the data of a six-degree-of-freedom (6-DOF) handheld tracker to the position and orientation of a virtual handheld object, in a one-to-one fashion [7]. Compared to manipulating a real-world handheld object, the simple virtual hand has a high degree of transfer function symmetry and therefore provides a high level of control symmetry. On the other hand, the ray-casting technique, which manipulates virtual objects at an offset due to the ray [7], provides a low degree of control symmetry.
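To make the contrast concrete, the following sketch shows the two mappings as simple functions of tracker position. It is a simplified illustration under our own assumptions: only position is handled (orientation is omitted), and the function names and the fixed ray distance are hypothetical rather than taken from [7].

```python
import math

def simple_virtual_hand(tracker_pos):
    """One-to-one mapping: the virtual object sits exactly at the tracker."""
    return tuple(tracker_pos)

def ray_casting_object_position(tracker_pos, ray_direction, distance):
    """The object is held at an offset along a ray leaving the tracker.

    ray_direction is normalized here, so any nonzero vector is accepted.
    """
    length = math.sqrt(sum(c * c for c in ray_direction))
    unit = tuple(c / length for c in ray_direction)
    return tuple(p + u * distance for p, u in zip(tracker_pos, unit))

hand = (0.0, 1.0, 0.0)
print(simple_virtual_hand(hand))
print(ray_casting_object_position(hand, (0.0, 0.0, -2.0), 2.0))
```

With the simple virtual hand, the object and the tracker coincide. With ray-casting, the object sits some distance away from the hand along the ray, which is the offset that lowers transfer function symmetry.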

2.4 Interaction Phases

In the updated version of FIFA, we have adopted the practice of analyzing techniques across three phases of interactions: (1) the initiation phase, (2) the continuation phase, and (3) the termination phase. The initiation phase includes all of the biomechanical, input, and control aspects required to begin using a technique. The continuation phase encompasses all of the aspects involved with continuing the interaction. Finally, the termination phase indicates what is required to stop the interaction.

Consider the Walking In Place (WIP) technique [8]. During the initiation phase, the user starts the travel technique by lifting and stepping in place with one leg. The continuation phase involves continuously alternating steps in place to continue virtual travel. Finally, the termination phase only requires the user to stop stepping in place.
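The three phases can be pictured as labels on a stream of input samples. The sketch below assumes a hypothetical step detector that reports, for each sample, whether the user is currently stepping in place. The function name, the boolean input, and the extra "idle" label are our own illustrative assumptions and are not part of FIFA or of the WIP technique [8].

```python
def label_wip_phases(stepping):
    """Label each sample of a stepping signal with an interaction phase.

    stepping: sequence of booleans, True while the user steps in place.
    """
    labels = []
    previous = False
    for current in stepping:
        if current and not previous:
            labels.append("initiation")      # first step begins travel
        elif current and previous:
            labels.append("continuation")    # alternating steps keep travel going
        elif previous and not current:
            labels.append("termination")     # the user stops stepping
        else:
            labels.append("idle")            # no interaction in progress
        previous = current
    return labels

print(label_wip_phases([False, True, True, True, False, False]))
```

Separating the samples this way makes it possible to assess biomechanical symmetry, input veracity, and control symmetry for each phase on its own.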

3 Case Studies of Interaction Fidelity

Using FIFA, we have conducted several case studies to investigate the effects of interaction fidelity on user performance in prior research studies. We have found two common results. First, we have found the intuitive result that techniques with high degrees of interaction fidelity in biomechanical symmetry, input veracity, and control symmetry generally afford better user performances than mid-fidelity interaction techniques that are lower in one of the three categories. However, our second result is not as intuitive. We have also found that low-fidelity techniques, with little interaction fidelity in any of the three categories, generally afford better user performances than the mid-fidelity techniques. Together, these two results indicate that moderate levels of interaction fidelity often yield the worst user performances and that increasing interaction fidelity produces a U-shaped curve. We present three subsets of case studies below that exemplify this phenomenon.

3.1 Moderate Levels Worse Than High Levels

The following case studies provide results that indicate mid-fidelity interaction techniques are worse than high-fidelity techniques in terms of user performance. Hence, these case studies demonstrate that user performance increases as interaction fidelity increases from moderate to high levels.

For Manipulation Tasks.

Mine et al. [9] conducted a study comparing three 6-DOF virtual hand techniques for a 3D manipulation task. One of the three techniques was a simple virtual hand technique with a direct, one-to-one mapping. However, the other two techniques used transfer functions that created offsets between the physical handheld tracker and the user’s virtual hand position.

The two offset techniques afforded only moderate levels of interaction fidelity. Both provided high degrees of biomechanical symmetry with high anthropometric, kinematic, and kinetic symmetries to the real-world action of manipulating a handheld object. However, both only had moderate control symmetry due to their transfer functions intentionally producing offsets between the physical tracker and virtual hand. On the other hand, the simple virtual hand was a high-fidelity interaction technique with high degrees of both biomechanical symmetry and control symmetry, with its one-to-one transfer function directly mapping the virtual hand to the physical tracker.

During the study, Mine et al. [9] found that the high-fidelity technique afforded significantly faster completion times for the task of manipulating a virtual cube and docking it with (i.e., positioning and aligning it with) another target cube. In a similar study, Ware and Rose [10] also found that the high-fidelity simple virtual hand outperformed a mid-fidelity offset technique. Therefore, both of these studies indicate that high-fidelity interaction techniques afford better user performances than those with moderate levels of interaction fidelity.

For Search Tasks.

Pausch et al. [11] conducted a study comparing two view control techniques for the task of visually searching a room for a target. One of the techniques simply relied on mapping the 6-DOF head tracker of a head-mounted display (HMD) to the user’s viewpoint. However, the second technique involved placing a 6-DOF tracker within a handheld object and using the same one-to-one transfer function to translate the tracker’s position and orientation into the user’s viewpoint. Additionally, for the handheld technique, Pausch et al. mounted the HMD to the ceiling to prevent users from attempting to physically look around.

The handheld view control was a mid-fidelity interaction technique. It essentially afforded no biomechanical symmetry, as users would move their hands to control their viewpoints instead of moving their heads. However, its control symmetry was relatively high as it used a direct, one-to-one mapping from the tracker’s input to the viewpoint’s position. On the other hand, the head-tracked technique afforded high levels of interaction fidelity as it afforded both biomechanical symmetry and control symmetry to the real-world action of looking around to visually spot a target.

During the study, Pausch et al. [11] asked users to sit and visually search a surrounding virtual environment for a target letter and verbally inform the experimenter when they found the target or determined that it did not exist. The researchers found that the high-fidelity head-tracked technique afforded faster completion times for non-present targets than the mid-fidelity handheld technique. Hence, the results of this study also indicate that high-fidelity interaction techniques provide better user performances than mid-fidelity techniques with moderate levels of interaction fidelity.

3.2 Moderate Levels Worse Than Low Levels

The following case studies provide results that indicate mid-fidelity interaction techniques are worse than low-fidelity techniques in terms of user performance. Hence, these case studies demonstrate that user performance can decrease as interaction fidelity increases from low levels to moderate levels.

For Steering Tasks.

McMahan et al. [12] conducted a study comparing four steering techniques for driving a virtual vehicle in the Mario Kart Wii game for the Nintendo Wii. Two of the techniques employed traditional joystick controls for racing games and only differed in terms of the form factor of the game controller. One used the Wii Classic controller while the other used the GameCube controller. The remaining two steering techniques used the orientation reported by a Wii Remote to determine the steering direction. Their only difference was also form factor, with one technique using only the Wii Remote and the other using the Wii Wheel prop.

The two joystick techniques yielded low levels of interaction fidelity. Because the user’s thumb performed the steering actions, all three components of biomechanical symmetry were low for both techniques compared to using a real-world steering wheel to drive a vehicle. Additionally, their levels of control symmetry were low due to using a force-to-direction transfer function instead of the real-world position-to-direction function that steering wheels employ.

The Wii Remote and Wii Wheel techniques only provided moderate degrees of interaction fidelity to using a real-world steering wheel. Because the user could change the steering direction by manipulating the orientation of the relevant handheld device, both techniques yielded high degrees of anthropometric and kinematic symmetry. However, the kinetic symmetries of these techniques were low due to users having to exert additional forces to hold and re-center the devices in the absence of a powered steering column. The control symmetries of the techniques were low due to also using a force-to-direction transfer function to translate the Wii Remote’s accelerometer data into a steering direction. Additionally, McMahan et al. [12] questioned the latency of the accelerometer-driven orientation tracking, which indicates that the techniques may have had only moderate input veracity compared to a real-world steering wheel.

In the within-subject study, McMahan et al. [12] found that the low-fidelity joystick techniques afforded significantly faster driving times and significantly fewer driving mistakes (i.e., running off course) than the two mid-fidelity interaction techniques. Hence, these results support our findings that low-fidelity techniques provide better user performances than mid-fidelity interactions.

For Navigation Tasks.

McMahan [3] conducted a study comparing two travel techniques for navigating a path from one location to another throughout a large virtual environment. The first travel technique was similar to traditional keyboard-and-mouse controls for first-person desktop games. The second travel technique was the Human Joystick [2], which uses the center of the tracking area and the user’s tracked head position to define a 2D horizontal vector. This 2D vector defines the direction of travel and its magnitude controls the speed. To avoid constant virtual locomotion, the Human Joystick employs a small no-travel zone at the center of the tracking area.
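The Human Joystick mapping described above can be sketched as a position-to-velocity function with a dead zone. This is a minimal illustration under stated assumptions: the function name, the linear gain, and measuring speed from the center of the tracking area (rather than from the edge of the no-travel zone) are our own choices and are not details reported in [2] or [3].

```python
import math

def human_joystick_velocity(head_xz, center_xz, no_travel_radius, gain):
    """Map the user's horizontal head position to a travel velocity.

    The 2D vector from the tracking-area center to the head gives the
    direction of travel, and its magnitude scales the speed. Inside the
    no-travel zone, no virtual locomotion is applied.
    """
    dx = head_xz[0] - center_xz[0]
    dz = head_xz[1] - center_xz[1]
    if math.hypot(dx, dz) <= no_travel_radius:
        return (0.0, 0.0)
    return (gain * dx, gain * dz)

# Hypothetical values: positions in meters, a 0.25 m no-travel zone.
print(human_joystick_velocity((0.1, 0.0), (0.0, 0.0), 0.25, 0.5))
print(human_joystick_velocity((3.0, 4.0), (0.0, 0.0), 0.25, 0.5))
```

The first call leaves the user stationary in the virtual environment, while the second produces continuous travel for as long as the user stands at that spot.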

The keyboard-and-mouse technique obviously provides only a low level of interaction fidelity. It lacks biomechanical symmetry due to not incorporating the user’s thighs, legs, and feet. Additionally, its control symmetry is low due to using a velocity-based transfer function, instead of the position-based function of real walking. On the other hand, the Human Joystick offers a moderate degree of interaction fidelity. It has high anthropometric symmetry to real-world walking. During the initiation phase, when stepping outside of the no-travel zone, it also affords high kinematic and kinetic symmetries, though these are low during the continuation and termination phases. The Human Joystick also provides moderate control symmetry due to employing two transfer functions. In the no-travel zone, a one-to-one, position-to-position function offers a high degree of transfer function symmetry. However, outside of the zone, another position-to-velocity function provides little control symmetry.

McMahan [3] varied the two travel techniques within-subject during his study. The results of his study showed that the low-fidelity keyboard-and-mouse technique afforded significantly faster travel times than the mid-fidelity Human Joystick technique, which supports our hypothesis that mid-fidelity techniques are generally worse than low-fidelity techniques, in terms of user performance.

3.3 Moderate Levels Are the Worst

The following case studies provide results that indicate mid-fidelity techniques are worse than both low-fidelity and high-fidelity interaction techniques. These case studies support our theory that mid-fidelity techniques are the worst and that interaction fidelity is a U-shaped curve with user performance first decreasing and then increasing as interaction fidelity increases from low to high levels.

For Manipulation Tasks.

Zhai and Milgram [13] conducted a study comparing four interaction techniques for a 3D manipulation and docking task. The first technique was an isometric rate approach, in which force applied to a 6-DOF Spaceball directly controlled the direction and speed of the object’s velocity. The second technique was an isometric position technique that allowed the user to directly manipulate the object’s position by moving the Spaceball and using a button for clutching. The third interaction was an isotonic rate technique, in which the user manipulated the velocity of the object by freely moving a 6-DOF glove. Finally, the fourth technique was an isotonic position approach that allowed the user to directly manipulate the position of the object by freely moving the glove device.
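The difference between the position and rate techniques lies in the transfer function. The one-dimensional sketch below contrasts the two under our own simplifying assumptions: the function names, the gain, and the fixed time step are hypothetical, and `device_input` stands for applied force in the isometric case or glove displacement in the isotonic case.

```python
def position_control(device_position, scale=1.0):
    """Position control: the object's position follows the device directly."""
    return scale * device_position

def rate_control_step(object_position, device_input, gain, dt):
    """Rate control: the device input sets the object's velocity, which is
    integrated over one time step of length dt."""
    return object_position + gain * device_input * dt

# Holding a constant input for one second (10 steps of 0.1 s).
position = 0.0
for _ in range(10):
    position = rate_control_step(position, device_input=1.0, gain=2.0, dt=0.1)

print(position_control(0.3))   # object sits where the device is
print(position)                # object keeps moving while input is held
```

Under position control the object stops as soon as the device stops, whereas under rate control the object continues to move for as long as the input is held.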

The isometric rate technique yielded the lowest level of interaction fidelity among the four techniques. Its anthropometric symmetry to manipulating a handheld object in the real world was low due to not incorporating the forearm or upper arm. As such, its kinematic and kinetic symmetries were also low. Additionally, its control symmetry was low due to its force-to-velocity transfer function.

The isometric position and the isotonic rate techniques were both moderate in terms of interaction fidelity. Like the isometric rate technique, the isometric position interaction was low in biomechanical symmetry, but its position-to-position transfer function provided it with greater control symmetry. On the other hand, the isotonic rate technique had a high degree of biomechanical symmetry due to the ability to freely move the glove device, but its control symmetry was low due to a velocity-based transfer function similar to the isometric rate technique’s function. The isotonic position technique was the only high-fidelity interaction with both high biomechanical symmetry and high control symmetry to the real-world task.

For the task of manipulating and docking a virtual pyramid with another target pyramid, Zhai and Milgram [13] found that the low-fidelity isometric rate technique and the high-fidelity isotonic position technique were nearly identical in terms of task completion times. However, they found that both mid-fidelity interaction techniques were significantly worse than the low and high-fidelity techniques. They also determined that the mid-fidelity isometric position technique yielded the worst task completion times. Hence, the results of this study reinforce our theory that mid-fidelity interaction techniques are generally worse than both low and high-fidelity techniques.

For Navigation Tasks.

In a more recent study, Nabiyouni et al. [5] compared three travel techniques for a path navigation task. The first travel technique was based on the traditional gamepad interface for first-person console games, in which one joystick controls translations while the other joystick controls view orientation. The second travel technique was the Virtusphere technique, which allows the user to physically walk in any direction within a large “hamster ball” sphere mounted on casters. For the third technique, Nabiyouni et al. [5] used the real walking technique, in which a 6-DOF head tracker updates the user’s view as if walking through the real world.

The joystick-based technique used in the study was a low-fidelity interaction technique. It afforded no biomechanical symmetry to real-world walking and little control symmetry, as its transfer function mapped the forces applied to the joysticks to the direction and speed of the user’s velocity. The Virtusphere technique provided more interaction fidelity due to higher degrees of anthropometric and kinematic symmetries. The kinetic symmetry of the Virtusphere was low though due to extra forces being required to start and stop the rolling of the large sphere. Additionally, its transfer function symmetry was low due to translating the sphere’s velocity into virtual movements. The real walking technique afforded the highest degree of interaction fidelity with high levels of biomechanical symmetry and control symmetry.

For the navigation task of walking along a path, Nabiyouni et al. [5] found that the mid-fidelity Virtusphere technique resulted in the worst user performances for path deviations and task completion times. Both the low-fidelity joystick technique and the high-fidelity real walking technique outperformed the Virtusphere technique. Hence, this case study further supports the theory that mid-fidelity interaction techniques are generally the worst and that interaction fidelity yields a U-shaped curve of user performance as it increases from low levels to high levels.

For Search Tasks.

In another recent study, Pal et al. [14] compared three travel techniques for visual and navigation-based search tasks with an HMD. The first travel technique was gaze-directed steering, in which users used a handheld device to activate movements relative to the current gaze direction of the HMD. The second technique was similar to real walking for translational movements, but required users to use the handheld device to activate virtual turning within the environment. The third technique was real walking, which did not require any use of the handheld device.
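As a rough illustration of gaze-directed steering, the sketch below advances the user's horizontal position along the current gaze direction while a button is held. The function name, the yaw convention (zero yaw faces the positive z axis), and the constant travel speed are our own assumptions and are not details reported by Pal et al. [14].

```python
import math

def gaze_directed_step(position_xz, yaw_radians, button_pressed, speed, dt):
    """Advance the user along the HMD's gaze direction for one time step.

    position_xz    : current (x, z) position in the virtual environment
    yaw_radians    : gaze heading; 0 faces +z (assumed convention)
    button_pressed : True while the handheld device activates movement
    """
    if not button_pressed:
        return position_xz
    x, z = position_xz
    return (x + math.sin(yaw_radians) * speed * dt,
            z + math.cos(yaw_radians) * speed * dt)

print(gaze_directed_step((0.0, 0.0), 0.0, True, speed=2.0, dt=0.5))
print(gaze_directed_step((0.0, 0.0), 0.0, False, speed=2.0, dt=0.5))
```

Here a button press is translated into velocity through the virtual environment, so the user's legs play no part in the movement.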

The gaze-directed steering was a low-fidelity interaction technique. It lacked biomechanical symmetry as users primarily used their fingers instead of their thighs, legs, and feet to move. Additionally, its control symmetry was low due to its transfer function translating button presses into velocity through the virtual environment.

The virtual turning technique afforded only a moderate level of interaction fidelity. Its biomechanical symmetry to real-world walking was high due to using the thighs, legs, and feet to move around, though the inability to physically turn did lower the overall kinematic symmetry. With regard to control symmetry, the technique had a high degree of transfer function symmetry with its one-to-one, position-to-position mapping. However, the control symmetry was not entirely high, as the handheld device buttons controlled the forward direction of virtual movements. The real walking technique afforded the highest degree of interaction fidelity, as it did in the prior case study of Nabiyouni et al. [5].

For both visual search tasks and navigation-based search tasks, Pal et al. [14] found that the mid-fidelity virtual turning technique yielded the worst user performances in terms of search times and errors. Both the low-fidelity gaze-directed steering and the high-fidelity real walking outperformed the mid-fidelity technique. These results further reinforce our theory that moderate levels of interaction fidelity provide the worst user performances and that interaction fidelity yields a U-shaped curve of user performance.

4 Discussion

4.1 Interaction Fidelity Is an Uncanny Valley

In robotics, the term “uncanny valley” represents the phenomenon that, after a certain point, as a robot’s human likeness increases, familiarity with and empathy toward the robot decrease, unless human likeness is at a very high level [4]. At high levels, reports of familiarity and empathy rebound and increase to levels higher than any previous degree of human likeness. We see a similar U-shaped curve with regard to the effects of interaction fidelity on user performance, as seen in our case studies. Hence, interaction fidelity appears to be the uncanny valley of VR interactions.

However, it is important not to overgeneralize our theory that moderate levels of interaction fidelity yield the worst user performances. There are obviously exceptions to this phenomenon. For example, mid-fidelity interaction techniques are not always worse than their low-fidelity counterparts. Consider the study conducted by Peck et al. [15], in which they compared a low-fidelity joystick technique, a mid-fidelity Walking In Place technique, and a high-fidelity redirected walking technique. For their navigation and wayfinding tasks, Peck et al. [15] found that the high-fidelity travel technique significantly outperformed both the low and mid-fidelity techniques. However, they did not find any significant differences between the low-fidelity joystick technique and the mid-fidelity WIP technique. Hence, the interaction fidelity uncanny valley is likely applicable to many subsets of interaction techniques, but not necessarily every subset or technique.

4.2 Potential Causes of the Interaction Fidelity Uncanny Valley

In addition to recognizing the phenomenon, it is important to understand what might be the cause of the interaction fidelity uncanny valley. In discussing the implications of their study, Nabiyouni et al. [5] suggested that the phenomenon is due to users attempting to employ mid-fidelity interaction techniques in the same manner as high-fidelity ones and the brain needing to adapt to the non-natural parts of the mid-fidelity techniques. However, if this were the case, we could expect to see a consistent and linear decrease in user performance for every component of interaction fidelity in which a mid-fidelity technique differed from its high-fidelity counterpart. Also, this does not explain why many low-fidelity techniques outperform their mid-fidelity counterparts.

Instead, we believe the interaction fidelity uncanny valley is primarily caused by a lack of familiarity. High-fidelity interaction techniques benefit from being similar to real-world actions and therefore are familiar to users. This is why they are often referred to as natural interactions. On the opposite end of the spectrum, many low-fidelity interaction techniques are similar to preexisting computer interfaces, such as keyboard-and-mouse techniques, joystick-based techniques, and game-controller techniques. These types of interfaces are familiar to many users nowadays, especially those who play video games. Hence, like high-fidelity interaction techniques, low-fidelity interfaces also benefit from being familiar to users.

However, most mid-fidelity interaction techniques do not benefit from familiarity amongst users. Most interfaces that offer moderate levels of fidelity are not similar to preexisting computer interfaces or real-world actions. Hence, users cannot leverage prior experiences and familiarity with these techniques to achieve better user performances. When they can leverage such experiences, we have seen that users perform better with mid-fidelity techniques than predicted by the uncanny valley. For example, in the study conducted by Peck et al. [15], we believe that the lack of significant difference between the joystick and WIP techniques was due to most of the users being familiar with marching in place, which is the basis of the mid-fidelity WIP technique. As further evidence of the importance of familiarity, Lai et al. [6] found that users with many real-world experiences climbing ladders performed better on a virtual ladder climbing task using the moderately high (but not high) fidelity March-and-Reach technique than users without such real-world ladder climbing experiences.

Furthermore, it is important to note that familiarity with distinct components of a mid-fidelity interaction technique may not be enough to ensure better user performances. Consider the study conducted by Pal et al. [14]. The mid-fidelity virtual turning technique was essentially the combination of a low-fidelity joystick technique and a high-fidelity real walking technique. However, the researchers found that the mid-fidelity technique yielded the worst user performances. Therefore, we hypothesize that a user’s familiarity with a whole interaction technique, and not just its components, is necessary for better user performances.

4.3 Guideline for Interaction Designers

Given the uncanny valley phenomenon related to interaction fidelity, and our hypothesis that the U-shaped curve is caused by a lack of familiarity, we suggest that interaction designers should avoid developing mid-fidelity, semi-natural interaction techniques that lack overall similarities to well-established user interfaces or common real-world actions. On the surface, this may seem limiting. We are not suggesting that interaction designers should stick to low-fidelity techniques that are similar to other computer interfaces or high-fidelity techniques that mimic real-world actions. We are, however, suggesting that designers should start at one of these two ends of the uncanny valley and think of ways to maintain familiarity with users while designing new techniques with moderate levels of interaction fidelity.

5 Conclusions and Future Work

We have discussed the concept of interaction fidelity and why it is important to interaction designers when creating natural and semi-natural interaction techniques. We also presented the latest version of FIFA, a framework for analyzing the level of interaction fidelity that a technique provides when compared to a real-world action. Using FIFA, we have presented several case studies that demonstrate the effects of interaction fidelity on user performance. We have repeatedly provided evidence that moderate levels of interaction fidelity can result in the worst user performances. We refer to this U-shaped phenomenon as the uncanny valley of VR, which appears to be caused by a lack of familiarity with whole interaction techniques. Given this, we have offered a new design guideline concerning mid-fidelity interaction techniques.

For future work, we plan to conduct a series of studies that will provide more empirical evidence of interaction fidelity’s uncanny valley and hopefully insights into the effects of familiarity and how designers can overcome system limitations to deliver semi-natural techniques that are more effective than their low-fidelity counterparts.