1 Introduction

In contrast to more conventional human-machine interfaces, which simply transfer operator commands to a technical system or provide visual or auditory information to the operator, haptic interfaces generate stimuli (like mechanical forces) allowing tactile or kinesthetic sensations. Haptic information is crucial when exploring or manipulating objects in remote or virtual environments. For environments which are difficult to access or too hazardous for humans (e.g. nuclear plants, deep sea), so-called teleoperation systems can be used in which a robot is acting as an “extended arm”, remotely controlled by the human operator. The great advantage of teleoperation is that the human operates from a safe location while human skills, attention, problem solving capabilities etc. can be extended to the remote location [53, 64]. In a similar vein, instead of interacting with a physically remote environment, it is also possible to use haptic interfaces for virtual reality (VR) applications (like training simulators, virtual assembly verification etc.), with computer-generated sensory feedback.

One major precondition for taking full advantage of teleoperation or VR systems is a bidirectional exchange of haptic information between the operator and the remote or virtual environment, enabling the operator to perceive collisions, contact forces, weight, object shapes, surface textures, etc.

The benefits of providing haptic information in physically remote or virtual scenes are manifold [23]. Firstly, there is a natural interaction with the environment, similar to real-world experiences. This also allows for a higher degree of immersion and an improved sense of (tele)presence, i.e. the operator’s subjective impression of being physically present in the remote or virtual environment [53]. Secondly, compared to systems with visual information only, providing additional haptic information improves spatial awareness of the remote or virtual scene. Constantly updated haptic information allows for a better understanding of the movement and positions of the end effector (e.g. a robotic hand) or the manipulated objects (e.g. a tool) in the remote/virtual environment. Force feedback even allows physical constraints (like virtual fixtures) to be implemented and helps to avoid exaggerated force application. Also, the operator is better able to generate an egocentric frame of reference, i.e. the position and orientation of the teleoperated end effectors or objects are specified in relation to the operator. Finally, using the haptic channel not only matches real-world experiences, it also decreases the operator’s cognitive load when visual resources are restricted.

In the last decades, numerous haptic devices for teleoperation and VR systems were developed and have been used successfully in a wide range of applications. In telesurgery, for instance, an operation is performed inside the patient’s body with the instruments being controlled by the surgeon via remotely controlled robotic arms. Here, haptic feedback about forces acting on the instrument’s tips (e.g. when palpating tissue or when pulling a thread during surgical knotting or suturing) is crucial. In a recent meta-analysis on 21 studies [62], the positive overall effect of providing (kinesthetic) force feedback for surgical applications was documented for task accuracy and force regulation.

Moreover, Nitsch and Färber [36] performed a meta-analysis on the effects of haptic feedback on teleoperation performance in general including 32 studies, mainly investigating telerobotic tasks like moving a mobile robot and basic manipulation tasks like pick-and-place, peg-in-hole, and grasping (except for five studies with surgical tasks). The authors reported significant positive effects of haptic feedback on task success and completion times compared to conditions without haptic feedback. Although both meta-analyses provide clear evidence for the benefits of haptic feedback in a broad variety of applications and experimental tasks, the influence of the specific task demands on the magnitude of these effects has not been investigated so far [36].

In the present paper, we report the results of several meta-analyses investigating the effect of force feedback on task performance in (tele-)surgical and other teleoperation systems controlling real robotic systems as well as VR simulators. The main objective is to assess the overall effect of haptic feedback when comparing telesurgical tasks (like suturing or knotting) with common teleoperation tasks (like assembly or peg-in-hole). Moreover, the effect of substituting force feedback by other modalities (like vibrotactile information) during tele-manipulation tasks was explored.

2 Methods

Sample of Studies. We conducted a literature search using different library databases (PubMed, IEEE Xplore, ScienceDirect, Springer.com, Web of Science). Additionally, we used Google Scholar to seek further references not identified in the formal scan procedure. Different combinations of the keywords [teleoperation OR telerobotics OR telesurgery OR virtual reality OR simulation] AND [haptics OR force feedback OR force OR tactile OR sensory substitution] were used. Next, reference lists of the identified articles were checked to find additional related studies. Moreover, researchers were contacted and asked for unpublished papers, dissertations, and diploma or master’s theses on this topic. Altogether, 128 primary studies were collected.

Criteria for Study Inclusion. Next, the following inclusion criteria were applied: (1) direct empirical comparison of conditions with and without haptic feedback (or sensory substitutes) for the same experimental task and with the same input/output devices, (2) no focus on haptic training effects, (3) sufficient information to determine effect size estimates, (4) telerobotic systems or virtual reality simulations, (5) methodological control of time effects (learning, fatigue), and (6) original publication. After application of the inclusion criteria, a sample of 58 studies (27 journal articles, 24 conference papers, 6 doctoral or master’s theses and one book article; 37 studies with general telemanipulation and 21 with telesurgical tasks) with a total of N = 1104 subjects remained. The studies included in the current meta-analysis are marked with an asterisk in the reference list of this paper.

Calculation of Effect Sizes. The effect of haptic feedback on performance was calculated as the mean difference between conditions with and without feedback, standardized by the pooled standard deviation s (i.e., Cohen’s d [11], see Formula 1).

$$ d = \frac{M_{NoFeedback} - M_{Feedback}}{s} $$
(1)

For a more conservative estimation of the effect sizes in case of small sample sizes, we calculated Hedges’ g by multiplying d by a correction factor J (g = d · J; see Formula 2, where df denotes the degrees of freedom, for instance df = n₁ + n₂ − 2 for two independent groups; see [6]).

$$ J = 1 - \frac{3}{4df - 1} $$
(2)
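As an illustration of Formulas 1 and 2, the following minimal Python sketch computes Cohen’s d and Hedges’ g for two independent groups. The pooled standard deviation formula, the function names and the numerical values are assumptions made for this example; they are not taken from any of the primary studies.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups (assumed standard formula)."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def cohens_d(mean_no_feedback, mean_feedback, s):
    """Formula 1: mean difference standardized by the pooled standard deviation s."""
    return (mean_no_feedback - mean_feedback) / s

def hedges_g(d, df):
    """Formula 2: small-sample correction factor J applied to d."""
    j = 1 - 3 / (4 * df - 1)
    return d * j

# Hypothetical completion times in seconds (illustrative values only)
n1, n2 = 10, 10
s = pooled_sd(4.0, n1, 4.0, n2)
d = cohens_d(22.0, 20.0, s)
g = hedges_g(d, n1 + n2 - 2)
print(round(d, 3), round(g, 3))  # prints: 0.5 0.479
```

With df = 18 the correction factor is J = 68/71 ≈ 0.958, so g is slightly smaller than d; the correction shrinks towards 1 as the sample size grows.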

When the information required for effect size calculation was missing, effect sizes were estimated on basis of p or t statistics reported in the studies. Conventionally, effect sizes from 0.2 to 0.5 are considered as small, from 0.5 to 0.8 as medium and from 0.8 to infinity as large effects [11].
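The text does not state which conversion was used when only test statistics were reported. The sketch below therefore assumes the common conversion for a t statistic from two independent groups, d = t · sqrt(1/n₁ + 1/n₂), and adds the conventional labels named above. The function names, the label "below small" and the handling of boundary values (assigned to the higher class) are choices made for this example only.

```python
import math

def d_from_t(t, n1, n2):
    """Standardized mean difference from an independent-groups t statistic
    (assumed conversion; within-subject designs would need a different formula)."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def label_effect(es):
    """Conventional labels from the text, applied to the absolute effect size."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"  # hypothetical label; the text names no class below 0.2

# Hypothetical study: t = 2.5 with 12 subjects per group
d = d_from_t(2.5, 12, 12)
print(round(d, 3), label_effect(d))  # prints: 1.021 large
```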

For a more fine-grained analysis and to obtain an adequate number of analysis units, we calculated effect sizes from different experimental conditions of each study and different outcome variables. As main measures for task performance, task-specific criteria for task success (like number of successful trials, avoided collisions), accuracy (like penetration depths, optimal path deviations) and detection rates (during surgical palpation tasks) were aggregated. In addition, the average and peak forces applied during task completion were analyzed. Finally, we explored whether the use of additional haptic feedback has an impact on task completion times. Most of the studies only reported a subset of these outcome variables or only one of them. In sum, k = 171 no haptics vs. haptics comparisons were available.

Effect Size Integration. In preparation for the integration of effect sizes across studies, their reliability, i.e. each study’s variance, was taken into account. Each effect size was weighted by the study’s inverse variance W (W = 1/s², [25]). After aggregation of a class of effect sizes, a mean weighted effect size was computed, and heterogeneity within the class of the k effect sizes was tested with the Q statistic. Q is defined as the sum of squared differences between each study’s (i) effect size (Y_i) and the mean effect size (M), weighted by the inverse variance (W_i) of that study (see Formula 3). A significant Q value indicates that the aggregated effect sizes do not share a common effect size, but that there are, for example, further moderating factors causing heterogeneity.

$$ Q = \sum\limits_{i = 1}^{k} W_{i} (Y_{i} - M)^{2} $$
(3)
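The weighting and the heterogeneity test can be sketched in a few lines of Python. This is a simplified illustration of Formula 3 under a fixed-effect model, not the procedure implemented in the software used for the analyses; the function names and the input values are hypothetical.

```python
def weighted_mean(effects, variances):
    """Inverse-variance weighted mean effect size M (weights W_i = 1 / s_i^2)."""
    weights = [1 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def q_statistic(effects, variances):
    """Formula 3: weighted sum of squared deviations from the mean effect size."""
    m = weighted_mean(effects, variances)
    return sum((y - m) ** 2 / v for y, v in zip(effects, variances))

# Hypothetical effect sizes with equal sampling variances
effects = [0.2, 0.5, 0.8]
variances = [0.04, 0.04, 0.04]
m = weighted_mean(effects, variances)
q = q_statistic(effects, variances)
print(round(m, 3), round(q, 3))  # prints: 0.5 4.5
```

Under the hypothesis of a common effect size, Q is compared against a chi-square distribution with k − 1 degrees of freedom (here 2).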

After integration of a class of effect sizes, the impact of potential moderators (like task type) was tested by ANOVAs (fixed effect categorical model; [25]), resulting in a between-class effect Q_b and a within-class effect Q_w (see [6]). All analyses were performed using the CMA© software package (version 2.2; Biostat).
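The partition of heterogeneity in a fixed-effect categorical model can be illustrated as follows: the total Q over all effect sizes splits into the sum of the Q values within each class (Q_w) and a remainder between the classes (Q_b). The sketch below is a simplified illustration with hypothetical data and function names; it is not the output of the software package mentioned above.

```python
def weighted_mean(effects, variances):
    """Inverse-variance weighted mean effect size."""
    weights = [1 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def q_statistic(effects, variances):
    """Weighted sum of squared deviations from the weighted mean."""
    m = weighted_mean(effects, variances)
    return sum((y - m) ** 2 / v for y, v in zip(effects, variances))

def partition_q(groups):
    """groups maps a class name to (effects, variances).
    Returns (Q_total, Q_within, Q_between) with Q_between = Q_total - Q_within."""
    all_effects = [y for es, _ in groups.values() for y in es]
    all_variances = [v for _, vs in groups.values() for v in vs]
    q_total = q_statistic(all_effects, all_variances)
    q_within = sum(q_statistic(es, vs) for es, vs in groups.values())
    return q_total, q_within, q_total - q_within

# Hypothetical effect sizes for two task domains
groups = {
    "surgical": ([0.9, 1.1], [0.05, 0.05]),
    "non_surgical": ([0.1, 0.3], [0.05, 0.05]),
}
q_total, q_within, q_between = partition_q(groups)
print(round(q_total, 2), round(q_within, 2), round(q_between, 2))  # prints: 13.6 0.8 12.8
```

In this example almost all heterogeneity lies between the two classes, which is the pattern a moderator effect would produce.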

3 Results

In a first step of analysis, the overall effects of (kinesthetic) force feedback (FF) were computed. Force feedback significantly improved task success (g = .75), accuracy (g = .69) and detection rates during palpation (g = .62), significantly reduced the average and the peak forces applied during the task (g = .78 and g = .64, respectively) and decreased the time to complete the task (g = .22; see Table 1). Yet, a significant amount of heterogeneity was found for all aggregation classes, indicating the potential influence of moderator variables.

Table 1. Overall effects of force feedback

In a subsequent analysis, we compared the effects of force feedback (FF) reported above with those of vibrotactile feedback (VT). Results indicated that task success differed significantly between the two feedback modalities, with only a small mean effect size when substituting force feedback by vibrotactile feedback (g_VT = .21 vs. g_FF = .75, Q_b = 34.2; p < .001; see Table 2). Moreover, moderation effects were evident for average force and completion times (Q_b = 29.3; p < .001 and Q_b = 4.8; p < .05): providing vibrotactile feedback did not have any substantial positive effect on these measures. Yet, no evidence for a moderation effect was found for peak forces, with similar moderate effect sizes for both modalities (g_VT = .60 vs. g_FF = .64). No moderation analysis was performed on the task accuracy variable, since only two primary studies using vibrotactile feedback could be identified.

Table 2. Moderating influence of feedback modality (Force vs. Vibrotactile Feedback)
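The significance levels reported for the between-class statistic in the modality comparison can be checked with a short calculation. The sketch assumes that Q_b follows a chi-square distribution with one degree of freedom (two classes minus one), for which the upper tail probability has the closed form erfc(sqrt(Q/2)). The function name is hypothetical.

```python
import math

def chi2_sf_df1(q):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2))

# Q_b values reported in the text for force vs. vibrotactile feedback
for label, q_b in [("task success", 34.2),
                   ("average force", 29.3),
                   ("completion time", 4.8)]:
    print(f"{label}: Q_b = {q_b}, p = {chi2_sf_df1(q_b):.2g}")
```

Under this assumption the first two values fall well below .001 and the third falls between .01 and .05, in line with the reported levels.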

Finally, we conducted a moderator analysis comparing the effects of force feedback for surgical tasks (e.g. suturing) and simple teleoperation tasks (e.g. peg-in-hole). Results show that the positive effect of FF on peak force reduction was only evident during surgical tasks (g_Surgical = 1.06 vs. g_Non-Surgical = −0.40; see Table 3). The negative value for non-surgical tasks indicates that peak forces were even higher with force feedback than without. Furthermore, the moderator analysis revealed that the positive effect of force feedback on completion times is restricted to non-surgical teleoperation tasks (g_Surgical = −0.05 vs. g_Non-Surgical = 0.29). Please note that no moderation analyses could be performed for task success and detection rates, because the former variable was only used for non-surgical tasks and the latter only for surgical palpation tasks.

Table 3. Moderating influence of task domain on the force feedback effects

One possible explanation for the significant moderation effect on peak force is that critical force thresholds play a much more important role during surgical tasks. All surgical studies reporting peak forces used tasks in which force regulation is important or which involve a critical force level (breaking threads, damaging tissue, etc.), while this was not the case in any of the non-surgical studies. Re-analyzing the data by classifying studies according to whether there was a critical force level or not, we found a similar moderation effect for the peak force variable (g_Threshold = 1.06*** vs. g_NoThreshold = 0.19; Q_b = 22.7***).

Moreover, we further explored the moderation effect regarding the required completion times. As discussed in [62], the stronger effect of additional force feedback during basic telemanipulation tasks might be due to the less demanding task character compared to the surgical tasks. In the current sample of surgical studies, manipulation tasks like dissection, suturing, knotting and needle insertion were mainly performed (90 % of the studies). The non-surgical tasks also included simple target acquisition, selection or navigation/tracing tasks, besides classical (tele-)manipulation tasks like peg-in-hole, pick-and-place and assembly (55 % of the studies). Interestingly, we found a significant moderation effect of task type (Q_b = 28.9***) when categorizing studies reporting completion times into manipulation tasks (g = .09), navigation or tracing tasks (g = .21), selection tasks (g = .47***) and target acquisition tasks (g = .74***).

4 Discussion

Altogether, the quantitative review based on 58 studies with 1104 subjects investigating the impact of force feedback provides strong evidence for the benefits of haptics in a large variety of experimental tasks and different performance dimensions. There are substantial positive effects of additional force feedback on task performance and force regulation, and a small positive effect on task completion times. Evidently, providing force information is indispensable to maintain high performance levels during teleoperation or VR simulations. An alternative to displaying force feedback is vibrotactile feedback. In contrast to force feedback systems, vibrotactile devices are less expensive, lighter, and provide larger workspaces. Besides, tactile feedback provides passive responses (no forces are applied actively). Therefore, there is no conflict between feedback and the user’s sense of position, and there is less muscular fatigue [10]. However, no realistic contact forces are available, and there are no kinesthetic constraints that prevent inadequate force production and support the operator by forcing her/him into the correct orientation or position, e.g. during assembly tasks [32]. In line with this notion, the results of our meta-analysis revealed no reduction of the average force application with vibrotactile feedback and, compared to force feedback, a significantly lower but still significant effect of vibrotactile feedback on task performance. Similar to force feedback, vibrotactile information also helps to avoid exaggerated force levels.

Yet, substituting force feedback with vibrotactile stimuli is cognitively more demanding because the kinesthetic events have to be inferred from tactile signals. Consistently, we did not find a time saving effect when providing this kind of feedback. Altogether, vibrotactile devices could be a reasonable alternative if high resolution of haptic information is not critical or as a warning function (e.g. damage or collision avoidance). Still, force feedback is indispensable to improve task performance during tele-manipulative tasks (like assembly tasks or suturing), requiring multi-dimensional (e.g. three-dimensional force and torque information) and high-resolution haptic information.

Next, we explored the effects of different task characteristics on the force feedback effects. Integrating the findings of two meta-analyses, one with a focus on general teleoperation tasks [35] and one on surgical applications [62], and several additional studies, we compared findings for surgical vs. non-surgical tasks. During the more complex and delicate telesurgical tasks, force feedback is crucial to adjust the input forces adequately and to avoid exaggerated forces (e.g. damaging tissue, breaking threads). The moderation analysis showed a large positive effect of force feedback on peak force reduction for the surgical tasks but even a negative effect for other teleoperation tasks, which is mainly due to the fact that there are usually no critical force thresholds for these tasks. Finally, we did not find a significant reduction of task completion times during surgical tasks, but we did find one for the other teleoperation tasks. Subsequent analyses provided evidence that this effect can be explained by the higher complexity of surgical tasks. During (tele-)manipulation tasks, additional force information might be used in an explorative manner, to better understand the spatial configuration. Also, more complex visual and haptic information has to be processed and integrated, resulting in higher cognitive requirements. Conversely, for simple one- or two-dimensional selection or target acquisition tasks, significant time-saving effects occur.

One major limitation of the current meta-analysis (and of the cited prior meta-analyses) is the remaining amount of heterogeneity in almost all moderation analyses. Evidently, the numerous haptic devices, qualities of force feedback, different remote systems, visualizations [62], experimental tasks, task performance operationalizations, experience levels of subjects, and so forth are the main reasons for the variability of results.