
1 Introduction

1.1 Motivation for Simulating a Real Touchscreen System in Virtual Reality

The digLT is a large table-surface screen allowing multi-hand, multi-finger, gesture-based touch interaction with the displayed map data and other interactive elements [11, 12, 14, 25]. Virtual reality (VR) systems using headsets have been around for a long time [20], but recent developments have led to high-quality and comparatively cheap VR experiences and have driven increased development of new, complementary interaction systems, e.g. for the recognition of hand poses [5]. We replicated the digLT system within a virtual environment using such VR technologies, essentially simulating the current system and its capabilities virtually, which offers potential advantages over the existing system:

  • Virtual reality enables users to work collaboratively from remote locations via their virtual presence, or telepresence [16], preserving important modalities such as gesturing to other users and pointing at map contents, and lets users participate in situation briefings from afar without entering the workspace.

  • Given the VR systems’ capabilities, display and interaction may in the future not only be replicated but also enhanced by the display of, and hands-free interaction with, three-dimensional data such as 3D heightmaps, entire three-dimensional building plans, or a street view of an area. Moreover, information displayed in the room would not be bound to the location of hardware screens but could be placed anywhere in the environment according to a user’s choice and preference, or according to user-evaluated, efficiency-optimized layouts. Information needed by only a single user could be omitted from the view of other users.

  • The simulation of arbitrary environments could increase users’ acceptance of the workplace and possibly eliminate distractions posed by nearby objects that are unrelated to the workspace’s purpose.

  • The hardware of the current digLT system is comparatively large, heavy, and expensive. Replacing the large touchscreen with a VR system comprising a head-mounted display (HMD) and a couple of small, lightweight sensors would increase the viability of the system for users who need lower cost, increased mobility, or quick setup.

1.2 Objectives for Implementation of the VR System

To evaluate the viability of a VR system as a direct replacement for the digLT situation table, the table and its fundamental map interaction concepts are replicated in a virtual environment, using a tracked HMD and hand recognition.

A VR system that is to simulate all relevant interaction needs to put the user into a virtual environment in an immersive way, ideally mimicking reality completely. To preserve the user’s perception and apply the same interaction concepts as used in the real system, it must take into account that interaction with the system is largely body-centered, regarding gestural and touch interaction with the screen but also cooperation with other users [11]. As [18] argues, the user’s ability to interact in such a way relies fundamentally on his proprioception, an internal, mental model of his own body. Ideally, this model should be supported with the same virtually generated or passively provided sensory cues that come with having a body and interacting in reality. We implement a VR system as an exploratory prototype towards these ideal aims. It enables the user to see his own hands and feel the objects he touches. A digLT surface dummy is placed in co-location with a virtual representation of the digLT to passively provide haptic feedback (see Fig. 1).

Fig. 1. Usage of the ‘real’ digLT system and its VRLT simulation with a cloth-covered haptic surface dummy.

2 Related Work

Hand Interaction with the Digital Situation Table (digLT). [6] describes the advantages of a direct, hand-based interaction concept for displays, in contrast to interaction in which a user needs to manipulate additional devices. It also describes the evaluation of optically well-discriminable hand poses, leading to a hand-gesture interaction set for the digLT system that is robust against unintended interaction. Stationary camera systems are used for optical hand recognition.

Interaction with VR Systems. [5] describes current low-cost consumer VR systems, providing a taxonomy of current hardware developments, listing input and output devices by their capabilities.

As a possible low-cost component of a room-scale VR system, the Microsoft Kinect has been used to demonstrate full, “smooth” hand tracking including finger movement [22], although this of course depends on the visibility of the hand from the sensor’s perspective.

Using free-hand interaction in a head-tracking-enabled VR system, [15] find that users prefer gestural interaction with objects over a controller-based approach. [7] use free-hand interaction with 3D-modelled objects visualized in a head-tracked VR system, reporting that users found the interaction simple and natural, although the displayed models could not be touched. While not providing full force feedback for touched objects, [19] describes a free-hand system using hand tracking that provides tactile feedback, enabling the user to differentiate between different “textures” of virtually touched objects by shooting air vortices at the user’s skin at the point of interaction.

3 Implementation

3.1 Integration of Elements into the Virtual Environment

For an overview, please refer to Fig. 2: on the left, entities and interactions in the real world are shown; in the middle, the method of transporting entities and interactions to and from the virtual environment is shown; and on the right, the virtually simulated entities and interactions are shown. The digLT’s surface and display capabilities are grayed out since only a digLT dummy is placed in the real world, providing a physical surface with identical dimensions for its virtually co-located VRLT representation. Hand movement, and thus touch interaction, is transported via the hand-tracking device: the hands are displayed in the virtual environment, the hand interaction is detected within the virtual environment, and a corresponding touch signal is sent to the IVIG map software. The software’s map display is shown on the VRLT model’s surface. The user perceives and moves his field of view through the virtual environment via a head-tracked HMD.

Fig. 2. Transport of interaction and perception concepts from reality to virtual reality.

Mimicking the digLT optically: the VRLT. The entire project is built as a 3D room environment set up and rendered in the game engine Unity. The environment chosen to give the user a general frame of reference (floor, walls, and ceiling of a futuristic corridor, plus the lighting of the environment) was taken from an example scene provided for the game engine [4]. To mimic the user’s real-world digLT experience in VR, a 3D model representing the digLT optically was built for and placed in the virtual environment. This virtual situation table (VRLT) model’s surface layer was programmed to display the IVIG situation-table software by copying the current display of the computer running the whole system to the surface-layer texture.

Due to its early availability and the high quality of its technological capabilities, an HTC Vive HMD system [2] was used to provide display of and navigation through the virtual environment. The Vive is sold commercially and designed for end-user comfort, uses low-latency head tracking for navigation through the virtual environment, and provides a 110° field of view updated at 90 frames per second [24]. The Vive thus provides a natural and efficient means of navigation. It also provides a fundamental safety function, displaying boundaries of the virtually traversable world where physical obstacles are located in the real world.

Mimicking the digLT haptically: calibration of real-world positions. The space in which the user is tracked by the Vive system is placed arbitrarily, approximately in the middle of a virtual room. The SteamVR Unity plugin [1] automatically co-locates the real floor with the virtual floor and thus provides the Vive’s tracking coordinate system within the virtual environment. (The virtual floor is calibrated during the Vive system’s guided SteamVR setup).

While visually co-locating one’s hands with a virtual screen in mid-air seems to be the most obvious method of controlling virtual screens, it would be impractical for converting the existing touchscreen system to VR. As [10] notes, “Physically touching virtual objects using tactile augmentation enhances the realism of virtual environments”. We therefore position a real table surface in place of the rendering of the digLT displayed in the virtual world, in a mixed-reality kind of setup. With the table surface physically touchable but otherwise devoid of function, the virtual screen still displays a reaction to the touch interaction recorded in the virtual world, so that the user gets the impression of a real touchscreen. The more closely the virtual touchscreen resembles a real one, the less the user has to think about how to transfer his previous touchscreen experience to the virtual system. As [17] notes, among other advantages of the touchscreen, “touching a visual display of choices requires little thinking and is a form of direct manipulation that is easy to learn”. As [13] has shown, including haptics in a virtual environment significantly increases presence.

The physical surface also provides the user with a natural physical frame of reference for angling hands and fingers. Gesture recognition can thus mimic the existing touchscreen’s gesture set as closely as possible. With respect to possible future conversions of touchscreen systems to VR, this is, of course, desirable, as the conception and validation of a gesture set are not trivial (an example of the digLT gesture set’s conception can be found in [6], p. 60). Furthermore, the user will be less fatigued, since the surface provides rest for arms and fingers, so they need not be held up constantly. The dimension of movement is reduced to lifting the arms and letting them down onto the surface, instead of having to use force to stop hand movement at the position of a merely virtual screen. As [21] found, the co-location of virtually seen objects with physical feedback also increases users’ performance when interacting with them.

Lastly, since intended touch interaction with the virtual screen is not actually recognized by touch sensing, but rather by measuring the user’s hand position within a small interaction area above the screen surface, movement intended to remove the hands from the interaction area cannot be distinguished from movement intended as interaction. If the user lifts a hand ‘upwards’ and out of the interaction area and the movement is not perfectly perpendicular to the screen surface, the hand also moves ‘sideways’, resulting in an unwanted sideways interaction with the virtual screen. Providing a physical surface reduces the vertical extent of the interaction area in which the user’s hands can move and thus reduces the potential for such unwanted interaction.

Thus, the real digLT system’s surface, covered with a cloth, was used to provide the passive haptic representation of the virtual screen in the real world: a touchable dummy. The Vive system made it possible to use the included controllers to calibrate the virtual table’s position to match the dummy’s position, since the controllers’ virtual models match their real-world dimensions.

For calibration, the VRLT model’s surface position is virtually attached to the bounds of one controller. The VRLT is placed in the approximate position of the dummy by laying this controller upon the dummy’s surface. The second controller can then be used to test the VRLT’s position by holding it to the dummy’s physical bounds. If the VRLT’s position is off, its bounds can be matched iteratively to the dummy’s bounds by moving the controller lying on the dummy and probing repeatedly. As soon as the surfaces are satisfyingly co-located, the virtual position is locked in.
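For concreteness, the calibration step can be sketched as follows in Python; the names, the pose representation, and the controller half-height are illustrative assumptions rather than the project’s actual code.

```python
import numpy as np

def vrlt_pose_from_controller(controller_pos, controller_yaw_deg,
                              controller_half_height=0.03):
    """Derive the virtual table surface's pose from a controller lying
    flat on the dummy (names and the half-height value are assumed)."""
    surface_center = np.array(controller_pos, dtype=float)
    # The controller rests ON the dummy, so the surface plane lies half
    # the controller's height below the tracked controller center.
    surface_center[1] -= controller_half_height
    # For a level table, only the rotation about the vertical axis matters.
    return surface_center, controller_yaw_deg % 360.0

# Re-run after each nudge of the controller until probing with the
# second controller shows the virtual and real bounds to coincide.
center, yaw = vrlt_pose_from_controller([0.0, 0.92, 1.5], 12.0)
```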

Displaying a user’s hands virtually: interaction with the VRLT. To provide for object manipulation, a Leap Motion controller [3] was used for the detection and display of the user’s hands in the virtual environment, providing a most direct correspondence between the physical and virtual worlds and practically enabling the user to interact with the virtual environment and its objects naturally, as he would in the real world. The low latency and robustness of tracking, both of which have been shown to be of reasonably high quality [23, 27], provide the user with a visible hand. This may increase the sense of proprioception for his virtual body parts and enables him to have “direct contact” with virtual objects, increasing his sense of presence.

The hands are visualized by virtual hand models following the user’s tracked movements within the virtual world. With the device attached to the front of the HMD as specified, the Leap Motion Orion plugin for Unity infers the origin of the Leap Motion’s coordinate system from the position of the headset, so no further calibration is needed. The virtual hands’ positions and gestures were then used to trigger interactions upon the VRLT when the user would virtually bring his hands down upon the VRLT’s surface, by sending appropriate signals to the IVIG software. Since both the user’s hands and their virtual counterparts, as well as the VRLT and the physical table’s surface, are co-located, the user could perceive this much like the touch of a touchscreen. The interaction was detected within a physical margin of tolerance, a thin interactive layer or ‘interaction area’ above the virtual table’s surface, because of the expected, though minor, imprecision of both tracking systems. The physical surface also naturally prompts the user to hold his hands in poses more plainly visible to the Leap Motion system, which aids the device’s capacity to detect the finger poses of the intended gesture set.
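The detection within the interaction area amounts to a simple containment test above the table plane; a minimal sketch, assuming an axis-aligned table and an illustrative layer height:

```python
def in_interaction_area(hand_pos, table_center, table_half_size,
                        layer_height=0.04):
    """True if a tracked hand point lies within the thin interactive
    layer above the table surface (axis-aligned table assumed; the
    4 cm layer height is an illustrative tolerance, not the real value)."""
    dx = abs(hand_pos[0] - table_center[0])   # lateral offsets on the plane
    dz = abs(hand_pos[2] - table_center[2])
    dy = hand_pos[1] - table_center[1]        # signed height above surface
    return (dx <= table_half_size[0] and dz <= table_half_size[1]
            and 0.0 <= dy <= layer_height)
```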

Since the imprecision of the Leap Motion sensor seems to depend largely on the distance of the user’s hands from his body, this systematic error was offset by recalibrating the VRLT’s height for each user until touching the VRLT with the virtual hands coincided with touching the physical table’s surface with the real hands. Offsetting the point of touch seen in the virtual world to the point of touch felt in the physical world is intended to ‘fool’ the user into truly perceiving the virtual representation of his hands as his own, proprioceptively [26].
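This per-user recalibration can be pictured as a height offset nudged until virtual and physical touch coincide; a small sketch with an assumed step size:

```python
class SurfaceHeightCalibration:
    """Per-user height offset compensating the Leap Motion's systematic,
    distance-dependent error (illustrative sketch; step size assumed)."""

    def __init__(self):
        self.height_offset = 0.0      # metres, added to the VRLT's height

    def nudge(self, direction, step=0.005):
        # direction = -1 if the virtual hands 'touch' before the user
        # feels the dummy, +1 if physical contact comes first.
        self.height_offset += direction * step
```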

As an additional visual cue, the position of the user’s index fingertips was projected perpendicularly onto the table surface, where a thin vertical bar extending from the table surface to the top of the interaction layer was displayed. Also, when interaction was detected, a floating text was displayed above the user’s hands, showing the name of the type of interaction currently detected (see Fig. 3).
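The fingertip projection is a standard point-to-plane projection; a minimal sketch with assumed names:

```python
import numpy as np

def project_onto_surface(fingertip, surface_origin, surface_normal):
    """Project the index fingertip perpendicularly onto the table plane;
    the returned foot point anchors the vertical indicator bar and the
    returned distance is the fingertip's height above the surface."""
    n = np.asarray(surface_normal, dtype=float)
    n /= np.linalg.norm(n)
    p = np.asarray(fingertip, dtype=float)
    height = float(np.dot(p - np.asarray(surface_origin, dtype=float), n))
    return p - height * n, height
```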

Fig. 3. User performing the map movement gesture on the VRLT. Left: VRLT and user’s virtual hands in the virtual environment. Right: photographic image taken from the HTC Vive’s internal camera.

Interaction Gestures. Three fundamental interactions and their corresponding gestures were implemented to mimic the digLT’s interaction concept as closely as possible. Gestures were differentiated simply by deciding discretely, per finger, whether it is stretched out. The state of a finger was derived from the “finger angles”, an estimate of the degree of bending of the metacarpophalangeal joints provided by the Leap Motion.

A gesture was then detected by comparing the list of currently stretched out fingers with a list of fingers that have to be stretched out for a certain gesture.
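This comparison amounts to matching a tuple of discrete finger states against per-gesture templates; a minimal Python sketch, in which the angle threshold and the template tuples are illustrative assumptions:

```python
EXTENDED_THRESHOLD = 0.5  # rad; assumed MCP bend angle below which a
                          # finger counts as 'stretched out'

GESTURES = {              # thumb, index, middle, ring, pinky
    "move":   (True, True, True, True, True),     # flat hand
    "rotate": (False, True, False, False, False), # index finger only
}                         # zoom uses two hands, each in the 'move' pose

def stretched_fingers(finger_angles):
    """Map per-finger MCP bend angles (thumb..pinky) to discrete states."""
    return tuple(angle < EXTENDED_THRESHOLD for angle in finger_angles)

def classify_gesture(finger_angles):
    """Return the gesture whose required finger set matches, else None."""
    state = stretched_fingers(finger_angles)
    for name, required in GESTURES.items():
        if state == required:
            return name
    return None
```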

To move the map, the user puts one hand flat, with all fingers stretched out (as opposed to bent to the position they would have when the user makes a fist), onto the table’s surface. The hand with all fingers stretched out is recognized by the system as the map movement gesture. Upon putting the hand within the interaction area, ‘touching’ it to the virtual table, the appropriate signals are sent to the IVIG software to make the map follow the hand’s movement, keeping the map below the user’s hand. Upon removing the hand from the surface and out of the interaction area, the interaction stops and the map is at rest again.
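The map-following behavior reduces to shifting the map by the hand’s in-plane displacement each frame; a sketch (the actual signals sent to the IVIG software are not reproduced here):

```python
def update_pan(map_offset, hand_pos, prev_hand_pos):
    """Shift the map by the flat hand's in-plane movement so the map
    stays below the hand (illustrative; x/z are the surface axes)."""
    dx = hand_pos[0] - prev_hand_pos[0]
    dz = hand_pos[2] - prev_hand_pos[2]
    return (map_offset[0] + dx, map_offset[1] + dz)
```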

To initiate the zoom interaction, the original digLT’s zoom gesture had to be replaced. The original zoom gesture consisted of the index and middle fingers of both hands stretched out and tilted away from each other, which was frequently tracked improperly, as either only one finger or all non-thumb fingers stretched out. Thus, the zoom gesture for the VRLT was replaced by having both hands’ fingers stretched out in the same way as for the move gesture. Upon entering the interaction area with two hands with stretched-out fingers, the IVIG software zooms the map in when the user moves his hands closer together and out when he moves them apart. As with the original IVIG software, the map zooms in and out on its center. Removing either hand from the interaction area stops the interaction.
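The zoom amount can be derived from the change in inter-hand distance; a sketch, in which the mapping of the distance ratio onto the map scale is an assumption:

```python
import math

def zoom_factor(left, right, prev_left, prev_right):
    """Factor > 1 zooms in when the hands move closer together,
    factor < 1 zooms out when they move apart, matching the text."""
    dist = math.dist(left, right)
    prev_dist = math.dist(prev_left, prev_right)
    if dist < 1e-6:          # guard against degenerate hand positions
        return 1.0
    return prev_dist / dist
```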

Lastly, to initiate the rotation interaction, the user places into the interaction area one hand with stretched-out fingers, as in the movement gesture, and one hand with only the index finger stretched out and the other fingers clenched into a fist. Upon moving the hand with the single stretched-out finger, the map is rotated around an estimated point in the middle of the flat hand, by the angle through which the finger has moved around that hand’s center relative to its previous position. Removing the hands from the interaction area stops the interaction.
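The per-frame rotation angle is the change in the index finger’s bearing around the flat hand’s center; a minimal sketch in surface coordinates:

```python
import math

def rotation_delta(hand_center, finger_pos, prev_finger_pos):
    """Angle (radians) by which the index finger has moved around the
    flat hand's estimated center since the previous frame."""
    a0 = math.atan2(prev_finger_pos[1] - hand_center[1],
                    prev_finger_pos[0] - hand_center[0])
    a1 = math.atan2(finger_pos[1] - hand_center[1],
                    finger_pos[0] - hand_center[0])
    # Wrap into (-pi, pi] so crossing the +/-pi boundary does not jump.
    return (a1 - a0 + math.pi) % (2 * math.pi) - math.pi
```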

The hands’ recognized movement was interpolated, and large jumps, which indicate tracking errors, were ignored.
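A sketch of such outlier rejection and smoothing, with assumed threshold and smoothing weight:

```python
MAX_JUMP = 0.15   # metres per frame; assumed plausibility threshold
SMOOTHING = 0.5   # exponential smoothing weight; assumed

def filter_hand_position(new_pos, prev_pos):
    """Drop implausibly large frame-to-frame jumps (likely tracking
    errors) and exponentially smooth the accepted samples."""
    jump = sum((a - b) ** 2 for a, b in zip(new_pos, prev_pos)) ** 0.5
    if jump > MAX_JUMP:
        return prev_pos                     # discard the outlier sample
    return tuple(SMOOTHING * a + (1 - SMOOTHING) * b
                 for a, b in zip(new_pos, prev_pos))
```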

4 Evaluation

4.1 Hypotheses and Questions

To assess the viability of the VR system, the digLT and VRLT systems were compared in multiple categories for their usability as perceived by the participants of the user study we conducted. The study covers the capability to interact with and view the map data efficiently, with the digLT serving as the baseline against which the VR system is compared. The fundamental hypothesis tested in each category was thus generally a variation of “H0: Users do not perceive a difference between aspects of the systems”, tested against its alternative, “H1: Users perceive a difference between aspects of the systems”.

As the central question of this evaluation, the study’s participants were asked whether the system enabled them to fulfill a set of interaction tasks on each of the systems, as detailed below.

To further assess the quality of interaction, the study participants were asked about the differences they perceived in workload factors (as assessed by the NASA TLX subscales) as well as precision of interaction and functioning of system components.

4.2 Procedure

Preparation. Before every evaluation session, the system was set up and calibrated as described in Sect. 3, Implementation. The IVIG map software was set up in the same starting state both on the machine running the virtual environment and on the real digLT, displaying a map overlaid with information element icons (see Fig. 4).

Fig. 4. A typical map overlaid with information symbols, provided by the IVIG software.

An evaluation session consisted of seven phases: a demonstration phase, a questionnaire phase concerning demographic information, a familiarization and explanation phase, a task execution phase on one system, a questionnaire phase concerning the previous phase, a task execution phase on the second system, and a questionnaire phase concerning the second system and a comparison of the systems, as detailed below.

Demonstration. Upon arrival, study participants were given a short explanation of the purpose and interaction capabilities of both systems, in random order, by the study’s conductor, who briefly demonstrated their function.

Questionnaires. Following demonstration, participants were asked to begin answering a questionnaire. All questionnaires were presented on a computer using a computer mouse and a keyboard. This phase’s questionnaire concerned their physiological and demographic data.

Familiarization and explanation. The participants were then asked to familiarize themselves with the systems by operating them. The study’s conductor explained the systems’ functions and taught the participants how to utilize each system optimally for interacting with the map.

Afterwards, an explanation of the questions that would follow each of the two task phases was given, closely following the NASA TLX instruction manual [9]. Notably, the study’s conductor asked the participants to concentrate on recording and comparing their experience of the systems’ usability as precisely as possible, keeping in mind how they had answered for the first tested system.

The order of the following interaction and questionnaire phases was randomized in order to even out, or eliminate, effects of fatigue and task familiarity interfering with the evaluation.

Task set execution on first system. The study conductor then gave the participants a task set consisting of four stages with multiple short tasks: to move, zoom, and rotate the map, and finally to use all interaction gestures in combined tasks. In each task, the participants were asked to update the display of the map (e.g., ‘Move the information elements displayed to the upper left corner of the display area’ as one of the movement tasks during the first stage). The participants were told that their execution of the tasks would not be objectively rated, since the tasks’ purpose was to give them an impression of usability. Participants indicated their completion of a task by their own assessment. To give the participants a feeling for how much time they needed to complete a task, the conductor would stop them after thirty seconds, offering once to reset the map for a single task.

Questionnaire for first system. Using the questionnaire, the participants were asked to rate their experience, keeping in mind that the purpose of the study was a comparison between the two systems and emphasizing the positive or negative intensity of their perceptions.

Repetition for second system and final questionnaire. After the questionnaire, the second task execution phase was started, and after the tasks’ execution, the second questionnaire was completed, including a final part in which the participants compared the systems to each other directly.

4.3 Participants and Demographics

The study took place in Germany, and all of the study’s 28 participants were German, speaking German as their first language. 22 were male and 6 female; almost all had achieved the general qualification for university entrance or were students, 9 held a university degree, and 16 came from a MINT/STEM background.

10 participants were 170–179 cm tall, 13 gave their height as 180–189 cm, 3 participants were shorter than 169 cm, and 2 were taller than 190 cm.

Participants were mainly young and middle-aged adults: 12 were aged 18–24, 10 aged 25–34, 5 aged 35–64, and 1 older than 75. By their own assessment, none of the participants reported extreme physical or mental unfitness, extreme susceptibility to getting sick or dizzy easily, or “any sort of physical, mental or health impairment that might have influence on the usage of a large screen, hand-based interaction with a screen, or wearing a headbound headset that covers the eyes”, except for sight impairments.

15 participants reported sight impairments (mostly short-sightedness), 5 of whom explicitly stated after familiarization that they preferred to evaluate the systems with their glasses on (including under the HMD), while 3 stated that they preferred not to use glasses (reporting only mild sight impairment).

One participant reported having only one eye but tested the VR system and indicated having a realistic experience. One other participant had to remove his varifocals during the evaluation, since they required him to change head pose to see sharply, which prevented him from using the system with his glasses on.

All of the participants reported at least some experience with and knowledge of technology and accessories, as well as previous experience with touchscreens; 7 of the participants reported previous experience with head-mounted VR systems and 7 reported previous experience with optical body pose recognition systems.

4.4 Statistical Analysis Method

All answers were given on seven-point Likert-scales for comparative questions and nine-point Likert-scales for the TLX subscale questions, exceptions are noted. To analyze Likert-type data, i.e. qualitative ordinal scales, non-parametric methods are used.

Results are visualized as boxplots (showing median, range of values and quartiles of the collected data) overlaid with swarmplots showing all data points. To derive qualitative impressions of user tendency for the systems, the differences between the answers given on the nine point scales for the TLX subscales and the seven-point scales for all other questions are shown.

Differences are calculated by subtracting the answers for the VR system from those for the digLT system. Positive values can give the intuition of being “in favor of the VR system”, negative answers “in favor of the digLT system”.
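The difference computation and the boxplot/swarmplot visualization can be reproduced with standard tools; a sketch using pandas and seaborn on hypothetical ratings (the study’s actual data are not reproduced here):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical matched ratings per participant; positive differences
# read as "in favor of the VR system", as described above.
ratings = pd.DataFrame({"digLT": [7, 6, 8, 5, 6], "VR": [5, 6, 6, 7, 4]})
ratings["difference"] = ratings["digLT"] - ratings["VR"]

ax = sns.boxplot(y=ratings["difference"], color="lightgray")
sns.swarmplot(y=ratings["difference"], color="black", ax=ax)
ax.set_ylabel("digLT rating minus VR rating")
plt.show()
```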

Note that a “magnitude of difference” between the systems cannot be absolutely inferred from these differences since the ordinal, non-metric scaling does not provide a “fixed unit” of strength of impression across evaluation participants. E.g., two participants rating the difference between the systems “very large” and “very slight” on the frustration scale, corresponding to 4 vs. 1 point of difference in rating, may perceive the difference as equally large, but their a priori interpretation of the frustration scale may differ in its “range” or their total tolerance for frustration, depending on personal predisposition.

The significance of differences between the answers for both systems is analyzed by employing the nonparametric Wilcoxon signed rank test for matched pairs [8]. The test statistic is the Z-value, which is analyzed for significance, resulting in a p-value; H0 is rejected, and the difference considered significant, when p < 0.05. As often done in other publications, the results were also analyzed analogously using t-testing, giving t- and p-values that were likewise tested against p < 0.05; this showed statistical significance of the differences for the same categories as the aforementioned Wilcoxon test and will thus not be reported separately.
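The test itself is available in SciPy; a sketch on hypothetical matched ratings:

```python
from scipy.stats import wilcoxon

diglt = [7, 6, 8, 5, 6, 7, 4]  # hypothetical ratings, one pair per participant
vr    = [5, 6, 6, 7, 4, 5, 4]

# Wilcoxon signed-rank test for matched pairs; ties (zero differences)
# are handled by the default zero_method.
stat, p = wilcoxon(diglt, vr)
reject_h0 = p < 0.05           # significant perceived difference
```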

4.5 Results and Interpretation

In this section, for each of the evaluated categories, the results, their statistical analysis, and their interpretation are presented.

Task Fulfillment. Participants answered the question “Did the system enable you to fulfill the tasks given to you?”; answer options were “Yes” or “No”, and answering was required. Out of the 28 participants, all answered “Yes” for the digLT system. For the VRLT system, 26 participants answered “Yes” and two answered “No”. Barring very specific cases, the VR system thus enabled the users to work in a way that closely resembled the digLT system. Depending on circumstances, the system is a viable alternative to the classical system.

Differences in NASA TLX Subscales. Applying the statistical analysis to the results from Fig. 5, Table 1 shows that the hypothesis “H0: The participant’s impression of [NASA TLX subscale] is equal on both systems” is retained for Mental Demand, Physical Demand, Temporal Demand, and Frustration. It is rejected, and the alternative “H1: The participant’s impression of [NASA TLX subscale] differs between the systems” accepted, for Performance and Effort.

Fig. 5. NASA TLX subscales.

Table 1. Statistical analysis – Wilcoxon signed rank test for matched pairs – TLX subscales

For Performance, the difference is significant, with an effect size showing a large decrease in participants’ perceived performance on the VR system compared to the digLT system, while the significant difference for Effort has a negligible effect size. For the other workload factors, the perceived differences between the VR and digLT systems are not statistically significant.

Differences in Precision of Interaction. Applying the statistical analysis to the results from Fig. 6, Table 2 shows that the hypothesis “H0: Users perceive the precision of interaction as equal” is retained only for panning precision, and rejected in favor of the alternative hypothesis “H1: Users perceive the precision of interaction as unequal” for zooming, rotation, and combined-tasks precision. The significant differences for the zoom and rotation interactions have small and moderate effect sizes, respectively, showing decreased precision for the VR system; consequently, the effect size for the combined-tasks precision lies in between.

Fig. 6. Rating differences of perceived precision per task category.

Table 2. Statistical analysis – Wilcoxon signed rank test for matched pairs – Precision

Differences in Functioning. Applying the statistical analysis to the results from Fig. 7, Table 3 shows that the hypothesis “H0: Users perceive the functioning of the systems as equal” is rejected for all functioning categories but graphic display, in favor of its alternative “H1: Users perceive the functioning of the systems as unequal”. The differences between the functioning of the systems are significantly perceptible, mostly in the subcategories, as the difference in the functioning of the system as a whole shows only a weak effect size. The strongest effect size appears for recognition of hand movement, showing a large proneness to erroneous behavior on the VRLT system. Recognition of the performed gesture has a small effect size, indicating slightly worse performance on the VRLT. Transfer of input to the map, however, seems to function even a little better on the VRLT, with a small effect size.

Fig. 7. Rating differences of perceived function per system component.

Table 3. Statistical analysis – Wilcoxon signed rank test for matched pairs – Functioning

4.6 Participant Comments

The participants were very excited about the system’s novelty, praising the potential of the VR system’s capabilities. When asked to imagine long-term use of the system, participants perceived hand interaction imprecision and detection errors, as well as HMD encumbrance and image quality, as the most detrimental technical hurdles.

5 Conclusion

The digLT situation table, a large touchscreen, and its map interaction concepts were replicated in a virtual environment as directly as possible. Within the environment, the user can see his own hands and physically touch the virtual table, which provides passive haptic feedback, enabling a proprioceptively guided, body-centered interaction. The VR system was evaluated to compare its usability with that of the real system and to find the improvements necessary for a system that is perceived as favorably as the real one.

The evaluation shows that the sensors and systems used provide the general ability to manipulate maps with the VR system almost on par with the conventional system.

While participants consider long-term use of the system problematic because of drawbacks in interaction and encumbrance, users like and are excited by VR systems, seeing great promise for increased usefulness if implementations are optimized and the displayed content makes more imaginative use of the capabilities of VR technologies.

For the proposed VR system, the most immediate optimizations include improved hand detection through fusion of data from the different sensors and the implementation of a more sophisticated existing hand pose detection method. An update of the interaction concept, using a more modern form of gestural interaction with the map, such as keeping the ‘touched’ points on the map below the user’s hands and scaling, rotating, and panning the map accordingly, should be evaluated in further studies. As an alternative to optical hand tracking, gloves or tracked controllers may serve as suitable substitutes. The display of the user’s own and other users’ full-body avatars or similar representations may increase perceived immersion in the environment and enable cooperation on the system, e.g. via telepresence. Lastly, displaying map data optimized for visibility in the HMD and using three-dimensional map data are viable improvements.

Using a VR system to simulate the digLT is a cost-effective alternative to the existing digLT system; it gives users the ability to deploy an environment providing the digLT’s capabilities with much increased mobility and offers the potential to include functionality unrealizable with the actual touchscreen system.