
1 Introduction

As humans have become accustomed to a variety of outdoor navigation solutions in their daily lives, the need for indoor navigation is growing as well. People want to find specific rooms, meeting places or other target locations in unfamiliar buildings such as airports or stations, where they risk getting lost or wasting time searching for the right path to their destination. The Global Positioning System (GPS) conventionally performs the localization task in outdoor environments; it is not suitable for direct use in indoor scenarios, nor is it able to provide reliable information about the user's orientation. Owing to construction-related low signal quality, multipath propagation and poor satellite geometry, GPS cannot provide indoor localization accurate enough to justify its adoption for navigation within typical indoor scenarios [1]. To enable navigation inside buildings, we therefore need a substitute for such a localization system and must provide the required information with technologies well suited to indoor fields of operation.

There is vast research on mobile robot navigation and localization, where the key problem to be solved is simultaneous localization and mapping (SLAM). SLAM approaches attempt to create a map of an unknown area while simultaneously tracking the current position or pose of the considered system relative to the permanently extended and updated map [2]. To solve this problem, a wide range of sensor technologies such as laser [3], sonar [4], radar [5, 6] or a variety of cameras [7, 8] can be used to acquire the required features of the surrounding environment. Especially for camera-based technologies, various methods, algorithms and frameworks to perform visual SLAM (VSLAM) have become available and are used extensively in a wide range of robotic applications [9]. VSLAM and related technologies therefore represent a way to realize the needed localization in the targeted areas, combined with the generation of usable maps at a suitable level of detail. In contrast to other common signal-based indoor positioning methods [10], most VSLAM approaches are able to estimate poses with six degrees of freedom (6DoF), i.e. position and orientation in 3D space. This knowledge, together with the estimated motion, can be used to solve positioning tasks even more accurately and allows the user to be guided not only to a dedicated position but also, if required, into a targeted orientation. With the availability of digital maps and localization methods, VSLAM systems provide the opportunity to derive feasible paths or trajectories to accomplish a multitude of different navigation tasks. In addition to SLAM technologies, the robotics community offers many approaches to solve this final path- and motion-planning step [11]. In order to create an entire navigation system for human use, a proper interface is needed. We would like to expand the target audience of our system, with a specific emphasis on including users with visual impairments. People with partial or complete vision loss rely on versatile assistive technologies such as the blind cane or electronic devices [12] to support their daily lives. To provide a navigation system for the visually impaired, custom solutions addressing their special needs are required. Challenges arise primarily in the two-way communication interface between the visually impaired user and the technical system. Due to the deficit in visual perception, well-known and easy-to-use interfaces like touchscreens may be difficult to use in such scenarios, so alternative interface technologies are necessary to offer a meaningful user experience. These circumstances lead to a rethinking of how information is processed and presented. One example is guidance by non-intrusive, intuitively understandable tactile signals. In our case, we employ a belt with 32 vibromotors placed equidistantly around the waist. Previous research has shown that tactile direction signals provided around the waist in a 360-degree manner can instantly be integrated into behavior and may even circumvent the bottleneck of attention [13–16]. Our research aims to ensure the basic applicability of a VSLAM-based navigation system prototype to a spectrum of users ranging from those with normal vision to those with complete visual impairment by using an alternative information interface.

2 Related Work

The project presented here is firmly based on established approaches and methods of mapping, navigation and VSLAM used by mobile robots. It is beyond the scope of this paper to give a comprehensive overview of this broad subject and the breadth of basic technologies it encompasses. We therefore refer the interested reader to available survey publications on this topic [9, 17–19].

A focused survey on implemented navigation systems reveals existing works bearing similarities to our proposal. Shoval et al. presented a waist-worn belt called Navbelt, which is also based on robotic concepts and is intended to be used as a computerized travel aid [20]. However, in contrast to our project, the belt does not serve as a feedback element; instead, the system relies on an ultrasonic sensory system to acquire the real-world data and mainly uses auditory feedback to convey information regarding the surroundings.

Wachaja et al. mounted two laser scanners on a walking frame and also used a vibro-tactile belt with angularly aligned actuators [21]. Like Zöllner et al. [22] or Cardin et al. [23], they used the belt to encode distances to detected obstacles rather than for navigation purposes, which clearly distinguishes their work from ours.

Borenstein and Ulrich developed an active blind cane [24] based on multiple sonar sensors. The sensors are placed inside a cane construction with a set of wheels controllable via servo drives. The cane thus also offers active feedback to lead the user through the environment and to avoid obstacles, which is the system's major task.

Schwarze et al. [25] presented a stereo-vision-based navigation system with head-mounted image sensors combined with an inertial measurement unit (IMU) and acoustic feedback. The system was developed for outdoor environments and also concentrates on direct obstacle detection rather than navigation, but it builds a global environment model using the camera's odometry. Feedback is provided solely acoustically, by placing virtual sound sources inside the derived global representation.

Lee and Medioni [26] also presented a vision-based, head-mounted system for indoor environments with an integrated IMU. They likewise use vibro-tactile feedback, which is similar to our approach but with far fewer actuators; such a small number of vibration modules does not allow fine direction differences to be rendered. Lee and Medioni also use feature-based visual odometry as well as 2D mapping and path-planning methods to generate the system's feedback.

The system presented by Nguyen et al. [27] is also based on image sensors and a localization method (FAB-MAP [28]) and is targeted at indoor environments. Nguyen's work focuses on a SLAM system to provide maps and localization information but includes neither a feedback system nor a path-planning component.

3 The System Concept

To provide maximal interoperability for research and development purposes we define the main building blocks and their relationships within our navigation-assistance system as shown in Fig. 1.

Fig. 1. Proposed system architecture based on individual and independent subsystems.

As is the case with any technical system, a suitable Hardware setup forms the foundation. In this case, suitable hardware entails a proper camera system, the hardware required for the chosen method of information display discussed earlier, and a computing unit with sufficient processing power to run the VSLAM algorithm, perform camera depth calculations and solve other computationally expensive tasks. Furthermore, adequate open interfaces are needed for the hardware integration itself.

The Image Capturing software module is responsible for acquiring images from the camera system and passing the (potentially pre-processed) data to the Visual-SLAM and Navigation modules in a format compatible with the rest of the system. This Data Acquisition layer thus serves as an interface between these modules and should also be responsible for computing depth information in case neither the SLAM system nor the camera can provide it on its own. This allows the camera sensors used in mono-, multi-view- or active-stereo systems to be exchanged while the same software components remain in use.
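To illustrate how such a decoupling could look in code, the following minimal sketch defines a camera-agnostic acquisition interface. The class and method names (`CameraSource`, `RGBDFrame`, `grab`) are our own illustrative choices and are not part of the actual system.

```python
# Illustrative sketch only (names are hypothetical, not taken from the system):
# a minimal camera abstraction behind which mono-, multi-view- or
# active-stereo sensors can be exchanged without touching the SLAM or
# navigation components.
from abc import ABC, abstractmethod
from dataclasses import dataclass
import numpy as np


@dataclass
class RGBDFrame:
    rgb: np.ndarray     # rectified color image, H x W x 3
    depth: np.ndarray   # depth image in meters, H x W
    stamp: float        # acquisition time in seconds


class CameraSource(ABC):
    @abstractmethod
    def grab(self) -> RGBDFrame:
        """Return the next frame; compute depth here if neither the SLAM
        system nor the sensor itself can provide it (e.g. passive stereo)."""
```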

Based on the acquired data, the selected Visual-SLAM framework provides the estimated pose as well as the generated map data, which are primarily used by the Navigation module. The Navigation module is responsible for interpreting the obtained map, computing the path or motion plan based on the defined navigation target (Target Information) and taking current sensor data into account for purposes such as collision avoidance, as discussed in [29]. Finally, the Navigation module's planning results are passed to the Feedback element, which processes the data and converts the information into a format that can be sent to the Hardware element. To merge all software components, we strongly recommend using an adequate modular Management Framework to encapsulate all elements.

We propose that the sensing camera system should be mounted on the user's chest area (Fig. 2). In contrast to head-mounted approaches (cf. Sect. 2), this camera placement allows us to directly use the ego-motion estimated by the SLAM system for navigation purposes, because the obtained pose points along the main direction of body motion rather than along the user's line of sight.

Fig. 2. Proposed camera mounting position at the user's chest area.

Fig. 3. Proposed placement of the vibro-tactile belt.

As the system's main assistance element, we suggest a vibro-tactile belt, ideally worn around the waist as shown in Fig. 3. Its multiple vibration modules, placed at defined equidistant spacings, render signals at different angles around the user's body. As the relative position of camera and belt can be assumed to be fixed, we obtain a joint reference frame for the generation of vibro-tactile feedback signals. This allows motion signals to be generated directly from the current SLAM pose without having to account for other parts of the body or the user's current line of sight.

In addition to the vibro-tactile interface, the presented concept framework can be extended to include extra feedback elements. Elements that would allow the system to cater to a visually-impaired target audience include auditory, haptic [30], or any other modality that can be interpreted by users with visual deficits.

When targeting visually impaired users, another challenge besides providing appropriate feedback is the interpretation of input given by system users. In the framework of a navigation system, we primarily consider discrete Target Information as possible input data. As this information can easily be provided by human voice, we propose a speech recognition system as a solution to the input problem. Such systems are able to extract spoken information, such as requested destinations, and, combined with the already mentioned auditory interface, they can offer map-related target proposals and provide a dialog-based system interface, as proposed in [31].

4 The Final System Setup

For the implementation of the system conceptualized in the previous section, we rely on the open-source meta operating system ROS [32]. ROS provides the required modular architecture and offers a huge community that provides many open-source implementations of robotic and automotive applications and methods. ROS also supports all relevant data types, messages for communication between packages, and common system drivers, which allows for seamless integration of the chosen hardware elements. As image sensor we used common RGB-D camera models, such as the Kinect or the Asus Xtion Pro Live, mounted on the user's chest or on a small cart for mapping purposes. Once the needed sensor characteristics have been obtained using established calibration methods like those provided by OpenCV [33], the rectified RGB image and the corresponding depth information can be passed to the VSLAM system using ROS's messaging system.
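A minimal sketch of this hand-over is shown below, assuming a rospy node that synchronizes rectified RGB and registered depth images; the topic names follow common openni-style driver conventions and will differ depending on the actual camera driver and launch configuration.

```python
#!/usr/bin/env python
# Sketch, not the actual project code: forward time-synchronized RGB and
# depth images to the VSLAM system. Topic names are assumptions based on
# common RGB-D driver defaults and may need remapping.
import rospy
import message_filters
from sensor_msgs.msg import Image


def rgbd_callback(rgb_msg, depth_msg):
    # Here the synchronized pair would be passed on (or simply remapped)
    # to the image/depth inputs expected by the VSLAM node.
    rospy.logdebug("RGB-D pair at %s", rgb_msg.header.stamp)


if __name__ == "__main__":
    rospy.init_node("data_acquisition_sketch")
    rgb_sub = message_filters.Subscriber("/camera/rgb/image_rect_color", Image)
    depth_sub = message_filters.Subscriber("/camera/depth_registered/image_raw", Image)
    sync = message_filters.ApproximateTimeSynchronizer(
        [rgb_sub, depth_sub], queue_size=10, slop=0.05)
    sync.registerCallback(rgbd_callback)
    rospy.spin()
```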

The VSLAM system forms the centerpiece of the software composition and is responsible for providing the maps and localization information used by the navigation. For our project, we decided to use the Real-Time Appearance-Based Mapping (RTabMap) framework developed by Labbé et al. [34]. Their appearance-based approach considers not just single image features but rather combinations of several feature vectors called visual words (VW). Managed within an incrementally built vocabulary, these VWs can be used for image alignment, odometry derivation and for recognizing already visited areas. Combined with a special memory management solution [35], Labbé and colleagues were able to achieve real-time performance for large-scale maps with integrated online loop-closure detection [36] based on RGB-D data. This approach provides global map data in the form of a composite point cloud and a projected 2D occupancy grid [37]. It also grants access to the estimated 6DoF SLAM pose relative to the created map.

Fig. 4. Visualization of a current system state using RViz, showing the estimated SLAM pose, the derived global and local 2D maps and paths, the rough footprint, as well as the feedback and user visualization.

All of this information is used as input for the standard ROS navigation stack [38]. A rough 2D footprint (see Fig. 4) is used to calculate the cost of every cell of the matrix-based occupancy grid. The resulting costmap serves as the configuration space for an A* algorithm originally presented in [39]. This global planner supplies our navigation system with the global path, starting at the current 2D pose estimated by the VSLAM system and ending at the set target. The global path, combined with the current data from the vision sensor, is then used by the dynamic window approach introduced by Fox et al. [40] to compute local guidance paths. The resulting local plan provides local path information in the form of circular trajectories, as shown in Fig. 4. Such a path leads towards the global path and is used to derive the feedback signal: we use the absolute angle difference between the current SLAM orientation and the latest available point of the computed local path. This navigation angle is directly passed to the vibro-tactile, waist-mounted feedback element; its integration is further explained in Sect. 5. All of the aforementioned navigation techniques in ROS are described in detail in [41].
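The following short helper sketches how such a navigation angle can be derived from the current 2D SLAM pose and the last point of the local plan; it is a simplified stand-in for the actual implementation, with hypothetical parameter names.

```python
# Simplified sketch (not the actual navigation code): compute the angle
# between the user's current heading and the direction towards the latest
# point of the local plan, wrapped to [-pi, pi].
import math


def navigation_angle(pose_x, pose_y, pose_yaw, path_x, path_y):
    heading_to_path = math.atan2(path_y - pose_y, path_x - pose_x)
    diff = heading_to_path - pose_yaw
    return math.atan2(math.sin(diff), math.cos(diff))  # normalize to [-pi, pi]


# Example: user at the origin facing +x, local-path point ahead and to the left
angle = navigation_angle(0.0, 0.0, 0.0, 1.0, 1.0)
print(math.degrees(angle))  # approx. 45 degrees, rendered on the left-front of the belt
```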

A speech synthesis module based on the free software Festival [42] was also integrated into our system. This additional feedback source is primarily used to indicate reached targets but can be easily extended for use in passing all kinds of information related to the navigation task as well as the sensed environment.

To provide a sufficient representation of the system state for research and development purposes, we decided to create a visualization based on ROS's RViz. We display the derived global and local maps and paths, the estimated poses and the feedback that is sent to the Feedback element. An example of the displayed information is shown in Fig. 4.

All software components run on a Linux-based laptop computer, which can be carried inside a backpack.

For the current version of our project, we decided not to implement the complete user interface detailed in the system concept above. At present, we provide functionality for setting goals within the recorded maps by a third party. However, this way of setting navigation goals cannot be used by blind or visually impaired users themselves. The implementation of a suitable user interface will be within the scope of future work.

5 Information Display and Vibro-Tactile Patterns

To implement the vibro-tactile information display concept (Sect. 3), we used a prototype of a vibro-tactile belt developed and provided by the University of Osnabrück [14]. Figure 5 shows an overview of the belt system.

Fig. 5. Prototype of a vibro-tactile belt. (a) Belt with vibration modules, (b) external battery, (c) battery box with belt-controller, (d) mobile phone for interaction.

The belt is worn on the waist and provides 32 vibration modules (Fig. 5a) spread equidistantly along its length, resulting in an absolute angular resolution of \(\frac{360^{\circ }}{32} = 11.25^{\circ }\). A Python interface allowed seamless integration of the belt component into our ROS environment without taking a detour via a mobile phone (Fig. 5d). Note that for later user application packages, smartphones might replace the current computing solution to make the navigation solution more portable. Given the hardware constraints of the prototype, the defined ROS packages allow simultaneous direct control of up to two vibro-tactile modules and send the final commands to the hardware via the belt's wireless Bluetooth interface, which is placed inside the controller box (Fig. 5c) together with the battery (Fig. 5b). Due to the belt's hardware design, we are not able to control the vibration intensity of the modules in this version and are therefore restricted to a predefined intensity.
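Under these constraints, mapping a signal angle to a vibration element reduces to simple rounding, as the sketch below illustrates; the orientation convention (front = element 0, angles increasing clockwise) and the helper names are our assumptions, not part of the belt's actual API.

```python
# Sketch with assumed conventions: element 0 points forward and indices
# increase clockwise around the waist; each element covers 11.25 degrees.
NUM_MODULES = 32
DEG_PER_MODULE = 360.0 / NUM_MODULES  # 11.25 degrees


def angle_to_module(angle_deg):
    """Return the index (0..31) of the element closest to the given angle."""
    return int(round(angle_deg / DEG_PER_MODULE)) % NUM_MODULES


assert angle_to_module(0.0) == 0     # straight ahead
assert angle_to_module(11.0) == 1    # rounded to the nearest element
assert angle_to_module(355.0) == 0   # wraps back to the frontal element
```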

We defined several vibro-tactile patterns to provide a variety of different interpretable signals for the user, as described in Sect. 4. Another prime goal of our project is to determine which of the introduced signal patterns are most suitable for fulfilling a guidance task. A thorough evaluation of the signal patterns follows in Sect. 6. Figure 6 illustrates and defines the parameters of our guidance controller.

Fig. 6. Overview of the guidance controller reference parameters. \(V_{dir}\) denotes the estimated SLAM pose, \(V_{vib}\) the direction of the navigation signal, \(\varDelta \phi \) the absolute signal angle sent to the belt, \(V_{Stroke}\) the direction of the stroke signal, \(\alpha _{W}\) the tolerance window, and n the specific vibration element.

The frontal vibration element (\(n_{v} = 0\)) is defined as the element pointing in the direction of the estimated SLAM pose \(V_{dir}\) (given by the transformation constraints explained in Sect. 3). The controller tries to minimize the difference angle between \(V_{dir}\) and \(V_{vib}\), where \(V_{vib}\) refers to the current local path information provided by the Navigation component described in Sect. 4; the element corresponding to this difference angle is denoted \(n_\phi \). The final piece of information sent to the belt is \(\varDelta \phi \), which indicates the absolute position of element \(n_\phi \) and can be directly derived from vector \(V_{vib}\).

Based on these reference values we can define the following patterns:

  • Direct vibration. Single vibration of one element in one direction

  • Stroke vibration. Create a stroke signal in the direction of \(V_{Stroke}\) that sequentially triggers all elements from \(n_{v} = 0\) to \(n_\phi \) within a given span of time (sketched in code below)
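The sketch below outlines both patterns, assuming a hypothetical `belt.vibrate(index, duration)` call; the actual prototype is driven through the Python/Bluetooth interface described above and can activate at most two modules at a time.

```python
# Sketch of the two basic feedback patterns; belt.vibrate(index, seconds)
# is a hypothetical interface standing in for the prototype's Python API.
import time

NUM_MODULES = 32


def direct_vibration(belt, target_index, duration_s=0.5):
    """Direct vibration: a single pulse on the element pointing towards V_vib."""
    belt.vibrate(target_index % NUM_MODULES, duration_s)


def stroke_vibration(belt, target_index, span_s=1.0):
    """Stroke vibration: trigger the elements from the frontal element (0) up
    to the target element one after another within the given span of time.
    (For targets on the user's other side the sequence runs the opposite way
    around the belt.)"""
    indices = list(range(target_index % NUM_MODULES + 1))
    per_element = span_s / max(len(indices), 1)
    for idx in indices:
        belt.vibrate(idx, per_element)
        time.sleep(per_element)
```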

To extend the range of possible feedback patterns that can be provided by the belt, we implemented a tolerance window, defined as a linearly increasing angular aperture \(\alpha _{W}\) that depends on the absolute two-dimensional Euclidean distance d between the current user position and the global navigation target. If the proposed \(V_{vib}\) lies within this window, no vibrational feedback signal is rendered to the user. This strategy was primarily introduced to avoid sensory overload of the human tactile system due to excessive vibration, as discussed in [43]. To prevent \(\alpha _{W}\) from becoming too generous, it is capped at a maximal angular aperture \(\alpha _{max}\), chosen small enough to still guide the user precisely onto the final destination target. For this purpose we defined the piecewise function A(d) given by Eq. 1 and illustrated by the normalized plot in Fig. 7. This function computes the absolute angular aperture \(\alpha _{W}\) with respect to the static parameters \(d_{max}\), \(d_{min}\) and \(\alpha _{max}\).

$$\begin{aligned} A(d)= \begin{cases} 0, & d < d_{min} \\ \frac{\alpha _{max}}{d_{max}}\,d, & d_{min} \le d \le d_{max} \\ \alpha _{max}, & d > d_{max} \end{cases} \end{aligned}$$
(1)
Fig. 7. Plot of the domain-defined function A(d) formulated in Eq. 1.
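A direct transcription of Eq. 1 into code could look as follows; the numeric defaults for \(d_{min}\), \(d_{max}\) and \(\alpha _{max}\) are illustrative placeholders, not the values used in our experiments.

```python
# Tolerance window aperture A(d) from Eq. (1); default parameter values are
# illustrative placeholders only.
def tolerance_window(d, d_min=0.5, d_max=5.0, alpha_max=30.0):
    """Angular aperture alpha_W in degrees for a distance d (meters) to the
    global navigation target."""
    if d < d_min:
        return 0.0                    # very close: always render feedback
    if d > d_max:
        return alpha_max              # far away: aperture capped at alpha_max
    return alpha_max / d_max * d      # linear growth in between
```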

The different combinations of all the aforementioned operation modes can be summarized by the following six feedback pattern variants (Table 1).

Table 1. Summary of the designed vibro-tactile feedback patterns.

The evaluation of the described patterns, based on a user test, is detailed in the following section.

6 Evaluation

For the evaluation of the navigation system, and especially of the different vibro-tactile stimulation patterns introduced in Sect. 5, we set up a small testing area. The area and the resulting map are shown as a point cloud generated by the VSLAM system in Fig. 8 and as a derived 2D costmap in Fig. 9.

Fig. 8. 3D point cloud provided by the VSLAM system.

Fig. 9. Derived 2D navigation costmap of the test area.

The coordinates of both navigation goals, indicated in Fig. 9, were preset and held constant across all trials to ensure the same target positions for all navigation tasks. The predefined goals, together with the fixed environment and illumination, ensured equal conditions for all test runs. We tested each pattern in random order with six participants (all without actual visual impairments, but blindfolded) and set both navigation targets one after the other. Participants were returned to a fixed starting position after each trial. We formulated the following three statements, which the participants had to rate subjectively on a given scale each time they had reached both goals with a feedback pattern. The scale-based answers were designed to reveal possible weaknesses and limitations of each feedback pattern.

  • A: I have always been able to identify the displayed direction correctly. [Scale: 0–10]

    This statement is used to rate the overall identifiability of the signals.

  • B: I have always been able to transfer the given signals into my movements. [Scale: 0–10]

    This statement is used to rate how intuitively each signal can be transferred into movements.

  • C: The number of signals received has been too low/too high. [Scale: −3 to +3]

    This statement evaluates the number of vibro-tactile feedback signals and allows for drawing conclusions as to a possible lack or overload of the discrete signals.

The user tests revealed clear differences between the individual feedback patterns. The graphs shown in Fig. 10 display the arithmetic means (\(\bar{x} \), blue), the medians (\(\widetilde{x}\), orange) and the variances (\(\sigma ^2\), green) of the ratings for each feedback method, grouped by the individual types. While the mean or median allows a direct judgment of a pattern, the variance can be considered an indicator of how consistent the participants' ratings were.

Fig. 10. Summarized results of the user survey. Mean, median and between-subject variance are denoted by \(\bar{x}\), \(\widetilde{x}\) and \(\sigma ^2\), respectively. (Color figure online)

Considering the identifiability and intuitiveness of the patterns (higher values are better), the best ratings are those for modes one and six. Together with the best quantity ratings (zero is best) and the low variances across all evaluation criteria, we conclude that for a majority of users the single vibration mode is the best choice for fulfilling a given navigation task. All modes based on stroke signals show higher variances and lower identifiability ratings compared to the single-signal feedback patterns. This leads us to presume that stroke signals are harder to perceive and that the perception of strongly dynamic signals differs from person to person. The intuitiveness ratings of the stroke patterns lead to the same judgment, the only difference being slightly lower variances.

All users noted that the integrated tolerance window, and thus the corresponding absence of vibro-tactile feedback, was interpreted as an indicator of a correct heading direction; the ratings, especially those for intuitiveness, indicate that it improves signal interpretation.

Our evaluation reveals that the developed prototype, i.e. the computer-vision-based navigation planner combined with the vibro-tactile interface, is able to guide a person through a mapped environment, regardless of the ultimately chosen vibro-tactile pattern. The evaluation confirms the general functionality of the prototype and demonstrates that the computational methods we transferred from the robotics domain can indeed be exploited for assistive human navigation. The user test also revealed advantages and disadvantages of the different feedback patterns and showed that simple, steady and easily perceivable patterns are preferred over rapidly changing or very dynamic signals when designing an information display. These preferences are in line with the findings of Nagel et al. [13], whose participants stated that two simultaneously vibrating motors are more distracting than helpful.

7 Conclusion and Future Work

We presented a system prototype which is able to guide blindfolded (potentially visually impaired) individuals through unknown areas based on a body-mounted vision-sensor combined with a vibro-tactile feedback system. The proposed system concept implements well-known algorithms and methods typically used for autonomous navigation of mobile robots and transfers the obtained information to the end user via vibro-tactile motion cues. We thereby demonstrated that such robotic concepts can also be used to guide human subjects. The modularity of the system we developed allows for an easy exchange of the hardware and software subsystem components and for a straightforward expansion of the general system by new modules, thus opening a window to future research.

One of our future goals is to provide novel path- and motion-planning solutions based on additional data acquired from the user. For example, we expect that more detailed knowledge derived from the articulated body pose or specific dynamic states can be used to influence the planning results of the system and help to develop even more human-centered approaches. Our prototype will also be extended with a proper user interface: to be fully functional, the system must offer users the option and capability to set targets and control the system's functionality themselves. These requirements demand the development of methods and algorithms for map analysis, map labeling and map interpretation, as well as the integration of additional hardware elements where necessary, to allow for a natural user interface compatible with visually impaired users. Additionally, a user-friendly solution for the computation device (e.g. a smartphone app) could be developed to further increase the application potential for daily living.

Another advancement of our system would be the use of 3D maps. The additional height dimension would allow for more reliable local obstacle avoidance by taking into account the user's height relative to hanging or other ceiling-mounted objects.