In this article, we present a novel approach to learning efficient navigation policies for mobile robots that use visual features for localization. Fast movements of a mobile robot typically introduce motion blur in the acquired images, which increases the robot's uncertainty about its pose. As a result, efficient execution of a navigation task can no longer be ensured, since the robot's pose estimate might not correspond to its true location. We present a reinforcement learning approach that determines a navigation policy for reaching the destination reliably and, at the same time, as quickly as possible. Using our technique, the robot learns to trade off velocity against localization accuracy and implicitly takes the impact of motion blur on observations into account. We also developed a method to compress the learned policy via a clustering approach. This significantly reduces the size of the policy representation, which is especially desirable for memory-constrained systems. Extensive simulated and real-world experiments carried out with two different robots demonstrate that our learned policy significantly outperforms policies using a constant velocity as well as more advanced heuristics. We also show that the policy is generally applicable to different indoor and outdoor scenarios with varying landmark densities as well as to navigation tasks of different complexity.
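To make the trade-off between velocity and localization accuracy concrete, the following is a minimal sketch using tabular Q-learning on a hand-made toy corridor. It is not the authors' implementation: the state layout (distance to the goal plus a discrete uncertainty level), the two velocities, the blur model in `step`, and the penalty value are all assumptions chosen purely for illustration.

```python
import random

# All constants below are illustrative assumptions, not values from the article.
D_MAX = 6             # start distance to the goal, in grid cells
U_MAX = 2             # highest (worst) pose-uncertainty level
VELOCITIES = (1, 2)   # cells per time step: slow, fast
MISS_PENALTY = -10.0  # extra cost for arriving while poorly localized


def step(d, u, v):
    """Deterministic toy model: fast motion blurs images and raises uncertainty."""
    d_next = max(0, d - v)
    if v == max(VELOCITIES):
        u_next = min(U_MAX, u + 1)   # motion blur degrades localization
    else:
        u_next = max(0, u - 1)       # slow motion lets the robot relocalize
    reward = -1.0                    # each time step costs one unit
    done = d_next == 0
    if done and u_next == U_MAX:
        reward += MISS_PENALTY       # goal reached with an unreliable pose
    return d_next, u_next, reward, done


def learn_policy(num_updates=20000, seed=0):
    """Q-learning on randomly sampled transitions.

    A learning rate of 1 and no discounting are used, which is only
    reasonable because this toy model is deterministic and episodic.
    """
    rng = random.Random(seed)
    states = [(d, u) for d in range(1, D_MAX + 1) for u in range(U_MAX + 1)]
    q = {(s, v): 0.0 for s in states for v in VELOCITIES}
    for _ in range(num_updates):
        (d, u), v = rng.choice(states), rng.choice(VELOCITIES)
        d2, u2, r, done = step(d, u, v)
        best_next = 0.0 if done else max(q[((d2, u2), a)] for a in VELOCITIES)
        q[((d, u), v)] = r + best_next
    # Greedy policy: the velocity with the highest learned value per state.
    policy = {s: max(VELOCITIES, key=lambda a: q[(s, a)]) for s in states}
    return q, policy


if __name__ == "__main__":
    q, policy = learn_policy()
    for u in range(U_MAX + 1):
        print("uncertainty", u, [policy[(d, u)] for d in range(1, D_MAX + 1)])
```

In this toy setting, the learned table selects the fast velocity two cells from the goal when the robot is well localized and the slow velocity at the same distance when uncertainty is at its highest level, which mirrors the qualitative behavior described above.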
This work has been supported by the German Research Foundation (DFG) under contract number SFB/TR-8.
Cite this article
Hornung, A., Bennewitz, M. & Strasdat, H. Efficient vision-based navigation. Auton Robot 29, 137–149 (2010). https://doi.org/10.1007/s10514-010-9190-3
Keywords
- Reinforcement learning
- Motion blur