Reusing Risk-Aware Stochastic Abstract Policies in Robotic Navigation Learning

  • Valdinei Freire da Silva
  • Marcelo Li Koga
  • Fábio Gagliardi Cozman
  • Anna Helena Reali Costa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8371)


In this paper we improve learning performance of a risk-aware robot facing navigation tasks by employing transfer learning; that is, we use information from a previously solved task to accelerate learning in a new task. To do so, we transfer risk-aware memoryless stochastic abstract policies into a new task. We show how to incorporate risk-awareness into robotic navigation tasks, in particular when tasks are modeled as stochastic shortest path problems. We then show how to use a modified policy iteration algorithm, called AbsProb-PI, to obtain risk-neutral and risk-prone memoryless stochastic abstract policies. Finally, we propose a method that combines abstract policies, and show how to use the combined policy in a new navigation task. Experiments validate our proposals and show that one can find effective abstract policies that can improve robot behavior in navigation problems.
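The policy-combination step described above can be sketched in plain Python: given several memoryless stochastic abstract policies π(a|s) (for instance, a risk-neutral and a risk-prone policy obtained with AbsProb-PI), one simple way to merge them is a convex mixture over abstract states, then sample actions from the mixed distribution in the new task. This is an illustrative sketch under that assumption, not the paper's exact combination method; the function names (`combine_policies`, `sample_action`) and the dictionary encoding of policies are hypothetical.

```python
import random

def combine_policies(policies, weights):
    """Convex mixture of memoryless stochastic policies.

    Each policy maps an abstract state s to a dict {action: probability}.
    The result assigns pi(a|s) = sum_i w_i * pi_i(a|s), renormalized.
    """
    combined = {}
    states = set().union(*(p.keys() for p in policies))
    for s in states:
        dist = {}
        for p, w in zip(policies, weights):
            for a, prob in p.get(s, {}).items():
                dist[a] = dist.get(a, 0.0) + w * prob
        total = sum(dist.values())
        combined[s] = {a: pr / total for a, pr in dist.items()}
    return combined

def sample_action(policy, s, rng=random):
    """Draw one action from the stochastic policy at abstract state s."""
    actions = list(policy[s].keys())
    probs = [policy[s][a] for a in actions]
    return rng.choices(actions, weights=probs, k=1)[0]
```

For example, mixing a risk-neutral policy that mostly goes "N" with a risk-prone one that mostly goes "E", with equal weights, yields a policy whose action probabilities are the averages of the two.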


Keywords: Risk-Awareness · Memoryless Stochastic Abstract Policies · Transfer Learning



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Valdinei Freire da Silva (1)
  • Marcelo Li Koga (2)
  • Fábio Gagliardi Cozman (2)
  • Anna Helena Reali Costa (2)

  1. Escola de Artes, Ciências e Humanidades, Universidade de São Paulo, São Paulo, Brazil
  2. Escola Politécnica, Universidade de São Paulo, São Paulo, Brazil
