HI-VAL: Iterative Learning of Hierarchical Value Functions for Policy Generation

Capobianco, Roberto; Riccio, Francesco; Nardi, Daniele

doi:10.1007/978-3-030-01370-7_33

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 867)

Included in the following conference series:

International Conference on Intelligent Autonomous Systems

Abstract

Task decomposition is effective in applications where the global complexity of a problem makes planning and decision-making too demanding. This is the case, for example, in high-dimensional robotics domains, where (1) unpredictability and modeling limitations typically prevent the manual specification of robust behaviors, and (2) learning an action policy is challenging due to the curse of dimensionality. In this work, we borrow the concept of Hierarchical Task Networks (HTNs) to decompose the learning procedure, and we exploit Upper Confidence Tree (UCT) search to introduce Hi-Val, a novel iterative algorithm for hierarchical optimistic planning with learned value functions. To obtain better generalization and generate policies, Hi-Val simultaneously learns and uses action values. These are used to formalize constraints within the search space and to reduce the dimensionality of the problem. We evaluate our algorithm both on a fetching task using a simulated 7-DOF KUKA lightweight arm and on a pick-and-delivery task with a Pioneer robot.
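
As a rough illustration of the optimistic selection step that UCT search relies on, the sketch below implements the standard UCB1 rule over a set of action-value estimates. It is not the Hi-Val algorithm: the hierarchical decomposition and the learned value functions are not reproduced here, and the function name `ucb1_select`, its arguments, and the exploration constant `c` are assumptions made for this example only.

```python
import math


def ucb1_select(counts, values, c=math.sqrt(2)):
    """Return the index of the action with the highest UCB1 score.

    counts[a] is how many times action a has been tried (hypothetical input),
    values[a] is its current mean value estimate (hypothetical input),
    c is an assumed exploration constant.
    """
    # Try every action once before applying the confidence bound.
    for action, n in enumerate(counts):
        if n == 0:
            return action

    total = sum(counts)
    # Mean estimate plus an exploration bonus that shrinks with visit count.
    scores = [
        q + c * math.sqrt(math.log(total) / n)
        for q, n in zip(values, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal visit counts the rule reduces to picking the highest estimate, while a rarely tried action receives a large bonus and is revisited even if its current estimate is lower.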

Author information

Authors and Affiliations

Department of Computer, Control, and Management Engineering, Sapienza University of Rome, via Ariosto 25, 00185, Rome, Italy
Roberto Capobianco, Francesco Riccio & Daniele Nardi

Corresponding author

Correspondence to Roberto Capobianco.

Editor information

Editors and Affiliations

Baden-Wuerttemberg Cooperative State University, Karlsruhe, Germany
Marcus Strand
Humanoids and Intelligence Systems Lab, KIT - Karlsruher Institut für Technologie, Karlsruhe, Germany
Rüdiger Dillmann
University of Padua, Padua, Italy
Emanuele Menegatti
University of Padua, Padua, Italy
Stefano Ghidoni

About this paper

Cite this paper

Capobianco, R., Riccio, F., Nardi, D. (2019). HI-VAL: Iterative Learning of Hierarchical Value Functions for Policy Generation. In: Strand, M., Dillmann, R., Menegatti, E., Ghidoni, S. (eds) Intelligent Autonomous Systems 15. IAS 2018. Advances in Intelligent Systems and Computing, vol 867. Springer, Cham. https://doi.org/10.1007/978-3-030-01370-7_33

DOI: https://doi.org/10.1007/978-3-030-01370-7_33
Published: 31 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01369-1
Online ISBN: 978-3-030-01370-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
