Learning to Transform Service Instructions into Actions with Reinforcement Learning and Knowledge Base

  • Meng-Yang Zhang
  • Guo-Hui Tian
  • Ci-Ci Li
  • Jing Gong
Research Article


To improve the learning ability of robots, we present a reinforcement learning approach with a knowledge base for mapping natural language instructions to executable action sequences. A simulated platform with a physics engine is built as the interactive environment. Based on the knowledge base, a reward function combining immediate and delayed rewards is designed to handle the sparse-reward problem. In addition, a list of object states is produced by retrieving the knowledge base, and this list serves as the standard for judging the quality of action sequences. Experimental results demonstrate that our approach achieves high accuracy in producing action sequences.
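The reward scheme described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, bonus values, and the dictionary representation of object states are all illustrative assumptions. It shows only the general idea of combining a per-step immediate reward (an action consistent with the knowledge base) with a delayed reward computed by matching final object states against goal states retrieved from the knowledge base.

```python
def shaped_reward(action_valid, done, object_states, goal_states,
                  step_penalty=-0.01, immediate_bonus=0.1, final_bonus=1.0):
    """Reward for one environment step (illustrative values only).

    action_valid:  whether the chosen action is consistent with the
                   knowledge base (immediate-reward signal)
    done:          whether the action sequence has finished
    object_states: current {object: state} mapping from the simulator
    goal_states:   {object: state} mapping retrieved from the knowledge base
    """
    reward = step_penalty  # small penalty discourages needlessly long sequences
    if action_valid:
        reward += immediate_bonus  # immediate reward for a KB-consistent action
    if done:
        # Delayed reward: fraction of goal object states actually achieved.
        matched = sum(object_states.get(obj) == state
                      for obj, state in goal_states.items())
        reward += final_bonus * matched / max(len(goal_states), 1)
    return reward
```

Under this sketch, a valid intermediate action earns a small positive reward, while most of the return arrives only at the end of the episode, in proportion to how many goal object states were reached.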


Keywords: natural language, robot, knowledge base, reinforcement learning, object state





Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61773239) and the Shenzhen Future Industry Special Fund (No. JCYJ20160331174814755).



Copyright information

© Institute of Automation, Chinese Academy of Sciences and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. School of Control Science and Engineering, Shandong University, Jinan, China
  2. Shenzhen Research Institute, Shandong University, Shenzhen, China
