Feature Learning and Transfer Performance Prediction for Video Reinforcement Learning Tasks via a Siamese Convolutional Neural Network

  • Jinhua Song
  • Yang Gao
  • Hao Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11301)


In this paper, we address the negative transfer problem with a deep learning method that predicts the transfer performance (positive or negative transfer) between two reinforcement learning tasks. We consider same-domain transfer for video reinforcement learning tasks, such as video games, which can be described as images and perceived by an agent with visual ability. Our method trains a neural network directly from raw task descriptions, without other prior knowledge such as task models, target task samples, or human experience. The network consists of two parts: a siamese convolutional neural network that learns features from each pair of tasks, and a softmax layer that predicts the binary transfer performance. We conduct extensive experiments in the maze domain and the Ms. PacMan domain to evaluate our method. The results show that our method is effective and outperforms the baseline methods.
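The two-part architecture described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give layer sizes, the way the two branches are combined, or the training procedure, so the single 3x3 convolution, the global average pooling, the concatenation of the two feature vectors, the random untrained weights, and all function names (`conv2d`, `encode`, `predict_transfer`) are assumptions made only for illustration.

```python
import math
import random


def conv2d(image, kernel):
    """Valid 2-D cross-correlation of one grayscale image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def encode(image, kernels):
    """Shared branch (assumed layout): convolution, ReLU, global average pooling."""
    features = []
    for kernel in kernels:
        fmap = conv2d(image, kernel)
        activated = [max(0.0, v) for row in fmap for v in row]
        features.append(sum(activated) / len(activated))
    return features


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_transfer(source_img, target_img, kernels, weights, biases):
    """Return [P(negative transfer), P(positive transfer)] for a task pair."""
    # Siamese weight sharing: both task images go through the same kernels.
    # Concatenating the two feature vectors is an assumption of this sketch.
    pair = encode(source_img, kernels) + encode(target_img, kernels)
    logits = [sum(w * x for w, x in zip(row, pair)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Hypothetical inputs: two random 8x8 binary "maze" images and untrained weights.
rng = random.Random(0)
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(4)]
weights = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(2)]
biases = [0.0, 0.0]
maze_a = [[rng.choice([0.0, 1.0]) for _ in range(8)] for _ in range(8)]
maze_b = [[rng.choice([0.0, 1.0]) for _ in range(8)] for _ in range(8)]

probs = predict_transfer(maze_a, maze_b, kernels, weights, biases)
print(probs)
```

Because the weights here are random, the printed probabilities carry no meaning; in a trained model the kernels and the softmax layer would be fitted on task pairs labeled with their observed transfer outcome.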


Transfer learning · Deep neural network · Reinforcement learning task · Transfer performance



This work was supported by the National Key R&D Program of China [2017YFB0702600, 2017YFB0702601] and the National Natural Science Foundation of China [grant numbers 61432008, U1435214, 61503178].



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. State Key Laboratory for Novel Software Technology, Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University, Nanjing, China
