Robot Intelligent Trajectory Planning Based on PCM Guided Reinforcement Learning

  • Xiang Teng
  • Jian Fu
  • Cong Li
  • ZhaoJie Ju
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11745)


Reinforcement Learning (RL) has been successfully applied to multi-degree-of-freedom robots to acquire motor skills; however, existing methods rarely consider the relationships between joints, or consider only linear relationships between them. To capture the nonlinear relationships between degrees of freedom (DOFs), we propose a Pseudo Covariance Matrix (PCM) to guide reinforcement learning for motor skill acquisition. Specifically, the method combines Path Integral Policy Improvement (\(\mathrm{PI}^2\)) with Kernel Canonical Correlation Analysis (KCCA), where KCCA is used to obtain the PCM in a high-dimensional space, and the PCM is recorded as heuristic information for searching for an optimal or sub-optimal strategy. Experiments on two robots (SCARA and UR5) demonstrate that the new method is feasible and effective.
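
As a rough illustration of how a covariance matrix can steer a \(\mathrm{PI}^2\)-style search, the following minimal sketch samples exploration noise from a fixed covariance and applies a probability-weighted parameter update. It is not the authors' implementation: the function names, the toy quadratic cost, the temperature h, and the hand-picked covariance (a stand-in for a PCM, which the paper obtains via KCCA) are all assumptions made for this example.

```python
import numpy as np

def pi2_update(theta, cov, cost_fn, n_rollouts=20, h=10.0, rng=None):
    """One simplified PI^2-style update in parameter space."""
    rng = np.random.default_rng() if rng is None else rng
    # Exploration noise drawn with the supplied covariance
    # (hypothetical stand-in for a PCM; not derived via KCCA here).
    eps = rng.multivariate_normal(np.zeros(theta.size), cov, size=n_rollouts)
    costs = np.array([cost_fn(theta + e) for e in eps])
    # Rescale costs to [0, 1] before exponentiating, a common PI^2 heuristic.
    span = costs.max() - costs.min()
    scaled = (costs - costs.min()) / span if span > 0 else np.zeros_like(costs)
    weights = np.exp(-h * scaled)
    weights /= weights.sum()
    # Probability-weighted average of the sampled perturbations.
    return theta + weights @ eps

def toy_cost(theta):
    # Hypothetical quadratic cost with its optimum at (1, -1); not from the paper.
    target = np.array([1.0, -1.0])
    return float(np.sum((theta - target) ** 2))

def run_demo(n_updates=50, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    # Assumed covariance with negatively correlated parameters.
    cov = np.array([[0.05, -0.04], [-0.04, 0.05]])
    for _ in range(n_updates):
        theta = pi2_update(theta, cov, toy_cost, rng=rng)
    return theta
```

Calling run_demo() returns the parameters after repeated updates. In this toy setting the off-diagonal covariance terms bias exploration along the direction in which the two parameters must move together, which is the intuition behind using cross-DOF structure as heuristic information.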


Trajectory planning; Learning from demonstration; Kernel Canonical Correlation Analysis; Path Integral Policy Improvement; Pseudo Covariance Matrix



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Automation, Wuhan University of Technology, Wuhan, China
  2. School of Computing, University of Portsmouth, Portsmouth, London, UK
