Reinforcement Learning Based on Extreme Learning Machine

  • Jie Pan
  • Xuesong Wang
  • Yuhu Cheng
  • Ge Cao
Part of the Communications in Computer and Information Science book series (CCIS, volume 304)

Abstract

The extreme learning machine (ELM) not only offers good generalization performance but also has a simple structure and is computationally cheap to train. In this paper, these merits are exploited for reinforcement learning: using an ELM to approximate the Q function improves the speed of reinforcement learning. However, since the number of hidden-layer nodes is set equal to the number of training samples, a large sample set seriously degrades the learning speed. To address this problem, a rolling time-window mechanism is introduced into the algorithm, which reduces the size of the sample space to a certain extent. Finally, the proposed algorithm is compared with reinforcement learning based on a traditional BP neural network on a boat control problem. Simulation results show that the proposed algorithm is faster and more effective.
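For illustration, the sketch below combines the two ideas the abstract describes: an ELM whose output weights are solved in closed form (which is what makes Q-function fitting fast compared with BP training) and a rolling time-window that bounds the number of stored samples. This is a minimal sketch under our own assumptions; the class name, window size, feature encoding, and synthetic targets are illustrative choices, not code or parameters from the paper.

```python
import numpy as np
from collections import deque

class ELMQApproximator:
    """Minimal single-hidden-layer ELM regressor for Q(s, a).

    Input weights and biases are random and stay fixed; only the
    output weights are solved in closed form by least squares,
    which is why ELM trains much faster than BP.
    """

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.uniform(-1.0, 1.0, (n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, n_hidden)
        self.beta = np.zeros(n_hidden)

    def _hidden(self, X):
        # Sigmoid activation of the fixed random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # beta = H^+ y, with H^+ the Moore-Penrose pseudo-inverse of
        # the hidden-layer output matrix H.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y

    def predict(self, X):
        return self._hidden(X) @ self.beta

if __name__ == "__main__":
    # Rolling time-window: only the most recent `window_size`
    # transitions are kept, so the sample size (and with it the
    # number of hidden nodes needed) stays bounded. The targets here
    # are synthetic stand-ins for Q-learning targets
    # r + gamma * max_a' Q(s', a').
    window_size = 200
    window = deque(maxlen=window_size)

    rng = np.random.default_rng(1)
    for _ in range(300):                    # older samples roll out
        x = rng.uniform(-1, 1, 3)           # encoded (state, action)
        target = np.sin(x).sum()            # stand-in Q target
        window.append((x, target))

    X = np.array([x for x, _ in window])
    y = np.array([t for _, t in window])

    elm = ELMQApproximator(n_inputs=3, n_hidden=50)
    elm.fit(X, y)                           # one closed-form solve
    print("fit MSE:", np.mean((elm.predict(X) - y) ** 2))
```

Because fitting is a single pseudo-inverse solve over the window rather than iterative gradient descent, the approximator can be refit after each episode at low cost, which matches the speed advantage the abstract claims over BP networks.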

Keywords

Extreme learning machine · Neural network · Q-learning · Rolling time-window · Boat problem



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jie Pan (1)
  • Xuesong Wang (1)
  • Yuhu Cheng (1)
  • Ge Cao (1)

  1. School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, P.R. China