
Pong Game Optimization Using Policy Gradient Algorithm

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 827)

Abstract

Pong was a titan of the 20th-century gaming industry and is a canonical testbed for deep reinforcement learning on Atari games [1]. The game is also credited with improving concentration and memory, and with an estimated 350 million players worldwide today, it remains an attractive target for study. The approach has broad scope in Atari game development. We propose a stochastic reinforcement learning technique, the policy gradient algorithm, to optimize Pong. The purpose of this study is to improve the algorithms that control the game's structure, mechanics, and real-time dynamics. Our policy gradient implementation trains faster and performs significantly better than a traditional genetic algorithm.
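The abstract names the policy gradient method but the page itself carries no code, so a minimal REINFORCE-style sketch in the spirit of the cited Atari work [1] is given below. Everything in it is an assumption for illustration (network size, learning rate, the 80×80 preprocessed frame-difference input, helper names), not the authors' implementation:

```python
import numpy as np

# Minimal REINFORCE-style policy gradient sketch (illustrative only).
# A two-layer net maps a preprocessed frame-difference vector to
# P(move paddle UP); all hyperparameters below are assumptions.

H, D = 200, 80 * 80            # hidden units; 80x80 downsampled frame
GAMMA, LR = 0.99, 1e-3         # discount factor gamma and learning rate
rng = np.random.default_rng(0)
W1 = rng.standard_normal((H, D)) / np.sqrt(D)   # scaled random init
W2 = rng.standard_normal(H) / np.sqrt(H)

def policy_forward(x):
    """Return P(action = UP) and the hidden activations."""
    h = np.maximum(0.0, W1 @ x)           # ReLU (see appendix glossary)
    p = 1.0 / (1.0 + np.exp(-(W2 @ h)))   # sigmoid over the scalar logit
    return p, h

def discount_rewards(r):
    """Discounted returns R_t; Pong-specific: reset at each point boundary."""
    r = np.asarray(r, dtype=np.float64)
    out, running = np.zeros_like(r), 0.0
    for t in reversed(range(len(r))):
        if r[t] != 0:
            running = 0.0                 # a point was scored; restart the sum
        running = running * GAMMA + r[t]
        out[t] = running
    return out

def policy_gradient_update(xs, hs, ys, ps, rs):
    """One REINFORCE update from a full episode of experience."""
    global W1, W2
    adv = discount_rewards(rs)
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)  # normalize: variance reduction
    dlogits = (ys - ps) * adv       # grad of log pi(a_t|s_t), weighted by return
    dW2 = hs.T @ dlogits
    dh = np.outer(dlogits, W2)
    dh[hs <= 0] = 0.0               # backprop through the ReLU
    dW1 = dh.T @ xs
    W1 += LR * dW1                  # gradient ASCENT on expected return
    W2 += LR * dW2

# Smoke test on random data standing in for one short episode:
T = 5
xs = rng.standard_normal((T, D))
ps, hs = zip(*(policy_forward(x) for x in xs))
ps, hs = np.array(ps), np.vstack(hs)
ys = (rng.random(T) < ps).astype(float)    # sampled actions: 1 = UP
rs = np.array([0.0, 0.0, 0.0, 0.0, 1.0])   # +1 when our agent wins the point
policy_gradient_update(xs, hs, ys, ps, rs)
```

The `(ys - ps)` term is the gradient of the log-probability of the sampled action under a sigmoid policy; weighting it by the normalized discounted return is what turns ordinary supervised backpropagation into a policy gradient update.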


References

1. Mnih, V., et al.: Playing Atari with deep reinforcement learning. In: NIPS Deep Learning Workshop (2013)

2. Ghory, I.: Reinforcement learning in board games (2013)

3. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

4. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)

5. Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., Riedmiller, M.: Deterministic policy gradient algorithms. In: ICML (2014)

6. Fu, J., Hsu, I.: Model-based reinforcement learning for playing Atari games (2014)

7. Hartley, T., Mehdi, Q., Gough, N.: Applying Markov decision processes to 2D real-time games (2001)

8. Jaakkola, T., Singh, S.P., Jordan, M.I.: Reinforcement learning algorithms for partially observable Markov decision problems. In: Advances in Neural Information Processing Systems 7, pp. 345–352. Morgan Kaufmann (1995)

9. Marbach, P., Tsitsiklis, J.N.: Simulation-based optimization of Markov reward processes. Technical report LIDS-P-2411, Massachusetts Institute of Technology (1998)

10. Bertsekas, D.P.: Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs (1987)

11. Watkins, C.J.C.H., Dayan, P.: Technical note: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)

12. Schwartz, A.: A reinforcement learning method for maximizing undiscounted rewards. In: Proceedings of the Tenth International Conference on Machine Learning, Amherst, pp. 298–305 (1993)

13. White, D.A., Sofge, D.A.: Neural network based process optimization and control. In: Proceedings of the 29th Conference on Decision and Control, Honolulu, Hawaii, pp. 3270–3276 (1990)

14. Ok, D.: A comparative study of undiscounted and discounted reinforcement learning methods (1994)

15. Fachantidis, A., Taylor, M.E., Vlahavas, I.: Learning to teach reinforcement learning agents (2017)

16. Kobayashi, M., Zamani, A., Ozawa, S., Abe, S.: Reducing computations in incremental learning for feedforward neural network with long-term memory. In: Proceedings of the International Joint Conference on Neural Networks, pp. 1989–1994 (2001)

17. Das, T.K., Gosavi, A., Mahadevan, S., Marchalleck, N.: Solving semi-Markov decision problems using average reward reinforcement learning. Manag. Sci. 45(4), 560–574 (1999)

18. Mannor, S., Tsitsiklis, J.N.: Mean-variance optimization in Markov decision processes. In: ICML (2011)

19. Wang, J., Li, B., Liu, C.: Research of new learning method of feedforward neural network (2003)

20. Seuret, M., Alberti, M., Ingold, R., Liwicki, M.: PCA-initialized deep neural networks applied to document image analysis (2017)

21. Saruchi, S.: Adaptive sigmoid function to enhance low contrast images (2016)

22. Minai, A.A., Williams, R.D.: On the derivatives of the sigmoid. Neural Netw. 6(6), 845–853 (1993)

23. Schmidt, W.F., Kraaijveld, M.A., Duin, R.P.W.: Feed forward neural networks with random weights. In: Proceedings of the 11th IAPR International Conference on Pattern Recognition (1992)

24. Tang, Y., Salakhutdinov, R.: Learning stochastic feedforward neural networks. In: NIPS (2013)

25. Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: COMPSTAT (2010)

26. Stevens, M., Pradhan, S.: Playing Tetris with deep reinforcement learning (2014)

27. Abadi, M., Barham, P., et al.: TensorFlow: a system for large-scale machine learning. In: OSDI (2016)

28. Tsitsiklis, J.N., Van Roy, B.: On average versus discounted reward temporal-difference learning. Mach. Learn. 49(2–3), 179–191 (2002)

29. Tadepalli, P., Ok, D.: Model-based average reward reinforcement learning. Artif. Intell. 100(1–2), 177–224 (1998)


Author information

Correspondence to Aditya Singh.

Appendix

ReLU: rectified linear units
DQN: Deep Q-Networks
RL: reinforcement learning
s_t: state at time t
a_t: action at time t
r_t: reward earned at time t
π: policy function
γ: discount factor determining the agent's horizon
θ_i: parameters of the Q-network at iteration i
E: expected return
i: iteration index
L_i(θ_i): loss function at iteration i
α_h: learning rate
h: current update number [6]
a_k: time-step dependent weighting factors

The formulas sketched below show how these symbols combine.
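Read together, the glossary symbols define the standard quantities these methods optimize. The following is a sketch of the usual forms, reconstructed from the definitions above; the paper itself may parameterize them differently:

```latex
% Discounted return from time t, using discount factor \gamma:
R_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}

% Q-network loss at iteration i, with parameters \theta_i (cf. DQN [3]):
L_i(\theta_i) = \mathbb{E}\!\left[\Big(r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta_{i-1})
              - Q(s_t, a_t; \theta_i)\Big)^{2}\right]

% Policy gradient of the expected return E under policy \pi_\theta:
\nabla_{\theta}\, \mathbb{E}[R_t] = \mathbb{E}\!\left[\nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, R_t\right]
```

The third identity is the basis of the update implemented in the sketch after the abstract: the log-probability gradient of each sampled action is weighted by the return that followed it.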


Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Singh, A., Gupta, V. (2018). Pong Game Optimization Using Policy Gradient Algorithm. In: Bhattacharyya, P., Sastry, H., Marriboyina, V., Sharma, R. (eds) Smart and Innovative Trends in Next Generation Computing Technologies. NGCT 2017. Communications in Computer and Information Science, vol 827. Springer, Singapore. https://doi.org/10.1007/978-981-10-8657-1_40


  • DOI: https://doi.org/10.1007/978-981-10-8657-1_40


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8656-4

  • Online ISBN: 978-981-10-8657-1

  • eBook Packages: Computer Science, Computer Science (R0)
