Abstract
Pong was a titan of the gaming industry in the twentieth century, and it is a canonical example of deep reinforcement learning applied to Atari games [1]. The game is also beneficial for improving concentration and memory capacity. Since Pong is currently played by around 350 million people worldwide, we saw an opportunity in this interesting game, and the project has considerable scope in Atari game development. We propose a stochastic reinforcement learning technique, the policy gradient algorithm, to optimize the Pong game. The purpose of this study is to improve the algorithms that control the game's structure, mechanics, and real-time dynamics. We implemented the policy gradient algorithm to improve performance and training, achieving results significantly better than a traditional genetic algorithm.
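As a rough illustration of the policy gradient technique named in the abstract, the sketch below runs REINFORCE on a toy two-action problem (a stand-in for Pong's up/down paddle actions). The reward setup, learning rate, and episode count are illustrative assumptions, not taken from the paper.

```python
import math
import random

random.seed(0)

theta = [0.0, 0.0]   # one preference per action (softmax policy parameters)
alpha = 0.1          # learning rate (illustrative choice)

def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]

def sample(probs):
    """Draw an action index according to the policy's probabilities."""
    return 0 if random.random() < probs[0] else 1

# Toy environment: action 1 yields reward 1, action 0 yields 0.
# The policy gradient update should push probability mass toward action 1.
for episode in range(2000):
    probs = softmax(theta)
    a = sample(probs)
    reward = 1.0 if a == 1 else 0.0
    # REINFORCE update: d/d theta_k of log pi(a) is (1[k == a] - probs[k])
    for k in range(2):
        grad_log = (1.0 if k == a else 0.0) - probs[k]
        theta[k] += alpha * reward * grad_log

probs = softmax(theta)
print(probs[1])  # probability of the rewarded action, close to 1 after training
```

In a full Pong agent the scalar preferences would be replaced by a neural network over preprocessed frames, and the reward would come from the game score, but the gradient update has the same form.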
References
Mnih, V., et al.: Playing Atari with deep reinforcement learning. In: NIPS Workshop (2013)
Ghory, I.: Reinforcement learning in board games (2013)
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Playing Atari with deep reinforcement learning (2013)
Silver, D., et al.: Deterministic policy gradient algorithms. In: ICML (2014)
Fu, J., Hsu, I.: Model-Based Reinforcement Learning for Playing Atari games (2014)
Hartley, T., Mehdi, Q., Gough, N.: Applying Markov Decision Process to 2-D Real Games (2001)
Jaakkola, T., Singh, S.P., Jordan, M.I.: Reinforcement learning algorithms for partially observable Markov decision problems. In: NIPS 7, pp. 345–352. Morgan Kaufman (1995)
Marbach, P., Tsitsiklis, J.N.: Simulation-based optimization of Markov reward processes, Technical report LIDS-P-2411, Massachusetts Institute of Technology (1998)
Bertsekas, D.P.: Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs (1987)
Watkins, C.J.C.H., Dayan, P.: Technical note: Q-learning. Mach. Learn. 8(3/4), 279–292 (1992)
Schwartz, A.: A reinforcement learning method for maximizing undiscounted rewards. In: Proceedings of the Tenth International Conference on Machine Learning, pp. 298–305. Amherst (1993)
White, D.A., Sofge, D.A.: Neural network based process optimization and control. In: Proceedings of the 29th Conference on Decision and Control, Honolulu, Hawaii, pp. 3270–3276 (1990)
Ok, D.: A Comparative Study of Undiscounted and Discounted Reinforcement Learning Methods (1994)
Fachantidis, A., Taylor, M.E., Vlahavas, I.: Learning to teach reinforcement learning agents (2017)
Kobayashi, M., Zamani, A., Ozawa, S., Abe, S.: Reducing computations in incremental learning for feedforward neural network with long-term memory. In: Proceedings of International Joint Conference on Neural Networks, pp. 1989–1994 (2001)
Das, T.K., Gosavi, A., Mahadevan, S., Marchalleck, N.: Solving semi-Markov decision problems using average reward reinforcement learning. Manag. Sci. 45(4), 560–574 (1999)
Mannor, S., Tsitsiklis, J.N.: Mean-Variance Optimization in Markov Decision Processes (1998)
Wang, J., Li, B., Liu, C.: Research of New Learning Method of Feedforward Neural Network (2003)
Seuret, M., Alberti, M., Ingold, R., Liwicki, M.: PCA-Initialized Deep Neural Networks Applied To Document Image Analysis (2007)
Saruchi, S.: Adaptive sigmoid function to enhance low contrast images (2016)
Minai, A.A., Williams, R.D.: On the derivatives of the sigmoid (2015)
Schmidt, W.F., Kraaijveld, M.A., Duin, R.P.W.: Feed forward neural networks with random weight (2016)
Tang, Y., Salakhutdinov, R.: Learning stochastic feedforward neural networks (2016)
Bottou, L.: Large-scale machine learning with stochastic gradient descent (2012)
Stevens, M., Pradhan, S.: Playing Tetris with Deep Reinforcement Learning (2014)
Abadi, M., Barham, P., et al.: TensorFlow: a system for large-scale machine learning (2016)
Tsitsiklis, J.N., Van Roy, B.: On average versus discounted reward temporal-difference learning (2013)
Tadepalli, P., Ok, D.: Model-based average reward reinforcement learning (2011)
Appendix

- ReLU: Rectified linear units
- DQN: Deep Q-networks
- RL: Reinforcement learning
- st: State at time t
- at: Action at time t
- rt: Reward earned
- π: Policy function
- γ: Discount factor determining the agent’s horizon
- θi: The parameters of the Q-network at iteration i
- E: Expected return
- i: Iterations
- Li(θi): Loss function
- αh: Learning rate
- h: Current update number [6]
- ak: Time-step dependent weighting factors
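The symbols Li(θi), θi, γ, rt, st, and at listed above match the standard deep Q-learning objective of [4]. As a hedged reconstruction (the paper's exact formulation is not reproduced on this page), the loss at iteration i is conventionally written as:

$$L_i(\theta_i) = \mathbb{E}_{s,a}\!\left[\left(y_i - Q(s,a;\theta_i)\right)^2\right], \qquad y_i = \mathbb{E}\!\left[r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta_{i-1}) \,\middle|\, s_t = s,\ a_t = a\right]$$

where the target yi is computed with the previous iteration's parameters θi−1 held fixed.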
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
Cite this paper
Singh, A., Gupta, V. (2018). Pong Game Optimization Using Policy Gradient Algorithm. In: Bhattacharyya, P., Sastry, H., Marriboyina, V., Sharma, R. (eds) Smart and Innovative Trends in Next Generation Computing Technologies. NGCT 2017. Communications in Computer and Information Science, vol 827. Springer, Singapore. https://doi.org/10.1007/978-981-10-8657-1_40
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8656-4
Online ISBN: 978-981-10-8657-1