Abstract
Pong was a titan of the gaming industry in the twentieth century, and it is a canonical example of deep reinforcement learning applied to Atari games [1]. The game is also beneficial for improving concentration and memory capacity. Since Pong is currently played by around 350 million people worldwide, we saw an opportunity in this interesting game, and the project has considerable scope in Atari game development. We propose a stochastic reinforcement learning technique, the policy gradient algorithm, to optimize the Pong game. The purpose of this study is to improve the algorithms that control the game's structure, mechanics, and real-time dynamics. We implemented the policy gradient algorithm to improve performance and training, achieving results significantly better than a traditional genetic algorithm.
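As a rough illustration of the policy gradient technique named in the abstract, the sketch below runs REINFORCE on a toy two-action problem (a stand-in for Pong's up/down paddle actions). The reward setup, learning rate, and episode count are illustrative assumptions, not taken from the paper.

```python
import math
import random

random.seed(0)

theta = [0.0, 0.0]   # one preference per action (softmax policy parameters)
alpha = 0.1          # learning rate (illustrative choice)

def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]

def sample(probs):
    """Draw an action index according to the policy's probabilities."""
    return 0 if random.random() < probs[0] else 1

# Toy environment: action 1 yields reward 1, action 0 yields 0.
# The policy gradient update should push probability mass toward action 1.
for episode in range(2000):
    probs = softmax(theta)
    a = sample(probs)
    reward = 1.0 if a == 1 else 0.0
    # REINFORCE update: d/d theta_k of log pi(a) is (1[k == a] - probs[k])
    for k in range(2):
        grad_log = (1.0 if k == a else 0.0) - probs[k]
        theta[k] += alpha * reward * grad_log

probs = softmax(theta)
print(probs[1])  # probability of the rewarded action, close to 1 after training
```

In a full Pong agent the scalar preferences would be replaced by a neural network over preprocessed frames, and the reward would come from the game score, but the gradient update has the same form.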
References
Mnih, V., et al.: Playing Atari with deep reinforcement learning. In: NIPS Workshop (2013)
Ghory, I.: Reinforcement learning in board games (2013)
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Playing Atari with deep reinforcement learning (2013)
Silver, D., et al.: Deterministic policy gradient algorithms. In: ICML (2014)
Fu, J., Hsu, I.: Model-Based Reinforcement Learning for Playing Atari games (2014)
Hartley, T., Mehdi, Q., Gough, N.: Applying Markov Decision Process to 2-D Real Games (2001)
Jaakkola, T., Singh, S.P., Jordan, M.I.: Reinforcement learning algorithms for partially observable Markov decision problems. In: NIPS 7, pp. 345–352. Morgan Kaufman (1995)
Marbach, P., Tsitsiklis, J.N.: Simulation-based optimization of Markov reward processes, Technical report LIDS-P-2411, Massachusetts Institute of Technology (1998)
Bertsekas, D.P.: Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs (1987)
Watkins, C.J.C.H., Dayan, P.: Technical note: Q-learning. Mach. Learn. 8(3/4), 279–292 (1992)
Schwartz, A.: A reinforcement learning method for maximizing undiscounted rewards. In: Proceedings of the Tenth International Conference on Machine Learning, pp. 298–305. Amherst (1993)
White, D.A., Sofge, D.A.: Neural network based process optimization and control. In: Proceedings of the 29th Conference on Decision and Control, Honolulu, Hawaii, pp. 3270–3276 (1990)
Ok, D.: A Comparative Study of Undiscounted and Discounted Reinforcement Learning Methods (1994)
Fachantidis, A., Taylor, M.E., Vlahavas, I.: Learning to teach reinforcement learning agents (2017)
Kobayashi, M., Zamani, A., Ozawa, S., Abe, S.: Reducing computations in incremental learning for feedforward neural network with long-term memory. In: Proceedings of International Joint Conference on Neural Networks, pp. 1989–1994 (2001)
Das, T.K., Gosavi, A., Mahadevan, S., Marchalleck, N.: Solving semi-Markov decision problems using average reward reinforcement learning. Manag. Sci. 45(4), 560–574 (1999)
Mannor, S., Tsitsiklis, J.N.: Mean-Variance Optimization in Markov Decision Processes (1998)
Wang, J., Li, B., Liu, C.: Research of New Learning Method of Feedforward Neural Network (2003)
Seuret, M., Alberti, M., Ingold, R., Liwicki, M.: PCA-Initialized Deep Neural Networks Applied To Document Image Analysis (2007)
Saruchi, S.: Adaptive sigmoid function to enhance low contrast images (2016)
Minai, A.A., Williams, R.D.: On the derivatives of the sigmoid (2015)
Schmidt, W.F., Kraaijveld, M.A., Duin, R.P.W.: Feed forward neural networks with random weight (2016)
Tang, Y., Salakhutdinov, R.: Learning stochastic feedforward neural networks (2016)
Bottou, L.: Large-scale machine learning with stochastic gradient descent (2012)
Stevens, M., Pradhan, S.: Playing Tetris with Deep Reinforcement Learning (2014)
Abadi, M., Barham, P., et al.: TensorFlow: a system for large-scale machine learning (2016)
Tsitsiklis, J.N., Van Roy, B.: On average versus discounted reward temporal-difference learning (2013)
Tadepalli, P., Ok, D.: Model-based average reward reinforcement learning (2011)
Appendix

- ReLU: Rectified linear units
- DQN: Deep Q-networks
- RL: Reinforcement learning
- st: State at time t
- at: Action at time t
- rt: Reward earned
- π: Policy function
- γ: Discount factor determining the agent’s horizon
- θi: The parameters of the Q-network at iteration i
- E: Expected return
- i: Iterations
- Li(θi): Loss function
- αh: Learning rate
- h: Current update number [6]
- ak: Time-step dependent weighting factors
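The symbols Li(θi), θi, γ, rt, st, and at listed above match the standard deep Q-learning objective of [4]. As a hedged reconstruction (the paper's exact formulation is not reproduced on this page), the loss at iteration i is conventionally written as:

$$L_i(\theta_i) = \mathbb{E}_{s,a}\!\left[\left(y_i - Q(s,a;\theta_i)\right)^2\right], \qquad y_i = \mathbb{E}\!\left[r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta_{i-1}) \,\middle|\, s_t = s,\ a_t = a\right]$$

where the target yi is computed with the previous iteration's parameters θi−1 held fixed.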
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
Cite this paper
Singh, A., Gupta, V. (2018). Pong Game Optimization Using Policy Gradient Algorithm. In: Bhattacharyya, P., Sastry, H., Marriboyina, V., Sharma, R. (eds) Smart and Innovative Trends in Next Generation Computing Technologies. NGCT 2017. Communications in Computer and Information Science, vol 827. Springer, Singapore. https://doi.org/10.1007/978-981-10-8657-1_40
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8656-4
Online ISBN: 978-981-10-8657-1