A Deterministic Actor-Critic Approach to Stochastic Reinforcements

  • Conference paper
  • First Online:
AI 2017: Advances in Artificial Intelligence (AI 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10400)

Abstract

Learning optimal policies under stochastic rewards presents a challenge for well-known reinforcement learning algorithms such as Q-learning. Q-learning has been shown to suffer from a positive bias that inhibits it from learning under inconsistent rewards. Actor-critic methods, however, do not suffer from such a bias, but they may still fail to acquire the optimal policy under rewards of high variance. We propose the use of a reward-shaping function to minimize the variance within stochastic rewards. By reformulating Q-learning as a deterministic actor-critic, we show that the use of such a reward-shaping function improves the acquisition of optimal policies under stochastic reinforcements.

Y. Okesanjo—Code at https://github.com/ev0/Dac-mdp.
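The sketch below is only an illustration of the general idea described in the abstract, not the authors' implementation (see the linked repository for that): a tabular critic updated with a variance-reducing shaped reward and a deterministic (greedy) actor. The environment interface (reset()/step()), the baseline-centring shaping function shaped_reward, and all hyperparameters are assumptions made for this example.

```python
import numpy as np

def shaped_reward(r, baseline, alpha=0.1):
    """Hypothetical shaping step: centre the raw reward with a running
    baseline so that high-variance rewards feed a lower-variance TD target."""
    new_baseline = baseline + alpha * (r - baseline)
    return r - baseline, new_baseline

def train(env, n_states, n_actions, episodes=500, gamma=0.99,
          lr_critic=0.1, lr_actor=0.01):
    Q = np.zeros((n_states, n_actions))       # critic: action-value table
    prefs = np.zeros((n_states, n_actions))   # actor: preference table
    baseline = 0.0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = int(np.argmax(prefs[s]))      # deterministic (greedy) actor
            s_next, r, done = env.step(a)     # assumed env interface
            r_shaped, baseline = shaped_reward(r, baseline)
            # Critic: one-step TD update on the shaped reward.
            target = r_shaped + (0.0 if done else gamma * np.max(Q[s_next]))
            td_error = target - Q[s, a]
            Q[s, a] += lr_critic * td_error
            # Actor: move the chosen action's preference along the TD error.
            prefs[s, a] += lr_actor * td_error
            s = s_next
    return Q, prefs
```

The shaping step here simply subtracts a running average of the reward; the paper's actual shaping function and the precise deterministic actor-critic reformulation of Q-learning are developed in the full text.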

Acknowledgements

The authors would like to thank Prof. Guerzhoy for helpful guidance and discussions.

Author information

Corresponding author

Correspondence to Yemi Okesanjo.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Okesanjo, Y., Kofia, V. (2017). A Deterministic Actor-Critic Approach to Stochastic Reinforcements. In: Peng, W., Alahakoon, D., Li, X. (eds) AI 2017: Advances in Artificial Intelligence. AI 2017. Lecture Notes in Computer Science, vol. 10400. Springer, Cham. https://doi.org/10.1007/978-3-319-63004-5_6

  • DOI: https://doi.org/10.1007/978-3-319-63004-5_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63003-8

  • Online ISBN: 978-3-319-63004-5

  • eBook Packages: Computer Science, Computer Science (R0)
