Abstract
In this chapter, we take the policy-gradient-based REINFORCE-with-baseline algorithm further and combine it with the value-estimation ideas from DQN, bringing the best of both worlds together in the Actor-Critic algorithm. We then discuss the “advantage” baseline implementation of the model with deep-learning-based approximators, and extend the concept to a parallel implementation of the deep-learning-based advantage actor-critic algorithm in both the synchronous (A2C) and the asynchronous (A3C) modes.
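The core idea the abstract describes can be sketched in a few lines: a critic estimates state values, the TD error serves as the advantage baseline, and the actor takes a policy-gradient step scaled by that advantage. The following is a minimal tabular sketch under assumed toy settings (state/action counts, learning rates, and the single illustrative transition are all hypothetical, not the chapter's implementation, which uses deep-learning-based approximators):

```python
import numpy as np

# Illustrative sizes and hyperparameters (assumptions, not from the chapter).
n_states, n_actions = 4, 2
gamma = 0.99
lr_actor, lr_critic = 0.1, 0.1

# Actor: softmax policy over per-state logits. Critic: tabular state values.
logits = np.zeros((n_states, n_actions))
values = np.zeros(n_states)

def policy(s):
    """Softmax action probabilities for state s (numerically stabilized)."""
    z = logits[s] - logits[s].max()
    p = np.exp(z)
    return p / p.sum()

def actor_critic_update(s, a, r, s_next, done):
    # TD target bootstraps from the critic's estimate of the next state.
    target = r + (0.0 if done else gamma * values[s_next])
    advantage = target - values[s]  # TD error used as the advantage baseline

    # Critic step: move V(s) toward the TD target.
    values[s] += lr_critic * advantage

    # Actor step: policy-gradient ascent on log pi(a|s), scaled by advantage.
    p = policy(s)
    grad_log_pi = -p
    grad_log_pi[a] += 1.0
    logits[s] += lr_actor * advantage * grad_log_pi

# One illustrative transition: action 0 in state 0 earns reward 1.
actor_critic_update(s=0, a=0, r=1.0, s_next=1, done=False)
print(values[0], policy(0))
```

After this single positive-advantage update, the critic's value for state 0 rises and the actor shifts probability mass toward the rewarded action. A2C runs such updates synchronously across parallel workers, while A3C applies them asynchronously to a shared set of parameters.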
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
Cite this chapter
Sewak, M. (2019). Actor-Critic Models and the A3C. In: Deep Reinforcement Learning. Springer, Singapore. https://doi.org/10.1007/978-981-13-8285-7_11
Print ISBN: 978-981-13-8284-0
Online ISBN: 978-981-13-8285-7