Temporal Difference Learning, SARSA, and Q-Learning
In this chapter, we discuss the important Q-Learning algorithm, which is the basis of the Deep Q-Networks (DQN) that we will cover in later chapters. Temporal Difference (TD) Learning solves the prediction (estimation) side of the Reinforcement Learning problem; Q-Learning builds on TD Learning to solve the control side of the problem in an off-policy manner. Its counterpart, the SARSA algorithm, also uses TD Learning for estimation but solves the control problem in an on-policy manner. In this chapter, we cover the key concepts of TD Learning, SARSA, and Q-Learning. Because Q-Learning is off-policy, the behavior policy it follows while acting differs from the target policy it estimates. We therefore also cover epsilon-greedy and similar exploration strategies that let us try out different actions in an off-policy approach.
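To make the on-policy versus off-policy distinction concrete before diving into the details, here is a minimal sketch of epsilon-greedy action selection together with the two tabular update rules. The function names and the use of a dictionary keyed by (state, action) pairs are illustrative choices, not a fixed API; the key point is the single line that differs between the two updates.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon):
    """With probability epsilon pick a random action, else the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Off-policy: the target bootstraps from the best action in s_next,
    regardless of which action the behavior policy will actually take."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy: the target bootstraps from a_next, the action the current
    (e.g., epsilon-greedy) policy actually selected in s_next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

In practice, Q would be initialized as a `collections.defaultdict(float)` and both algorithms would choose actions with `epsilon_greedy`; they differ only in how the update target is formed, which is exactly the on-policy/off-policy distinction developed in this chapter.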