Markov decision processes (MDPs), also known as stochastic dynamic programs, have been studied extensively since they were first introduced in 1960 [55]. MDPs are mainly used to model and solve dynamic, multi-period decision-making problems in stochastic environments.


