Reinforcement Learning for Inventory Management

  • Shraddha Bharti
  • Dony S. Kurian
  • V. Madhusudanan Pillai
Conference paper
Part of the Lecture Notes in Mechanical Engineering book series (LNME)


The decision of “how much to order” at each stage of the supply chain is a major task in minimizing inventory costs. Managers tend to follow a particular ordering policy that seeks individual benefit, which hampers the overall performance of the supply chain. Major findings from the literature show that, with the advent of machine learning and artificial intelligence, research in this area has moved from simple base-stock policies toward intelligence-based learning algorithms that obtain near-optimal solutions. This paper first formulates the ordering management problem of a multi-agent, four-stage serial supply chain as a reinforcement learning (RL) model. In the final step, the RL model for a single-agent supply chain is optimized using the Q-learning algorithm. The simulation results show that the RL model with the Q-learning algorithm performs better than both the Order-Up-To policy and the 1–1 policy.
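The abstract does not give the state, action, or cost definitions of the authors' model, so what follows is only a minimal sketch under loudly stated assumptions, not the paper's method: a single-stage inventory with backlogging, integer demand uniform on 0..8, a two-period lead time, linear holding and backorder costs, and the clipped inventory position as the tabular Q-learning state. It shows the standard tabular Q-learning update next to the two baselines the abstract names. The 1–1 policy is interpreted here as "reorder exactly last period's demand", and the Order-Up-To target S = 14 is an arbitrary choice. All function names (`simulate`, `q_learning`, `order_up_to`, `one_to_one`, `greedy_policy`) and constants are hypothetical.

```python
import random

# --- Hypothetical single-stage setting; every value below is an illustrative assumption ---
HOLDING_COST = 1.0          # cost per unit on hand at the end of a period
BACKORDER_COST = 2.0        # cost per unit backlogged at the end of a period
LEAD_TIME = 2               # an order placed in period t arrives at the start of t + LEAD_TIME
MAX_DEMAND = 8              # demand is uniform on 0..MAX_DEMAND
MAX_ORDER = 8               # Q-learning actions are order quantities 0..MAX_ORDER
POS_MIN, POS_MAX = -5, 25   # inventory-position range used as the tabular state space


def period_cost(net_inv):
    """Linear holding cost on stock, linear penalty on backlog (negative net_inv)."""
    return HOLDING_COST * max(net_inv, 0) + BACKORDER_COST * max(-net_inv, 0)


def simulate(policy, demands):
    """Run one episode and return its total cost.

    policy(position, last_demand) -> order quantity, where position is
    net inventory (on hand minus backlog) plus all outstanding orders.
    """
    net_inv, pipeline, last_demand, total = 4, [4] * LEAD_TIME, 4, 0.0
    for d in demands:
        net_inv += pipeline.pop(0)          # receive the oldest outstanding order
        position = net_inv + sum(pipeline)  # inventory position before ordering
        pipeline.append(policy(position, last_demand))
        net_inv -= d                        # unmet demand is backlogged
        total += period_cost(net_inv)
        last_demand = d
    return total


def order_up_to(target):
    """Order-Up-To (base-stock) policy: raise the inventory position to `target`."""
    return lambda position, last_demand: max(target - position, 0)


def one_to_one(position, last_demand):
    """1-1 policy as interpreted here (assumption): reorder exactly last period's demand."""
    return last_demand


def state_index(position):
    """Clip the inventory position into [POS_MIN, POS_MAX] and map it to a row index."""
    return min(max(position, POS_MIN), POS_MAX) - POS_MIN


def q_learning(episodes=3000, periods=50, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; reward = -period cost.

    Q starts at 0 while every reward is <= 0, so untried actions look attractive
    (optimistic initialization), which adds exploration on top of epsilon-greedy.
    """
    rng = random.Random(seed)
    q = [[0.0] * (MAX_ORDER + 1) for _ in range(POS_MAX - POS_MIN + 1)]
    for _ in range(episodes):
        net_inv, pipeline = 4, [4] * LEAD_TIME
        net_inv += pipeline.pop(0)                             # first period's receipt
        s = state_index(net_inv + sum(pipeline))
        for _ in range(periods):
            row = q[s]
            if rng.random() < epsilon:
                a = rng.randint(0, MAX_ORDER)                  # explore
            else:
                a = max(range(len(row)), key=row.__getitem__)  # exploit
            pipeline.append(a)
            net_inv -= rng.randint(0, MAX_DEMAND)
            cost = period_cost(net_inv)
            net_inv += pipeline.pop(0)                         # next period's receipt
            s_next = state_index(net_inv + sum(pipeline))
            # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            row[a] += alpha * (-cost + gamma * max(q[s_next]) - row[a])
            s = s_next
    return q


def greedy_policy(q):
    """Turn a learned Q-table into an ordering policy usable by simulate()."""
    def policy(position, last_demand):
        row = q[state_index(position)]
        return max(range(len(row)), key=row.__getitem__)
    return policy


if __name__ == "__main__":
    q = q_learning()
    rng = random.Random(42)
    streams = [[rng.randint(0, MAX_DEMAND) for _ in range(50)] for _ in range(200)]
    for name, pol in [("Q-learning", greedy_policy(q)),
                      ("Order-Up-To, S=14", order_up_to(14)),
                      ("1-1", one_to_one)]:
        avg = sum(simulate(pol, d) for d in streams) / len(streams)
        print(f"{name:>18}: average cost per episode {avg:.1f}")
```

Running the script prints an average cost per episode for each policy over the same 200 demand streams. The ranking depends on the assumed costs, demand distribution, and training budget, so it should not be read as reproducing the paper's results. Using the inventory position as the state keeps the Q-table small; a multi-agent, four-stage formulation like the one the abstract describes would need a richer state per agent.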


Keywords: Supply chain, Ordering policy, Inventory management, Reinforcement learning, Q-learning


Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Shraddha Bharti (1)
  • Dony S. Kurian (1)
  • V. Madhusudanan Pillai (1)
  1. Department of Mechanical Engineering, National Institute of Technology Calicut, Kozhikode, India
