Reinforcement Learning for Scheduling of Maintenance

  • Michael Knowles
  • David Baglee
  • Stefan Wermter
Conference paper


Improving maintenance scheduling has become an area of crucial importance in recent years. Condition-based maintenance (CBM) has begun to move away from fixed-interval scheduled maintenance by providing an indication of the likelihood of failure. It remains a problem, however, to improve the timing of maintenance based on this information so that high reliability is maintained without resorting to over-maintenance. In this paper we propose Reinforcement Learning (RL), a technique which improves long-term reward for a multistage decision process based on feedback given either during or at the end of a sequence of actions, as a potential solution to this problem. Several indicative scenarios are presented, and simulated experiments illustrate the performance of RL in this application.
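The multistage decision process described above can be illustrated with a minimal sketch. This is not the authors' model: the states, costs, and degradation probabilities below are hypothetical, chosen only to show how tabular Q-learning trades a small recurring maintenance cost against a large failure penalty.

```python
import random

# Illustrative sketch only, not the paper's actual model. A machine's
# condition degrades from state 0 (new) to state 4 (failed); each step
# the agent chooses to "wait" or "maintain". Maintenance costs a little;
# letting the machine fail costs a lot.
STATES = 5                       # condition levels 0..4 (4 = failed)
ACTIONS = ["wait", "maintain"]
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def step(state, action):
    """Hypothetical dynamics: waiting risks degradation, maintenance resets."""
    if action == "maintain":
        return 0, -2.0           # repair cost, machine restored
    if state == STATES - 1:
        return 0, -50.0          # failure: large penalty, forced replacement
    nxt = state + (1 if random.random() < 0.4 else 0)
    return nxt, 0.0              # waiting is free unless the machine fails

Q = [[0.0, 0.0] for _ in range(STATES)]

random.seed(0)
state = 0
for _ in range(50000):           # tabular Q-learning with eps-greedy exploration
    if random.random() < EPS:
        a = random.randrange(2)
    else:
        a = max((0, 1), key=lambda i: Q[state][i])
    nxt, r = step(state, ACTIONS[a])
    Q[state][a] += ALPHA * (r + GAMMA * max(Q[nxt]) - Q[state][a])
    state = nxt

policy = [ACTIONS[max((0, 1), key=lambda i: Q[s][i])] for s in range(STATES)]
print(policy)  # typically waits in good condition and maintains near failure
```

The learned policy emerges purely from delayed cost feedback, mirroring the paper's motivation: no explicit failure model is given to the agent, yet it learns when maintenance is worthwhile.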







Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Michael Knowles (1)
  • David Baglee (1)
  • Stefan Wermter (2)
  1. Institute for Automotive and Manufacturing Advanced Practice (AMAP), University of Sunderland, Sunderland, UK
  2. Knowledge Technology Group, Department of Informatics, University of Hamburg, Hamburg, Germany
