A Cultural Algorithm for POMDPs from Stochastic Inventory Control

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5296)

Abstract

Reinforcement Learning algorithms such as SARSA with an eligibility trace, and Evolutionary Computation methods such as genetic algorithms, are competing approaches to solving Partially Observable Markov Decision Processes (POMDPs) which occur in many fields of Artificial Intelligence. A powerful form of evolutionary algorithm that has not previously been applied to POMDPs is the cultural algorithm, in which evolving agents share knowledge in a belief space that is used to guide their evolution. We describe a cultural algorithm for POMDPs that hybridises SARSA with a noisy genetic algorithm, and inherits the latter’s convergence properties. Its belief space is a common set of state-action values that are updated during genetic exploration, and conversely used to modify chromosomes. We use it to solve problems from stochastic inventory control by finding memoryless policies for nondeterministic POMDPs. Neither SARSA nor the genetic algorithm dominates the other on these problems, but the cultural algorithm outperforms the genetic algorithm, and on highly non-Markovian instances also outperforms SARSA.
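
To make the abstract's description concrete, here is a minimal illustrative Python sketch of the hybrid it outlines: a noisy genetic algorithm over memoryless policies (chromosomes mapping observations to actions), a shared table of observation-action values serving as the belief space, SARSA-style updates to that table during fitness evaluation, and an influence step that rewrites genes toward belief-space-greedy actions. The environment (ToyInventoryEnv), all parameter values, and operator choices are assumptions for illustration only; they are not the paper's benchmark instances or its exact genetic operators.

import random

# --- assumed problem sizes and parameters (illustrative only) ---
N_OBS, N_ACTIONS = 10, 4            # observation and action space sizes
POP, GENS, EPISODES = 20, 50, 5     # noisy GA: average EPISODES rollouts per fitness
ALPHA, GAMMA = 0.1, 0.95            # SARSA learning rate and discount factor
P_INFLUENCE = 0.2                   # chance a gene is rewritten from the belief space

Q = [[0.0] * N_ACTIONS for _ in range(N_OBS)]   # shared belief space


class ToyInventoryEnv:
    """Hypothetical stand-in for a stochastic inventory POMDP: the observation
    is the current stock level, the action is an order quantity, and the
    reward penalises deviation from a target stock level under random demand."""

    def reset(self):
        self.stock, self.t = N_OBS // 2, 0
        return self.stock

    def step(self, action):
        demand = random.randrange(N_ACTIONS)
        self.stock = max(0, min(N_OBS - 1, self.stock + action - demand))
        self.t += 1
        reward = -abs(self.stock - N_OBS // 2)
        return self.stock, reward, self.t >= 20


def run_episode(policy, env):
    """Roll out a memoryless policy, updating Q with SARSA along the way."""
    total, obs = 0.0, env.reset()
    action, done = policy[obs], False
    while not done:
        obs2, reward, done = env.step(action)
        action2 = policy[obs2]
        target = reward + (0.0 if done else GAMMA * Q[obs2][action2])
        Q[obs][action] += ALPHA * (target - Q[obs][action])   # SARSA update
        total += reward
        obs, action = obs2, action2
    return total


def fitness(policy, env):
    """Noisy fitness: average several episodes to damp simulation noise."""
    return sum(run_episode(policy, env) for _ in range(EPISODES)) / EPISODES


def influence(policy):
    """Belief space -> population: rewrite some genes to Q-greedy actions."""
    return [max(range(N_ACTIONS), key=lambda a: Q[o][a])
            if random.random() < P_INFLUENCE else policy[o]
            for o in range(N_OBS)]


def evolve(env):
    """Generational GA with one-point crossover, mutation and influence."""
    pop = [[random.randrange(N_ACTIONS) for _ in range(N_OBS)] for _ in range(POP)]
    for _ in range(GENS):
        pop.sort(key=lambda p: fitness(p, env), reverse=True)
        parents, children = pop[:POP // 2], []
        while len(parents) + len(children) < POP:
            mum, dad = random.sample(parents, 2)
            cut = random.randrange(1, N_OBS)
            child = mum[:cut] + dad[cut:]                     # one-point crossover
            if random.random() < 0.1:                         # point mutation
                child[random.randrange(N_OBS)] = random.randrange(N_ACTIONS)
            children.append(influence(child))                 # cultural influence
        pop = parents + children
    return max(pop, key=lambda p: fitness(p, env))


best_policy = evolve(ToyInventoryEnv())

Averaging several episodes per fitness evaluation is what makes the genetic algorithm robust to noisy fitness, and the influence step is the cultural algorithm's channel from the belief space back into the population; the SARSA updates inside run_episode are the reverse channel from genetic exploration into the belief space.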

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Prestwich, S.D., Tarim, S.A., Rossi, R., Hnich, B. (2008). A Cultural Algorithm for POMDPs from Stochastic Inventory Control. In: Blesa, M.J., et al. Hybrid Metaheuristics. HM 2008. Lecture Notes in Computer Science, vol 5296. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88439-2_2

  • DOI: https://doi.org/10.1007/978-3-540-88439-2_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88438-5

  • Online ISBN: 978-3-540-88439-2

  • eBook Packages: Computer Science (R0)
