Abstract
Reinforcement Learning algorithms such as SARSA with an eligibility trace, and Evolutionary Computation methods such as genetic algorithms, are competing approaches to solving Partially Observable Markov Decision Processes (POMDPs), which arise in many fields of Artificial Intelligence. A powerful form of evolutionary algorithm that has not previously been applied to POMDPs is the cultural algorithm, in which evolving agents share knowledge in a belief space that is used to guide their evolution. We describe a cultural algorithm for POMDPs that hybridises SARSA with a noisy genetic algorithm and inherits the latter's convergence properties. Its belief space is a shared set of state-action values that are updated during genetic exploration and, conversely, used to modify chromosomes. We use it to solve problems from stochastic inventory control by finding memoryless policies for nondeterministic POMDPs. Neither SARSA nor the genetic algorithm dominates the other on these problems, but the cultural algorithm outperforms the genetic algorithm, and on highly non-Markovian instances it also outperforms SARSA.
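To make the hybrid concrete, here is a minimal Python sketch of the general scheme the abstract describes, not the authors' implementation: chromosomes are memoryless policies (observation-to-action tables), SARSA writes into a shared Q-table while chromosomes are evaluated, and that table feeds back into the population through an influence step. The toy environment, the constants `N_OBS` and `N_ACTIONS`, the `influence` rule, and all parameter values are illustrative assumptions.

```python
import random

# Toy partially observable environment: reward depends on a hidden phase bit
# the agent never sees, so it must settle for a memoryless obs -> action policy.
N_OBS, N_ACTIONS = 4, 2

def run_episode(policy, q, alpha=0.1, gamma=0.9, steps=30):
    """Run one episode following `policy` (a list: obs -> action).
    SARSA updates are written into the shared table `q` (the belief space)."""
    phase = 0
    obs = random.randrange(N_OBS)
    act = policy[obs]
    total = 0.0
    for _ in range(steps):
        # Hidden dynamics: the correct action depends on the unobserved phase.
        reward = 1.0 if act == (obs + phase) % N_ACTIONS else 0.0
        phase = (phase + 1) % 2
        next_obs = random.randrange(N_OBS)
        next_act = policy[next_obs]
        # SARSA update on the shared state-action values.
        q[obs][act] += alpha * (reward + gamma * q[next_obs][next_act] - q[obs][act])
        obs, act = next_obs, next_act
        total += reward
    return total

def fitness(policy, q, samples=5):
    """Noisy fitness: average return over several resampled episodes."""
    return sum(run_episode(policy, q) for _ in range(samples)) / samples

def influence(policy, q, rate=0.2):
    """Belief-space influence: with some probability, overwrite a gene with
    the greedy action under the shared Q-table."""
    return [max(range(N_ACTIONS), key=lambda a: q[o][a])
            if random.random() < rate else policy[o]
            for o in range(N_OBS)]

def cultural_ga(pop_size=20, generations=50):
    q = [[0.0] * N_ACTIONS for _ in range(N_OBS)]        # shared belief space
    pop = [[random.randrange(N_ACTIONS) for _ in range(N_OBS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluating fitness also updates q as a side effect (acceptance).
        scored = sorted(pop, key=lambda p: fitness(p, q), reverse=True)
        parents = scored[:pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_OBS)             # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.3:                    # random mutation
                child[random.randrange(N_OBS)] = random.randrange(N_ACTIONS)
            children.append(influence(child, q))         # guided by beliefs
        pop = parents + children
    return max(pop, key=lambda p: fitness(p, q))

print(cultural_ga())
```

The sketch mirrors the cultural-algorithm loop: evaluation fills the belief space (here via SARSA updates), and the influence step feeds that knowledge back into new chromosomes, while averaging several episodes per evaluation handles the noisy, nondeterministic fitness.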
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prestwich, S.D., Tarim, S.A., Rossi, R., Hnich, B.: A Cultural Algorithm for POMDPs from Stochastic Inventory Control. In: Blesa, M.J., et al. (eds.) Hybrid Metaheuristics (HM 2008). Lecture Notes in Computer Science, vol. 5296. Springer, Berlin, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88439-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88438-5
Online ISBN: 978-3-540-88439-2