Abstract
Reinforcement Learning algorithms such as SARSA with an eligibility trace, and Evolutionary Computation methods such as genetic algorithms, are competing approaches to solving Partially Observable Markov Decision Processes (POMDPs), which arise in many fields of Artificial Intelligence. A powerful form of evolutionary algorithm that has not previously been applied to POMDPs is the cultural algorithm, in which evolving agents share knowledge in a belief space that is used to guide their evolution. We describe a cultural algorithm for POMDPs that hybridises SARSA with a noisy genetic algorithm and inherits the latter's convergence properties. Its belief space is a shared set of state-action values that are updated during genetic exploration and, conversely, used to modify chromosomes. We use it to solve problems from stochastic inventory control by finding memoryless policies for nondeterministic POMDPs. Neither SARSA nor the genetic algorithm dominates the other on these problems, but the cultural algorithm outperforms the genetic algorithm, and on highly non-Markovian instances it also outperforms SARSA.
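To make the hybrid concrete, here is a minimal Python sketch of the general scheme the abstract describes, not the authors' implementation: chromosomes are memoryless policies (observation-to-action tables), SARSA writes into a shared Q-table while chromosomes are evaluated, and that table feeds back into the population through an influence step. The toy environment, the constants `N_OBS` and `N_ACTIONS`, the `influence` rule, and all parameter values are illustrative assumptions.

```python
import random

# Toy partially observable environment: reward depends on a hidden phase bit
# the agent never sees, so it must settle for a memoryless obs -> action policy.
N_OBS, N_ACTIONS = 4, 2

def run_episode(policy, q, alpha=0.1, gamma=0.9, steps=30):
    """Run one episode following `policy` (a list: obs -> action).
    SARSA updates are written into the shared table `q` (the belief space)."""
    phase = 0
    obs = random.randrange(N_OBS)
    act = policy[obs]
    total = 0.0
    for _ in range(steps):
        # Hidden dynamics: the correct action depends on the unobserved phase.
        reward = 1.0 if act == (obs + phase) % N_ACTIONS else 0.0
        phase = (phase + 1) % 2
        next_obs = random.randrange(N_OBS)
        next_act = policy[next_obs]
        # SARSA update on the shared state-action values.
        q[obs][act] += alpha * (reward + gamma * q[next_obs][next_act] - q[obs][act])
        obs, act = next_obs, next_act
        total += reward
    return total

def fitness(policy, q, samples=5):
    """Noisy fitness: average return over several resampled episodes."""
    return sum(run_episode(policy, q) for _ in range(samples)) / samples

def influence(policy, q, rate=0.2):
    """Belief-space influence: with some probability, overwrite a gene with
    the greedy action under the shared Q-table."""
    return [max(range(N_ACTIONS), key=lambda a: q[o][a])
            if random.random() < rate else policy[o]
            for o in range(N_OBS)]

def cultural_ga(pop_size=20, generations=50):
    q = [[0.0] * N_ACTIONS for _ in range(N_OBS)]        # shared belief space
    pop = [[random.randrange(N_ACTIONS) for _ in range(N_OBS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluating fitness also updates q as a side effect (acceptance).
        scored = sorted(pop, key=lambda p: fitness(p, q), reverse=True)
        parents = scored[:pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_OBS)             # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.3:                    # random mutation
                child[random.randrange(N_OBS)] = random.randrange(N_ACTIONS)
            children.append(influence(child, q))         # guided by beliefs
        pop = parents + children
    return max(pop, key=lambda p: fitness(p, q))

print(cultural_ga())
```

The sketch mirrors the cultural-algorithm loop: evaluation fills the belief space (here via SARSA updates), and the influence step feeds that knowledge back into new chromosomes, while averaging several episodes per evaluation handles the noisy, nondeterministic fitness.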
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prestwich, S.D., Tarim, S.A., Rossi, R., Hnich, B.: A Cultural Algorithm for POMDPs from Stochastic Inventory Control. In: Blesa, M.J., et al. (eds.) Hybrid Metaheuristics (HM 2008). Lecture Notes in Computer Science, vol. 5296. Springer, Berlin, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88439-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88438-5
Online ISBN: 978-3-540-88439-2