Choosing Search Heuristics by Non-Stationary Reinforcement Learning
Search decisions are often made using heuristic methods because real-world applications can rarely be tackled without any heuristics. In many cases, multiple heuristics can potentially be chosen, and it is not clear a priori which would perform best. In this article, we propose a procedure that learns, during the search process, how to select promising heuristics. The learning is based on weight adaptation and can even switch between different heuristics during search. Different variants of the approach are evaluated within a constraint-programming environment.
KeywordsNon-stationary reinforcement learning Optimization Local search Constraint programming.
Unable to display preview. Download preview PDF.
- J. A. Boyan and A. W. Moore. Using prediction to improve combinatorial optimization search. In Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics (AISTATS-97), 1997.Google Scholar
- P. Cowling, G. Kendall, and E. Soubeiga. A parameter-free hyperheuristic for scheduling a sales summit. In Proceedings of the Fourth Metaheuristics International Conference (MIC’2001), pages 127–131, 2001.Google Scholar
- J. Frank. Learning short-term weights for GSAT. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97) pages 384391,1997.Google Scholar
- H. H. Hoos and T. Stützle. Evaluating las vegas algorithms — pitfalls and remedies. In Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 238–245, 1998.Google Scholar
- M. L. Littman and D. H. Ackley. Adaptation in constant utility non-stationary environments. In Proceedings of the Fourth International Conference on Genetic Algorithms, pages 136–142, 1991.Google Scholar
- A. Nareyek. Constraint-Based Agents–An Architecture for Constraint-Based Modeling and Local-Search-Based Reasoning for Planning and Scheduling in Open and Dynamic Worlds, volume 2062 of Lecture Notes in Artificial Intelligence. Springer, 2001a.Google Scholar
- A. Nareyek. Using global constraints for local search. In E. C. Freuder and R. J. Wallace, editors, Constraint Programming and Large Scale Discrete Optimization, volume 57 of DIMACS Series on Discrete Mathematics and Theoretical Computer Science, pages 9–28. American Mathematical Society Publications, 2001b.Google Scholar
- G. Rabideau, R. Knight, S. Chien, A. Fukunaga, and A. Govindjee. Iterative repair planning for spacecraft operations in the aspen system. In Proceedings of the International Symposium on Artificial Intelligence Robotics and Automation in Space (iSAIRAS 99 ), 1999.Google Scholar
- W. Ruml. Incomplete tree search using adaptive probing. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), pages 235–241, 2001.Google Scholar
- J. Schmidhuber. Making the world differentiable: On using self-supervised fully recurrent neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report TR FKI-12690, Department of Computer Science, Technical University of Munich, Germany, 1990.Google Scholar
- D. Schuurmans and F. Southey. Local search characteristics of incomplete SAT procedures. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), pages 297–302, 2000.Google Scholar
- S. F. Smith. OPIS: A methodology and architecture for reactive scheduling. In M. Zweben and M. S. Fox, editors, Intelligent Scheduling, pages 29–66. Morgan Kaufmann, 1994.Google Scholar
- R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.Google Scholar
- C. Voudouris and E. Tsang. Guided local search. Technical Report CSM-247, University of Essex, Department of Computer Science, Colchester, United Kingdom, 1995.Google Scholar
- M. Zweben, B. Daun, E. Davis, and M. Deale. Scheduling and rescheduling with iterative repair. In M. Zweben and M. S. Fox, editors, Intelligent Scheduling, pages 241–255. Morgan Kaufmann, 1994.Google Scholar