Choosing Search Heuristics by Non-Stationary Reinforcement Learning

  • Alexander Nareyek
Part of the Applied Optimization book series (APOP, volume 86)


Search decisions are often made using heuristic methods because real-world applications can rarely be tackled without any heuristics. In many cases, multiple heuristics can potentially be chosen, and it is not clear a priori which would perform best. In this article, we propose a procedure that learns, during the search process, how to select promising heuristics. The learning is based on weight adaptation and can even switch between different heuristics during search. Different variants of the approach are evaluated within a constraint-programming environment.


Non-stationary reinforcement learning Optimization Local search Constraint programming. 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. J. A. Boyan and A. W. Moore. Using prediction to improve combinatorial optimization search. In Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics (AISTATS-97), 1997.Google Scholar
  2. P. Cowling, G. Kendall, and E. Soubeiga. A parameter-free hyperheuristic for scheduling a sales summit. In Proceedings of the Fourth Metaheuristics International Conference (MIC’2001), pages 127–131, 2001.Google Scholar
  3. M. Dorigo, G. Di Caro, and L. M. Gambardella. Ant algorithms for discrete optimization. Artificial Life, 5 (3): 137–172, 1999.CrossRefGoogle Scholar
  4. J. Frank. Learning short-term weights for GSAT. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97) pages 384391,1997.Google Scholar
  5. H. H. Hoos and T. Stützle. Evaluating las vegas algorithms — pitfalls and remedies. In Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 238–245, 1998.Google Scholar
  6. D. E. Joslin and D. P. Clements. Squeaky wheel optimization. Journal of Artificial Intelligence Research, 10: 353–373, 1999.MathSciNetzbMATHGoogle Scholar
  7. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220 (4598): 671–680, 1983.MathSciNetzbMATHCrossRefGoogle Scholar
  8. M. L. Littman and D. H. Ackley. Adaptation in constant utility non-stationary environments. In Proceedings of the Fourth International Conference on Genetic Algorithms, pages 136–142, 1991.Google Scholar
  9. F. Michaud and M. J. Matark. Learning from history for behavior-based mobile robots in non-stationary environments. Machine Learning (Joint Special Issue on Learning in Autonomous Robots), 31: 141–167, 1998.zbMATHCrossRefGoogle Scholar
  10. A. Nareyek. Constraint-Based Agents–An Architecture for Constraint-Based Modeling and Local-Search-Based Reasoning for Planning and Scheduling in Open and Dynamic Worlds, volume 2062 of Lecture Notes in Artificial Intelligence. Springer, 2001a.Google Scholar
  11. A. Nareyek. Using global constraints for local search. In E. C. Freuder and R. J. Wallace, editors, Constraint Programming and Large Scale Discrete Optimization, volume 57 of DIMACS Series on Discrete Mathematics and Theoretical Computer Science, pages 9–28. American Mathematical Society Publications, 2001b.Google Scholar
  12. G. Rabideau, R. Knight, S. Chien, A. Fukunaga, and A. Govindjee. Iterative repair planning for spacecraft operations in the aspen system. In Proceedings of the International Symposium on Artificial Intelligence Robotics and Automation in Space (iSAIRAS 99 ), 1999.Google Scholar
  13. W. Ruml. Incomplete tree search using adaptive probing. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), pages 235–241, 2001.Google Scholar
  14. J. Schmidhuber. Making the world differentiable: On using self-supervised fully recurrent neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report TR FKI-12690, Department of Computer Science, Technical University of Munich, Germany, 1990.Google Scholar
  15. D. Schuurmans and F. Southey. Local search characteristics of incomplete SAT procedures. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), pages 297–302, 2000.Google Scholar
  16. S. F. Smith. OPIS: A methodology and architecture for reactive scheduling. In M. Zweben and M. S. Fox, editors, Intelligent Scheduling, pages 29–66. Morgan Kaufmann, 1994.Google Scholar
  17. R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.Google Scholar
  18. C. Voudouris and E. Tsang. Guided local search. Technical Report CSM-247, University of Essex, Department of Computer Science, Colchester, United Kingdom, 1995.Google Scholar
  19. M. Zweben, B. Daun, E. Davis, and M. Deale. Scheduling and rescheduling with iterative repair. In M. Zweben and M. S. Fox, editors, Intelligent Scheduling, pages 241–255. Morgan Kaufmann, 1994.Google Scholar

Copyright information

© Springer Science+Business Media New York 2003

Authors and Affiliations

  • Alexander Nareyek
    • 1
  1. 1.Computer Science DepartmentCarnegie Mellon UniversityPittsburghUSA

Personalised recommendations