Reinforcement Learning for Decision Making in Sequential Visual Attention

  • Lucas Paletta
  • Gerald Fritz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4840)


This work contributes a system that learns visual encodings of attention patterns and enables sequential attention for object detection in real-world environments. The system embeds the saccadic decision procedure in a cascaded process in which visual evidence is probed at the most informative image locations. It extracts information-theoretic saliency by determining informative local image descriptors, which provide the selected foci of interest. Both the local information, in terms of codebook vector responses, and the geometric information in the shift of attention contribute to the recognition state of a Markov decision process. A Q-learner then performs an explorative search over actions towards salient locations, developing a strategy of action sequences that are directed in state space towards maximizing information. The method is evaluated in experiments on real-world object recognition and demonstrates efficient performance in outdoor tasks.
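
The learning step summarized above can be illustrated with a minimal tabular Q-learning sketch. It is not the authors' implementation. The state encoding (a pair of codebook index and last attention shift), the eight shift directions, the reward (entropy reduction over object hypotheses), and all function names and parameter values are assumptions made for illustration only.

```python
import random
from collections import defaultdict

# Hypothetical action set: eight discrete directions for the shift of attention.
ACTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def information_gain(entropy_before, entropy_after):
    """Assumed reward: reduction in posterior entropy over object hypotheses."""
    return entropy_before - entropy_after


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update (Watkins and Dayan)."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return Q[(state, action)]


# Q-table with zero-initialized entries.
Q = defaultdict(float)
rng = random.Random(0)

# Hypothetical recognition states: (codebook index, last attention shift).
s0 = (3, None)
s1 = (7, "E")

# One illustrative transition: shifting east reduces entropy from 2.0 to 1.0.
r = information_gain(2.0, 1.0)
q_update(Q, s0, "E", r, s1, ACTIONS)

# With epsilon = 0 the policy is purely greedy.
greedy_action = epsilon_greedy(Q, s0, ACTIONS, 0.0, rng)
```

In this sketch the agent is rewarded for shifts of attention that make the object posterior less ambiguous, so the greedy policy prefers action sequences that accumulate information quickly.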


Keywords: Object Recognition, Reinforcement Learning, Markov Decision Process, Local Descriptor, Recognition State





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Lucas Paletta (1)
  • Gerald Fritz (1)
  1. JOANNEUM RESEARCH Forschungsgesellschaft mbH, Institute of Digital Image Processing, Computational Perception Group, Wastiangasse 6, 8010 Graz, Austria
