Bottom-Up Fixation Prediction Using Unsupervised Hierarchical Models

  • Hamed R. Tavakoli
  • Jorma Laaksonen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10116)


Fixation prediction, also known as saliency modelling, has been studied intensively in various contexts. In assistive vision technologies, saliency modelling can be used to develop simulated prosthetic vision as part of saliency-based cueing algorithms. In this paper, we present an unsupervised multi-scale hierarchical saliency model that utilizes both local and global saliency pipelines. Motivated by bio-inspired vision findings, we employ features from image statistics. In contrast to previous research, which utilizes one-layer equivalent networks such as independent component analysis (ICA) or principal component analysis (PCA), we adopt independent subspace analysis (ISA), which is equivalent to a two-layer neural architecture. The advantage of ISA over ICA and PCA is robustness to translation while remaining selective to frequency and rotation. We extend the ISA networks by stacking them, as done in deep models, to obtain a hierarchical representation. In summary, (1) we define a framework for unsupervised fixation prediction that exploits the concepts of local and global saliency and generalizes easily to a hierarchy of any depth, (2) we assess the usefulness of the hierarchical unsupervised features, (3) we adapt the framework to exploit the features provided by pre-trained deep neural networks, (4) we compare the performance of different features and existing fixation prediction models on MIT1003, and (5) we provide the benchmark results of our model on MIT300.
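The two-layer view of ISA mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard ISA response, in which first-layer linear filter outputs are squared, summed within fixed-size subspaces, and passed through a square root. The function name `isa_responses`, the toy identity filters, and the example patches are hypothetical and chosen only for illustration; in practice the filters would be learned from natural image patches, and this is not the authors' implementation.

```python
import math


def isa_responses(patch, filters, subspace_size):
    """Two-layer ISA-style response for one flattened image patch.

    Layer 1: linear filter outputs (dot products with the patch).
    Layer 2: square the outputs, sum them within each subspace,
             and take the square root of each sum.
    """
    if len(filters) % subspace_size != 0:
        raise ValueError("number of filters must be a multiple of subspace_size")

    # Layer 1: one linear response per filter.
    linear = [sum(w * x for w, x in zip(f, patch)) for f in filters]

    # Layer 2: pool the squared responses of each group of filters.
    pooled = []
    for start in range(0, len(linear), subspace_size):
        group = linear[start:start + subspace_size]
        pooled.append(math.sqrt(sum(v * v for v in group)))
    return pooled


# Hypothetical toy filters (NOT learned): four filters forming two subspaces of size 2.
filters = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Two patches with the same energy in the first subspace,
# distributed differently across its two filters.
patch_a = [3.0, 4.0, 0.0, 0.0]
patch_b = [4.0, 3.0, 0.0, 0.0]

resp_a = isa_responses(patch_a, filters, subspace_size=2)
resp_b = isa_responses(patch_b, filters, subspace_size=2)
print(resp_a, resp_b)
```

Both patches yield the same pooled response because the second layer depends only on the total energy inside each subspace. This pooling is what gives the representation a degree of invariance that a single linear layer, as in ICA or PCA, does not provide.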


Independent Component Analysis; Image Patch; Convolutional Neural Network; Deep Neural Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



The authors would like to acknowledge the Finnish Center of Excellence in Computational Inference Research (COIN) and the computational resources provided by the Aalto Science-IT project.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Science, Aalto University, Espoo, Finland
