Bottom-Up Processing in Complex Scenes: A Unifying Perspective on Segmentation, Fixation Saliency, Candidate Regions, Base-Detail Decomposition, and Image Enhancement

  • Boyan Bonev
  • Alan L. Yuille
Part of the Trends in Augmentation of Human Performance book series (TAHP, volume 5)


Early visual processing should offer efficient bottom-up mechanisms that simplify visual information, enhance it, and direct attention so that high-level processing becomes more efficient. Based on these considerations, we propose a unified approach that addresses a set of fundamental early visual processes: segmentation, candidate regions, base-detail decomposition, image enhancement, and saliency for fixation prediction. We argue that, for complex scenes, all of these processes require hierarchical segmentwise processing. Furthermore, we argue that some of these visual tasks require the ability to decompose the appearance of each segment into a “base” appearance and a “detail” appearance. An important, and surprising, result of this decomposition is a novel method for successfully predicting human eye fixations. Our hypothesis is that we fixate on segments that are not easy to model, e.g., segments that are small but contain a lot of detail, in order to obtain a higher-resolution representation for further analysis. We report performance on psychophysics data for the Pascal VOC dataset, whose images are non-iconic and particularly difficult for state-of-the-art saliency algorithms.
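As a rough illustration of the fixation hypothesis above, the following sketch splits each segment into a base and a detail component and scores segments so that small, detail-rich ones rank highest. It is a toy under stated assumptions, not the method of the chapter: the piecewise-constant base (the segment mean), the score (mean absolute detail divided by the square root of the segment area), and the names `base_detail_scores`, `image`, and `labels` are all hypothetical choices made here for illustration.

```python
import numpy as np


def base_detail_scores(image, labels):
    """Split a grayscale image into base + detail, segment by segment.

    image  : 2D float array of intensities.
    labels : 2D int array of the same shape; one segment id per pixel.

    Assumption (not from the chapter): the base of a segment is its mean
    intensity, and the detail is whatever the base fails to explain.
    """
    image = np.asarray(image, dtype=float)
    base = np.zeros_like(image, dtype=float)
    scores = {}
    for seg_id in np.unique(labels):
        mask = labels == seg_id
        # Constant base model for this segment.
        base[mask] = image[mask].mean()
        # Detail density: average magnitude of the unexplained residual.
        detail_density = np.abs(image[mask] - base[mask]).mean()
        # Hypothetical score: favor segments that are small but detailed.
        scores[int(seg_id)] = detail_density / np.sqrt(mask.sum())
    detail = image - base
    return base, detail, scores


# Toy input: a flat background (segment 0) and a two-pixel segment (1)
# whose pixels differ from each other, so the constant base fits it poorly.
image = np.zeros((4, 4))
labels = np.zeros((4, 4), dtype=int)
labels[0, 0:2] = 1
image[0, 0] = 1.0

base, detail, scores = base_detail_scores(image, labels)
most_salient = max(scores, key=scores.get)
```

In this toy input the flat background is modeled perfectly by its base and receives a score of zero, while the small two-pixel segment keeps a nonzero detail residual and is ranked first. A hierarchical segmentation would apply the same scoring at every level of the hierarchy.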


Keywords: Bottom-up visual processing · Image segmentation · Base-detail decomposition · Saliency



We would like to thank Laurent Itti, Li Zhaoping, John Flynn, and the reviewers for their valuable comments. This work is partially supported by NSF award CCF-1317376, by ONR N00014-12-1-0883 and by NVidia Corp.



Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  1. Department of Statistics, University of California, Los Angeles, USA
