Abstract
Early visual processing should offer efficient bottom-up mechanisms that simplify visual information, enhance it, and direct attention so that high-level processing becomes more efficient. Based on these considerations, we propose a unified approach that addresses a set of fundamental early visual processes: segmentation, candidate regions, base-detail decomposition, image enhancement, and saliency for fixation prediction. We argue that, for complex scenes, all of these processes require hierarchical segmentwise processing. Furthermore, we argue that some of these visual tasks require the ability to decompose the appearance of the segments into “base” appearance and “detail” appearance. An important, and surprising, result of this decomposition is a novel method for successfully predicting human eye fixations. Our hypothesis is that we fixate on segments that are not easy to model, e.g., segments that are small but have a lot of detail, in order to obtain a higher-resolution representation for further analysis. We report performance on psychophysics data for the Pascal VOC dataset, whose images are non-iconic and particularly difficult for state-of-the-art saliency algorithms.
Acknowledgements
We would like to thank Laurent Itti, Li Zhaoping, John Flynn, and the reviewers for their valuable comments. This work is partially supported by NSF award CCF-1317376, by ONR N00014-12-1-0883, and by NVIDIA Corp.
Copyright information
© 2015 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Bonev, B., Yuille, A.L. (2015). Bottom-Up Processing in Complex Scenes: A Unifying Perspective on Segmentation, Fixation Saliency, Candidate Regions, Base-Detail Decomposition, and Image Enhancement. In: Lee, SW., Bülthoff, H., Müller, KR. (eds) Recent Progress in Brain and Cognitive Engineering. Trends in Augmentation of Human Performance, vol 5. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-7239-6_8
DOI: https://doi.org/10.1007/978-94-017-7239-6_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-7238-9
Online ISBN: 978-94-017-7239-6
eBook Packages: Biomedical and Life Sciences (R0)