Abstract
Theories and models of saliency that predict where people look focus on regular-density scenes. A crowded scene is characterized by the co-occurrence of a relatively large number of regions or objects, each of which would stand out in a regular scene, and what drives attention in a crowd can differ significantly from the conclusions drawn in the regular setting. This work presents the first focused study of saliency in crowd. To facilitate this study, a new dataset of 500 images is constructed with eye-tracking data from 16 viewers and annotation data on faces (the dataset will be publicly available with the paper). Statistical analyses point to key observations on the features and mechanisms of saliency in scenes with different crowd levels, and provide insights into whether conventional saliency models hold in crowded scenes. Finally, a new saliency-prediction model that takes crowding information into account is proposed, with multiple kernel learning (MKL) as the core computational module for integrating various low- and high-level features. Extensive experiments demonstrate the superior performance of the proposed model compared with the state of the art in saliency computation.
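As a rough illustration of the MKL-style feature integration the abstract describes (a minimal sketch, not the authors' implementation; the feature grouping, kernel choice, and weights below are hypothetical), MKL combines one base kernel per feature group into a single convex combination, K = Σ_m β_m K_m with β_m ≥ 0 and Σ_m β_m = 1, which a kernel classifier then uses to separate fixated from non-fixated patches:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF (Gaussian) kernel between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def combined_kernel(X, Y, feature_slices, betas):
    # MKL-style combination: K = sum_m beta_m * K_m,
    # with beta_m >= 0 and sum_m beta_m = 1 (convex combination).
    betas = np.asarray(betas, dtype=float)
    assert np.all(betas >= 0) and np.isclose(betas.sum(), 1.0)
    K = np.zeros((X.shape[0], Y.shape[0]))
    for sl, beta in zip(feature_slices, betas):
        K += beta * rbf_kernel(X[:, sl], Y[:, sl])
    return K

# Toy descriptors: dims 0-3 stand in for low-level features (e.g. contrast),
# dims 4-5 for a high-level feature (e.g. a face-detector score) -- hypothetical.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 6))
slices = [slice(0, 4), slice(4, 6)]
K = combined_kernel(X, X, slices, [0.7, 0.3])
```

In a full pipeline the weights β_m would be learned jointly with the classifier rather than fixed by hand; the point here is only the per-feature kernel decomposition.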
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Jiang, M., Xu, J., Zhao, Q. (2014). Saliency in Crowd. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8695. Springer, Cham. https://doi.org/10.1007/978-3-319-10584-0_2
Print ISBN: 978-3-319-10583-3
Online ISBN: 978-3-319-10584-0