Abstract
As early as the nineteenth century, it was observed that humans can effortlessly identify the number of items in the range of 1–4 at a glance.
Notes
- 1.
- 2. We use the subset of ImageNet images with bounding box annotations.
- 3. The F-score is computed as \(\frac{2RP}{R+P}\), where R and P denote recall and precision, respectively.
- 4. When evaluated on the test set used by [200], our best method GoogleNet_Syn_FT achieves a mAP score of 85.0%.
- 5.
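The F-score formula given in the notes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the function name `f_score`, the range check, and the convention of returning 0.0 when recall and precision are both zero are assumptions made here.

```python
def f_score(recall: float, precision: float) -> float:
    """Return 2RP / (R + P), the harmonic mean of recall R and precision P."""
    # Assumption: both inputs are fractions in [0, 1], not percentages.
    if not (0.0 <= recall <= 1.0 and 0.0 <= precision <= 1.0):
        raise ValueError("recall and precision must lie in [0, 1]")
    denominator = recall + precision
    # Assumption: the formula is undefined at R = P = 0; return 0.0 by convention.
    if denominator == 0.0:
        return 0.0
    return 2.0 * recall * precision / denominator


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the function.
    print(f_score(recall=0.8, precision=0.6))
```

Because the F-score is a harmonic mean, it never exceeds the arithmetic mean of R and P, so a method cannot score well by maximising only one of the two.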
References
Anoraganingrum, D. Cell segmentation with median filter and mathematical morphology operation. In International Conference on Image Analysis and Processing (1999).
Arteta, C., Lempitsky, V., Noble, J. A., and Zisserman, A. Interactive object counting. In European Conference on Computer Vision (ECCV) (2014).
Atkinson, J., Campbell, F. W., and Francis, M. R. The magic number 4±0: A new look at visual numerosity judgements. Perception 5, 3 (1976), 327–34.
Berg, T. L., and Berg, A. C. Finding iconic images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2009).
Borji, A., Sihite, D. N., and Itti, L. Salient object detection: A benchmark. In European Conference on Computer Vision (ECCV) (2012).
Boysen, S. T., and Capaldi, E. J. The development of numerical competence: Animal and human models. Psychology Press, 2014.
Chan, A. B., Liang, Z.-S., and Vasconcelos, N. Privacy preserving crowd monitoring: Counting people without people models or tracking. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008).
Chan, A. B., and Vasconcelos, N. Bayesian Poisson regression for crowd counting. In IEEE International Conference on Computer Vision (ICCV) (2009).
Chatfield, K., Lempitsky, V., Vedaldi, A., and Zisserman, A. The devil is in the details: an evaluation of recent feature encoding methods. In British Machine Vision Conference (BMVC) (2011).
Cheng, M.-M., Mitra, N. J., Huang, X., Torr, P. H. S., and Hu, S.-M. Global contrast based salient region detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 37, 3 (2015), 569–582.
Choi, J., Jung, C., Lee, J., and Kim, C. Determining the existence of objects in an image and its application to image thumbnailing. Signal Processing Letters 21, 8 (2014), 957–961.
Chua, T.-S., Tang, J., Hong, R., Li, H., Luo, Z., and Zheng, Y. NUS-WIDE: A real-world web image database from National University of Singapore. In ACM International Conference on Image and Video Retrieval (2009).
Clements, D. H. Subitizing: What is it? Why teach it? Teaching Children Mathematics 5 (1999), 400–405.
Davis, H., and Pérusse, R. Numerical competence in animals: Definitional issues, current evidence, and a new research agenda. Behavioral and Brain Sciences 11, 04 (1988), 561–579.
Dehaene, S. The number sense: How the mind creates mathematics. Oxford University Press, 2011.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and Ramanan, D. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 32, 9 (2010), 1627–1645.
Girshick, R., Donahue, J., Darrell, T., and Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014).
Gross, H. J. The magical number four: A biological, historical and mythological enigma. Communicative & Integrative Biology 5, 1 (2012), 1–2.
Gross, H. J., Pahl, M., Si, A., Zhu, H., Tautz, J., and Zhang, S. Number-based visual generalisation in the honeybee. PLoS One 4, 1 (2009), e4263.
Gurari, D., and Grauman, K. Visual question: Predicting if a crowd will agree on the answer. arXiv preprint arXiv:1608.08188 (2016).
Heo, J.-P., Lin, Z., and Yoon, S.-E. Distance encoded product quantization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014).
Jaderberg, M., Simonyan, K., Vedaldi, A., and Zisserman, A. Synthetic data and artificial neural networks for natural scene text recognition. In Advances in Neural Information Processing Systems (NIPS) Workshop (2014).
Jansen, B. R., Hofman, A. D., Straatemeier, M., Bers, B. M., Raijmakers, M. E., and Maas, H. L. The role of pattern recognition in children’s exact enumeration of small numbers. British Journal of Developmental Psychology 32, 2 (2014), 178–194.
Jevons, W. S. The power of numerical discrimination. Nature 3, 67 (1871), 281–282.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia (2014).
Kaufman, E., Lord, M., Reese, T., and Volkmann, J. The discrimination of visual number. The American Journal of Psychology (1949), 498–525.
Kazemzadeh, S., Ordonez, V., Matten, M., and Berg, T. L. Referitgame: Referring to objects in photographs of natural scenes. In Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014).
Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS) (2012).
Lee, Y. J., Ghosh, J., and Grauman, K. Discovering important people and objects for egocentric video summarization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012).
Lempitsky, V., and Zisserman, A. Learning to count objects in images. In Advances in Neural Information Processing Systems (NIPS) (2010).
Li, X., Uricchio, T., Ballan, L., Bertini, M., Snoek, C. G. M., and Bimbo, A. D. Socializing the semantic gap: A comparative survey on image tag assignment, refinement, and retrieval. ACM Computing Surveys 49, 1 (June 2016), 14:1–14:39.
Li, Y., Hou, X., Koch, C., Rehg, J. M., and Yuille, A. L. The secrets of salient object segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014).
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV) (2014).
Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X., and Shum, H.-Y. Learning to detect a salient object. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 33, 2 (2011), 353–367.
Mandler, G., and Shebo, B. J. Subitizing: an analysis of its component processes. Journal of Experimental Psychology: General 111, 1 (1982), 1.
Nath, S. K., Palaniappan, K., and Bunyak, F. Cell segmentation using coupled level sets and graph-vertex coloring. In Medical Image Computing and Computer-Assisted Intervention (MICCAI) (2006).
Pahl, M., Si, A., and Zhang, S. Numerical cognition in bees and other insects. Frontiers in Psychology 4 (2013).
Peng, X., Sun, B., Ali, K., and Saenko, K. Learning deep object detectors from 3d models. In IEEE International Conference on Computer Vision (ICCV) (2015).
Piazza, M., and Dehaene, S. From number neurons to mental arithmetic: The cognitive neuroscience of number sense. The cognitive neurosciences, 3rd edition (2004), 865–77.
Pinheiro, P. O., Lin, T.-Y., Collobert, R., and Dollár, P. Learning to refine object segments. In European Conference on Computer Vision (ECCV) (2016).
Pont-Tuset, J., Arbelaez, P., Barron, J. T., Marques, F., and Malik, J. Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 39, 1 (2017), 128–140.
Razavian, A. S., Azizpour, H., Sullivan, J., and Carlsson, S. CNN features off-the-shelf: an astounding baseline for recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), DeepVision Workshop (2014).
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. Imagenet large scale visual recognition challenge, 2014.
Scharfenberger, C., Waslander, S. L., Zelek, J. S., and Clausi, D. A. Existence detection of objects in images for robot vision using saliency histogram features. In International Conference on Computer and Robot Vision (2013).
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., and LeCun, Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. In International Conference on Learning Representations (ICLR) (2014).
Shin, D., He, S., Lee, G. M., Whinston, A. B., Cetintas, S., and Lee, K.-C. Content complexity, similarity, and consistency in social media: A deep learning approach. https://ssrn.com/abstract=2830377, 2016.
Simonyan, K., and Zisserman, A. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR) (2015).
Stark, M., Goesele, M., and Schiele, B. Back to the future: Learning shape models from 3D CAD data. In British Machine Vision Conference (BMVC) (2010).
Stoianov, I., and Zorzi, M. Emergence of a visual number sense in hierarchical generative models. Nature Neuroscience 15, 2 (2012), 194–196.
Subburaman, V. B., Descamps, A., and Carincotte, C. Counting people in the crowd using a generic head detector. In IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS) (2012).
Sun, B., and Saenko, K. From virtual to reality: Fast adaptation of virtual object detectors to real domains. In British Machine Vision Conference (BMVC) (2014).
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015).
Torralba, A., Murphy, K. P., Freeman, W. T., and Rubin, M. A. Context-based vision system for place and object recognition. In IEEE International Conference on Computer Vision (ICCV) (2003).
Trick, L. M., and Pylyshyn, Z. W. Why are small and large numbers enumerated differently? A limited-capacity preattentive stage in vision. Psychological Review 101, 1 (1994), 80.
Vedaldi, A., and Fulkerson, B. VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/, 2008.
Vuilleumier, P. O., and Rafal, R. D. A systematic study of visual extinction between- and within-field deficits of attention in hemispatial neglect. Brain 123, 6 (2000), 1263–1279.
Wang, P., Wang, J., Zeng, G., Feng, J., Zha, H., and Li, S. Salient object detection for searched web images via global saliency. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012).
Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., and Torralba, A. Sun database: Large-scale scene recognition from abbey to zoo. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010).
Xiong, B., and Grauman, K. Detecting snap points in egocentric video with a web photo prior. In European Conference on Computer Vision (ECCV) (2014).
Xu, K., Ba, J., Kiros, R., Courville, A., Salakhutdinov, R., Zemel, R., and Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044 (2015).
Zhang, J., Ma, S., Sameki, M., Sclaroff, S., Betke, M., Lin, Z., Shen, X., Price, B., and Měch, R. Salient object subitizing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015).
Zhang, J., Sclaroff, S., Lin, Z., Shen, X., Price, B., and Měch, R. Unconstrained salient object detection via proposal subset optimization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
Zhao, R., Ouyang, W., Li, H., and Wang, X. Saliency detection by multi-context deep learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015).
Zou, W. Y., and McClelland, J. L. Progressive development of the number sense in a deep neural network. In Annual Conference of the Cognitive Science Society (CogSci) (2013).
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Zhang, J., Malmberg, F., Sclaroff, S. (2019). Salient Object Subitizing. In: Visual Saliency: From Pixel-Level to Object-Level Analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-04831-0_5
DOI: https://doi.org/10.1007/978-3-030-04831-0_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04830-3
Online ISBN: 978-3-030-04831-0
eBook Packages: Computer Science, Computer Science (R0)