Salient Object Subitizing

Zhang, Jianming; Malmberg, Filip; Sclaroff, Stan

doi:10.1007/978-3-030-04831-0_5

Abstract

As early as the nineteenth century, it was observed that humans can effortlessly identify the number of items in the range of 1–4 at a glance.

Notes

1. http://www.cs.bu.edu/groups/ivc/Subitizing/.
2. We use the subset of ImageNet images with bounding box annotations.
3. The F-score is computed as \(\frac{2RP}{R+P}\), where R and P denote recall and precision, respectively.
4. When evaluated on the test set used by [200], our best method, GoogleNet_Syn_FT, achieves a mAP score of 85.0%.
5. https://stock.adobe.com.
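The F-score in note 3 is the harmonic mean of recall R and precision P. The Python sketch below illustrates that formula only. The helper names `precision_recall` and `f_score`, the use of raw true-positive/false-positive/false-negative counts, and the convention of returning 0 when a denominator is zero are all assumptions made for illustration, not details taken from the chapter.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive and
    false-negative counts. Hypothetical helper; by assumption, a zero
    denominator yields 0.0 instead of raising ZeroDivisionError."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


def f_score(recall: float, precision: float) -> float:
    """F-score as in note 3: 2RP / (R + P), the harmonic mean of R and P."""
    if recall + precision == 0:
        # Assumed convention: define F = 0 when both R and P are zero.
        return 0.0
    return 2 * recall * precision / (recall + precision)


if __name__ == "__main__":
    # Illustrative counts only: 8 correct, 2 spurious, 4 missed predictions.
    p, r = precision_recall(tp=8, fp=2, fn=4)
    print(f"P={p:.3f} R={r:.3f} F={f_score(r, p):.3f}")
```

With these counts, P = 0.8 and R = 2/3, so F = 8/11 ≈ 0.727. Because the harmonic mean is pulled toward the smaller of the two values, a method cannot reach a high F-score by trading recall away for precision or vice versa.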

References

Anoraganingrum, D. Cell segmentation with median filter and mathematical morphology operation. In International Conference on Image Analysis and Processing (1999).
Arteta, C., Lempitsky, V., Noble, J. A., and Zisserman, A. Interactive object counting. In European Conference on Computer Vision (ECCV) (2014).
Atkinson, J., Campbell, F. W., and Francis, M. R. The magic number 4±0: A new look at visual numerosity judgements. Perception 5, 3 (1976), 327–334.
Berg, T. L., and Berg, A. C. Finding iconic images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2009).
Borji, A., Sihite, D. N., and Itti, L. Salient object detection: A benchmark. In European Conference on Computer Vision (ECCV) (2012).
Boysen, S. T., and Capaldi, E. J. The development of numerical competence: Animal and human models. Psychology Press, 2014.
Chan, A. B., Liang, Z.-S., and Vasconcelos, N. Privacy preserving crowd monitoring: Counting people without people models or tracking. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008).
Chan, A. B., and Vasconcelos, N. Bayesian Poisson regression for crowd counting. In IEEE International Conference on Computer Vision (ICCV) (2009).
Chatfield, K., Lempitsky, V., Vedaldi, A., and Zisserman, A. The devil is in the details: an evaluation of recent feature encoding methods. In British Machine Vision Conference (BMVC) (2011).
Cheng, M.-M., Mitra, N. J., Huang, X., Torr, P. H. S., and Hu, S.-M. Global contrast based salient region detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 37, 3 (2015), 569–582.
Choi, J., Jung, C., Lee, J., and Kim, C. Determining the existence of objects in an image and its application to image thumbnailing. IEEE Signal Processing Letters 21, 8 (2014), 957–961.
Chua, T.-S., Tang, J., Hong, R., Li, H., Luo, Z., and Zheng, Y. NUS-WIDE: A real-world web image database from National University of Singapore. In ACM International Conference on Image and Video Retrieval (2009).
Clements, D. H. Subitizing: What is it? Why teach it? Teaching Children Mathematics 5 (1999), 400–405.
Davis, H., and Pérusse, R. Numerical competence in animals: Definitional issues, current evidence, and a new research agenda. Behavioral and Brain Sciences 11, 4 (1988), 561–579.
Dehaene, S. The number sense: How the mind creates mathematics. Oxford University Press, 2011.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and Ramanan, D. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 32, 9 (2010), 1627–1645.
Girshick, R., Donahue, J., Darrell, T., and Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014).
Gross, H. J. The magical number four: A biological, historical and mythological enigma. Communicative & Integrative Biology 5, 1 (2012), 1–2.
Gross, H. J., Pahl, M., Si, A., Zhu, H., Tautz, J., and Zhang, S. Number-based visual generalisation in the honeybee. PLoS One 4, 1 (2009), e4263.
Gurari, D., and Grauman, K. Visual question: Predicting if a crowd will agree on the answer. arXiv preprint arXiv:1608.08188 (2016).
Heo, J.-P., Lin, Z., and Yoon, S.-E. Distance encoded product quantization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014).
Jaderberg, M., Simonyan, K., Vedaldi, A., and Zisserman, A. Synthetic data and artificial neural networks for natural scene text recognition. In Advances in Neural Information Processing Systems (NIPS) Workshop (2014).
Jansen, B. R., Hofman, A. D., Straatemeier, M., Bers, B. M., Raijmakers, M. E., and Maas, H. L. The role of pattern recognition in children’s exact enumeration of small numbers. British Journal of Developmental Psychology 32, 2 (2014), 178–194.
Jevons, W. S. The power of numerical discrimination. Nature 3, 67 (1871), 281–282.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia (2014).
Kaufman, E., Lord, M., Reese, T., and Volkmann, J. The discrimination of visual number. The American Journal of Psychology (1949), 498–525.
Kazemzadeh, S., Ordonez, V., Matten, M., and Berg, T. L. ReferItGame: Referring to objects in photographs of natural scenes. In Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014).
Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS) (2012).
Lee, Y. J., Ghosh, J., and Grauman, K. Discovering important people and objects for egocentric video summarization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012).
Lempitsky, V., and Zisserman, A. Learning to count objects in images. In Advances in Neural Information Processing Systems (NIPS) (2010).
Li, X., Uricchio, T., Ballan, L., Bertini, M., Snoek, C. G. M., and Bimbo, A. D. Socializing the semantic gap: A comparative survey on image tag assignment, refinement, and retrieval. ACM Computing Surveys 49, 1 (June 2016), 14:1–14:39.
Li, Y., Hou, X., Koch, C., Rehg, J. M., and Yuille, A. L. The secrets of salient object segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014).
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV) (2014).
Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X., and Shum, H.-Y. Learning to detect a salient object. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 33, 2 (2011), 353–367.
Mandler, G., and Shebo, B. J. Subitizing: an analysis of its component processes. Journal of Experimental Psychology: General 111, 1 (1982), 1.
Nath, S. K., Palaniappan, K., and Bunyak, F. Cell segmentation using coupled level sets and graph-vertex coloring. In Medical Image Computing and Computer-Assisted Intervention (MICCAI) (2006).
Pahl, M., Si, A., and Zhang, S. Numerical cognition in bees and other insects. Frontiers in Psychology 4 (2013).
Peng, X., Sun, B., Ali, K., and Saenko, K. Learning deep object detectors from 3D models. In IEEE International Conference on Computer Vision (ICCV) (2015).
Piazza, M., and Dehaene, S. From number neurons to mental arithmetic: The cognitive neuroscience of number sense. The Cognitive Neurosciences, 3rd edition (2004), 865–877.
Pinheiro, P. O., Lin, T.-Y., Collobert, R., and Dollár, P. Learning to refine object segments. In European Conference on Computer Vision (ECCV) (2016).
Pont-Tuset, J., Arbelaez, P., Barron, J. T., Marques, F., and Malik, J. Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 39, 1 (2017), 128–140.
Razavian, A. S., Azizpour, H., Sullivan, J., and Carlsson, S. CNN features off-the-shelf: an astounding baseline for recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), DeepVision Workshop (2014).
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. ImageNet Large Scale Visual Recognition Challenge, 2014.
Scharfenberger, C., Waslander, S. L., Zelek, J. S., and Clausi, D. A. Existence detection of objects in images for robot vision using saliency histogram features. In International Conference on Computer and Robot Vision (2013).
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., and LeCun, Y. OverFeat: Integrated recognition, localization and detection using convolutional networks. In International Conference on Learning Representations (ICLR) (2014).
Shin, D., He, S., Lee, G. M., Whinston, A. B., Cetintas, S., and Lee, K.-C. Content complexity, similarity, and consistency in social media: A deep learning approach. https://ssrn.com/abstract=2830377, 2016.
Simonyan, K., and Zisserman, A. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR) (2015).
Stark, M., Goesele, M., and Schiele, B. Back to the future: Learning shape models from 3D CAD data. In British Machine Vision Conference (BMVC) (2010).
Stoianov, I., and Zorzi, M. Emergence of a visual number sense in hierarchical generative models. Nature Neuroscience 15, 2 (2012), 194–196.
Subburaman, V. B., Descamps, A., and Carincotte, C. Counting people in the crowd using a generic head detector. In IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS) (2012).
Sun, B., and Saenko, K. From virtual to reality: Fast adaptation of virtual object detectors to real domains. In British Machine Vision Conference (BMVC) (2014).
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015).
Torralba, A., Murphy, K. P., Freeman, W. T., and Rubin, M. A. Context-based vision system for place and object recognition. In IEEE International Conference on Computer Vision (ICCV) (2003).
Trick, L. M., and Pylyshyn, Z. W. Why are small and large numbers enumerated differently? A limited-capacity preattentive stage in vision. Psychological Review 101, 1 (1994), 80.
Vedaldi, A., and Fulkerson, B. VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/, 2008.
Vuilleumier, P. O., and Rafal, R. D. A systematic study of visual extinction: Between- and within-field deficits of attention in hemispatial neglect. Brain 123, 6 (2000), 1263–1279.
Wang, P., Wang, J., Zeng, G., Feng, J., Zha, H., and Li, S. Salient object detection for searched web images via global saliency. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012).
Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., and Torralba, A. SUN database: Large-scale scene recognition from abbey to zoo. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010).
Xiong, B., and Grauman, K. Detecting snap points in egocentric video with a web photo prior. In European Conference on Computer Vision (ECCV) (2014).
Xu, K., Ba, J., Kiros, R., Courville, A., Salakhutdinov, R., Zemel, R., and Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044 (2015).
Zhang, J., Ma, S., Sameki, M., Sclaroff, S., Betke, M., Lin, Z., Shen, X., Price, B., and Měch, R. Salient object subitizing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015).
Zhang, J., Sclaroff, S., Lin, Z., Shen, X., Price, B., and Měch, R. Unconstrained salient object detection via proposal subset optimization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
Zhao, R., Ouyang, W., Li, H., and Wang, X. Saliency detection by multi-context deep learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015).
Zou, W. Y., and McClelland, J. L. Progressive development of the number sense in a deep neural network. In Annual Conference of the Cognitive Science Society (CogSci) (2013).

Author information

Authors and Affiliations

Jianming Zhang: Adobe Inc., San Jose, CA, USA
Filip Malmberg: Centre for Image Analysis, Uppsala University, Uppsala, Uppsala Län, Sweden
Stan Sclaroff: Department of Computer Science, Boston University, Boston, MA, USA

About this chapter

Cite this chapter

Zhang, J., Malmberg, F., Sclaroff, S. (2019). Salient Object Subitizing. In: Visual Saliency: From Pixel-Level to Object-Level Analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-04831-0_5

DOI: https://doi.org/10.1007/978-3-030-04831-0_5
Published: 22 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04830-3
Online ISBN: 978-3-030-04831-0
eBook Packages: Computer Science, Computer Science (R0)
