Subitizing with Variational Autoencoders

  • Rijnder Wever
  • Tom F. H. Runia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11131)


Numerosity, the number of objects in a set, is a basic property of a given visual scene. Many animals develop the perceptual ability to subitize, that is, to identify near-instantaneously the numerosity of small sets of visual items. In computer vision, it has been shown that numerosity emerges as a statistical property in neural networks during unsupervised learning from simple synthetic images. In this work, we focus on more complex natural images using unsupervised hierarchical neural networks. Specifically, we show that variational autoencoders are able to spontaneously perform subitizing after training without supervision on a large number of images from the Salient Object Subitizing dataset. Although our method does not outperform supervised convolutional networks for subitizing, we observe that the networks learn to encode numerosity as a basic visual property. Moreover, we find that the learned representations are likely invariant to object area, an observation consistent with studies of biological neural networks in cognitive neuroscience.
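
For readers unfamiliar with the model class, the following is a minimal sketch of the per-example objective that a variational autoencoder typically minimizes: a reconstruction term plus a KL regularizer on the latent code. It assumes a diagonal Gaussian encoder, a standard normal prior, and a squared-error reconstruction term. These choices and all function names are illustrative assumptions; the abstract does not specify the architecture or loss actually used.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way is what lets an autodiff framework
    pass gradients through mu and sigma; here it is plain Python.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def negative_elbo(x, x_reconstructed, mu, log_var):
    """One-sample loss: squared-error reconstruction (assumed) plus the KL term."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    return reconstruction + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.2], [0.0, -1.0]   # hypothetical encoder outputs
    z = reparameterize(mu, log_var, rng)      # latent code fed to a decoder
    print(len(z), negative_elbo([1.0, 0.0], [0.9, 0.1], mu, log_var))
```

In a real model, `mu` and `log_var` would come from an encoder network and `x_reconstructed` from a decoder applied to `z`. Downstream analyses such as subitizing would then operate on the learned latent code.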


Keywords: Object counting, Numerosity, Variational autoencoders



The authors would like to thank the Intelligent Sensory Information Systems Institute and the Informatics Institute of the University of Amsterdam for their financial contribution to the travel expenses.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Intelligent Sensory Information Systems, University of Amsterdam, Amsterdam, Netherlands
