Glance and Glimpse Network: A Stochastic Attention Model Driven by Class Saliency

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10118)

Abstract

We present a hybrid model named the Glance and Glimpse Network (GGNet) for visual classification, which combines an attention-based recurrent neural network (the Glimpse Network) with a convolutional neural network (the Glance Network). The Glimpse Network is trained to deploy a sequence of glimpses at different image patches and then output classification results. The Glance Network, in turn, takes a downsampled version of the input image and generates an image-specific class saliency map that provides hints for training the Glimpse Network. We show that training the Glimpse Network with such cues can be interpreted under both the probabilistic-inference and the reinforcement-learning frameworks, thereby establishing high-level connections between these two separate fields. We evaluate our model on the Cluttered Translated MNIST benchmark datasets and show that GGNet achieves state-of-the-art results compared with other recently proposed attention models.
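The core mechanism the abstract describes can be illustrated with a toy sketch: sample glimpse locations with probability proportional to a class saliency map, crop a patch at each location, and weight the policy gradient REINFORCE-style by the (baseline-subtracted) classification reward. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation; all function names and sizes here are hypothetical.

```python
import numpy as np


def sample_glimpse_locations(saliency, n_glimpses, rng):
    """Sample glimpse centres with probability proportional to saliency."""
    p = saliency.ravel() / saliency.sum()
    idx = rng.choice(p.size, size=n_glimpses, p=p)
    ys, xs = np.unravel_index(idx, saliency.shape)
    return list(zip(ys, xs)), p[idx]


def extract_patch(image, centre, size):
    """Crop a size x size patch around centre, zero-padding at borders."""
    pad = size // 2
    padded = np.pad(image, pad, mode="constant")
    y, x = centre
    return padded[y:y + size, x:x + size]


def reinforce_weight(log_probs, reward, baseline=0.0):
    """Score-function estimator: grad E[R] = E[(R - b) * grad log pi]."""
    return (reward - baseline) * np.sum(log_probs)


rng = np.random.default_rng(0)
image = rng.random((28, 28))                      # toy input
saliency = np.abs(rng.standard_normal((28, 28)))  # stand-in for the Glance Network output
centres, probs = sample_glimpse_locations(saliency, 4, rng)
patches = [extract_patch(image, c, 8) for c in centres]
weight = reinforce_weight(np.log(probs), reward=1.0, baseline=0.5)
```

In the actual model the glimpse policy is a learned recurrent network rather than direct saliency sampling; the sketch only shows how a saliency map can bias where glimpses land and how the REINFORCE term weights their log-probabilities.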



Acknowledgement

This work is supported by the A*STAR Industrial Robotics Program of Singapore, under grant number R-261-506-007-305.

Author information

Corresponding author

Correspondence to Shuzhi Sam Ge.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Li, M., Ge, S.S., Lee, T.H. (2017). Glance and Glimpse Network: A Stochastic Attention Model Driven by Class Saliency. In: Chen, C.S., Lu, J., Ma, K.K. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science (LNIP), vol 10118. Springer, Cham. https://doi.org/10.1007/978-3-319-54526-4_42

  • DOI: https://doi.org/10.1007/978-3-319-54526-4_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54525-7

  • Online ISBN: 978-3-319-54526-4

  • eBook Packages: Computer Science (R0)
