Glance and Glimpse Network: A Stochastic Attention Model Driven by Class Saliency

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10118)

Abstract

We present a hybrid model named the Glance and Glimpse Network (GGNet) for visual classification, which combines an attention-based recurrent neural network (the Glimpse Network) with a convolutional neural network (the Glance Network). The Glimpse Network is trained to deploy a sequence of glimpses at different image patches and then output classification results. The Glance Network, in turn, takes a downsampled version of the input image and generates an image-specific class saliency map that provides hints for training the Glimpse Network. We show that training the Glimpse Network with such cues can be interpreted under both the probabilistic-inference and the reinforcement-learning frameworks, thereby establishing high-level connections between these two separate fields. We evaluate our model on the Cluttered Translated MNIST benchmark datasets and show that GGNet achieves state-of-the-art results compared with other recently proposed attention models.
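The core mechanism the abstract describes can be illustrated with a toy sketch: sample glimpse locations with probability proportional to a class saliency map, crop a patch at each location, and weight the policy gradient REINFORCE-style by the (baseline-subtracted) classification reward. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation; all function names and sizes here are hypothetical.

```python
import numpy as np


def sample_glimpse_locations(saliency, n_glimpses, rng):
    """Sample glimpse centres with probability proportional to saliency."""
    p = saliency.ravel() / saliency.sum()
    idx = rng.choice(p.size, size=n_glimpses, p=p)
    ys, xs = np.unravel_index(idx, saliency.shape)
    return list(zip(ys, xs)), p[idx]


def extract_patch(image, centre, size):
    """Crop a size x size patch around centre, zero-padding at borders."""
    pad = size // 2
    padded = np.pad(image, pad, mode="constant")
    y, x = centre
    return padded[y:y + size, x:x + size]


def reinforce_weight(log_probs, reward, baseline=0.0):
    """Score-function estimator: grad E[R] = E[(R - b) * grad log pi]."""
    return (reward - baseline) * np.sum(log_probs)


rng = np.random.default_rng(0)
image = rng.random((28, 28))                      # toy input
saliency = np.abs(rng.standard_normal((28, 28)))  # stand-in for the Glance Network output
centres, probs = sample_glimpse_locations(saliency, 4, rng)
patches = [extract_patch(image, c, 8) for c in centres]
weight = reinforce_weight(np.log(probs), reward=1.0, baseline=0.5)
```

In the actual model the glimpse policy is a learned recurrent network rather than direct saliency sampling; the sketch only shows how a saliency map can bias where glimpses land and how the REINFORCE term weights their log-probabilities.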



Acknowledgement

This work is supported by the A*STAR Industrial Robotics Program of Singapore, under grant number R-261-506-007-305.

Author information

Corresponding author

Correspondence to Shuzhi Sam Ge.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Li, M., Ge, S.S., Lee, T.H. (2017). Glance and Glimpse Network: A Stochastic Attention Model Driven by Class Saliency. In: Chen, C.S., Lu, J., Ma, K.K. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science (LNIP), vol 10118. Springer, Cham. https://doi.org/10.1007/978-3-319-54526-4_42

  • DOI: https://doi.org/10.1007/978-3-319-54526-4_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54525-7

  • Online ISBN: 978-3-319-54526-4

  • eBook Packages: Computer Science (R0)
