Neural Computing and Applications, Volume 31, Issue 12, pp 9295–9305

Context-aware attention network for image recognition

  • Jiaxu Leng
  • Ying Liu (corresponding author)
  • Shang Chen
Original Article

Abstract

Existing recognition methods based on deep learning have achieved impressive performance. However, most of these algorithms do not fully exploit contextual information or discriminative parts, which limits their recognition performance. In this paper, we propose a context-aware attention network that imitates the human visual attention mechanism. The proposed network consists mainly of a context learning module and an attention transfer module. First, we design the context learning module, which transmits contextual information along four directions: left, right, top and bottom, to capture valuable contexts. Second, the attention transfer module is proposed to generate attention maps that cover different attention regions, which benefits the extraction of discriminative features. Specifically, the attention maps are generated through multiple glimpses: in each glimpse, we generate the corresponding attention map and apply it to the next glimpse. This means that our attention shifts constantly, and each shift is not random but closely tied to the previous attention. Finally, we aggregate all located attention regions to achieve accurate image recognition. Experimental results show that our method achieves state-of-the-art performance, with accuracies of 97.68%, 82.42%, 80.32% and 86.12% on CIFAR-10, CIFAR-100, Caltech-256 and CUB-200, respectively.
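The two modules described above can be illustrated with a minimal PyTorch sketch. The class names, the 1x1 transition convolutions used for the directional recurrence, the sigmoid attention head and the default number of glimpses are our assumptions for illustration, not the paper's exact layers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextLearning(nn.Module):
    """Sweep contextual information across the feature map along four
    directions: left-to-right, right-to-left, top-to-bottom, bottom-to-top."""

    def __init__(self, channels):
        super().__init__()
        # One 1x1 transition conv per direction (assumed form of the recurrence).
        self.trans = nn.ModuleList([nn.Conv2d(channels, channels, 1) for _ in range(4)])

    @staticmethod
    def _sweep(x, conv, dim, reverse):
        # Recurrently add transformed context from the previous row/column.
        slices = list(x.unbind(dim))
        order = range(len(slices) - 1, -1, -1) if reverse else range(len(slices))
        prev = None
        for i in order:
            h = slices[i]
            if prev is not None:
                h = F.relu(h + conv(prev.unsqueeze(dim)).squeeze(dim))
            slices[i] = h
            prev = h
        return torch.stack(slices, dim)

    def forward(self, x):
        # dim 3 = width (left/right sweeps), dim 2 = height (top/bottom sweeps).
        directions = [(3, False), (3, True), (2, False), (2, True)]
        out = x
        for conv, (dim, rev) in zip(self.trans, directions):
            out = out + self._sweep(x, conv, dim, rev)
        return out


class AttentionTransfer(nn.Module):
    """Generate one attention map per glimpse and feed it into the next
    glimpse, so each attention shift depends on the previous attention."""

    def __init__(self, channels, glimpses=3):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)
        self.glimpses = glimpses

    def forward(self, x):
        maps, feat = [], x
        for _ in range(self.glimpses):
            a = torch.sigmoid(self.score(feat))  # attention map for this glimpse
            maps.append(a)
            feat = x * a                         # condition the next glimpse on it
        return feat, maps
```

The four directional sweeps are summed with the input features, and the per-glimpse attention maps can all be kept (as the abstract suggests) so every located attention region contributes to the final classification.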

Keywords

Convolutional neural network · Context learning · Attention transfer

Notes

Acknowledgements

This project was partially supported by grants from the National Natural Science Foundation of China (71671178, 91546201). It was also supported by the University of Chinese Academy of Sciences (Project Y954016XX2) and by the Guangdong Provincial Science and Technology Project (2016B010127004).

Compliance with ethical standards

Conflict of interest

We declare that we have no commercial or associative interests that represent a conflict of interest in connection with the submitted work.


Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
  2. Data Mining and High Performance Computing Lab, Chinese Academy of Sciences, Beijing, China
  3. Key Lab of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China
  4. School of Information and Communication, Guilin University of Electronic Technology, Guilin, China