Cross-Category Cross-Semantic Regularization for Fine-Grained Image Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11857)

Abstract

Fine-grained image recognition (FGIR) is challenging because the differences between subordinate categories are local and subtle. Existing methods adopt a two-step strategy: they first detect local parts in images and then extract features from those parts for classification. Although steady progress has been achieved, these methods localize object parts separately and neglect the relationships between them. In this paper, we propose cross-category cross-semantic regularization (\(C^{3}S\)), a module that exploits the relationships between object parts from different images to regularize fine-grained feature learning for FGIR. \(C^{3}S\) encourages the features of the same object part from different images to be strongly correlated while decorrelating the features of different object parts as much as possible. \(C^{3}S\) can be incorporated into networks without introducing any extra parameters. Experiments on five benchmark datasets (CUB-200-2011, Stanford Dogs, Stanford Cars, FGVC-Aircraft and NABirds) validate the effectiveness of \(C^{3}S\) and demonstrate performance comparable to existing methods.
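The idea stated in the abstract (correlate the same part across images, decorrelate different parts, add no parameters) can be illustrated with a minimal NumPy sketch. This is not the formulation from the paper, which is not reproduced on this page. The function name `c3s_style_regularizer`, the `(B, K, D)` layout of per-part features, the use of cosine similarity, and the specific loss (squared off-diagonal correlations minus squared diagonal correlations) are all assumptions made for illustration.

```python
import numpy as np


def c3s_style_regularizer(part_feats, eps=1e-12):
    """Illustrative part-correlation regularizer (an assumption, not the paper's loss).

    part_feats: array of shape (B, K, D), i.e. B images, K object parts,
                one D-dimensional feature vector per part.
    Returns (loss, corr) where corr is the K x K part correlation matrix.
    """
    # L2-normalize every part feature so dot products are cosine similarities.
    norms = np.linalg.norm(part_feats, axis=-1, keepdims=True)
    unit = part_feats / np.maximum(norms, eps)

    # Averaging over the batch first makes corr[i, j] the mean cosine
    # similarity between part i and part j over all image pairs in the
    # batch (pairs of an image with itself included).
    mean_part = unit.mean(axis=0)        # (K, D)
    corr = mean_part @ mean_part.T       # (K, K)

    # Diagonal: same part across images (to be increased).
    # Off-diagonal: different parts (to be pushed toward zero).
    diag_sq = np.sum(np.diag(corr) ** 2)
    off_sq = np.sum(corr ** 2) - diag_sq

    # No learnable parameters are involved; the term is a function of the
    # features only and would be added to the classification loss.
    loss = 0.5 * (off_sq - diag_sq)
    return loss, corr
```

Under these assumptions, a batch whose parts are consistent across images and mutually orthogonal gives an identity correlation matrix and the lowest loss, while a batch whose parts all share one feature direction is penalized. In a training setup the term would be weighted and added to the usual cross-entropy loss; the weight is a hyperparameter not given here.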

Y. Chen and X. Mo—Equal contributions. The first author is a student.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61702197, in part by the Natural Science Foundation of Guangdong Province under Grant 2017A030310261, and in part by the program of the China Scholarship Council.

Author information

Corresponding author

Correspondence to Wei Luo.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, Y., Mo, X., Liang, Z., Wei, T., Luo, W. (2019). Cross-Category Cross-Semantic Regularization for Fine-Grained Image Recognition. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science, vol 11857. Springer, Cham. https://doi.org/10.1007/978-3-030-31654-9_10

  • DOI: https://doi.org/10.1007/978-3-030-31654-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31653-2

  • Online ISBN: 978-3-030-31654-9

  • eBook Packages: Computer Science, Computer Science (R0)
