Cross-Category Cross-Semantic Regularization for Fine-Grained Image Recognition

Chen, Yelin; Mo, Xianjie; Liang, Zijun; Wei, Tingting; Luo, Wei

doi:10.1007/978-3-030-31654-9_10


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11857)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)


Abstract

Fine-grained image recognition (FGIR) is challenging due to the local and subtle differences between subordinate categories. Existing methods adopt a two-step strategy, first detecting local parts in images and then extracting features from them for classification. Although steady progress has been achieved, these methods localize object parts separately while neglecting the relationships between them. In this paper, we propose cross-category cross-semantic regularization (\(C^{3}S\)), a module that exploits the relationships between object parts from different images to regularize fine-grained feature learning for FGIR. \(C^{3}S\) encourages the features of the same object part from different images to have strong correlations while decorrelating the features from different object parts as much as possible. \(C^{3}S\) can be incorporated into networks without introducing any extra parameters. Experiments on five benchmark datasets (CUB-200-2011, Stanford Dogs, Stanford Cars, FGVC-Aircraft and NABirds) validate the effectiveness of \(C^{3}S\) and show that it performs comparably to existing methods.
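
The abstract describes the regularizer only in words, so the following is a minimal NumPy sketch of one way such a penalty could look, not the authors' implementation. The function name `c3s_style_penalty`, the input layout (images × parts × feature dimensions), the L2 normalization, and the specific loss form (squared off-diagonal correlations minus squared diagonal correlations) are all assumptions made for illustration. The sketch also has no learnable parameters, in line with the abstract's statement that the module adds none.

```python
import numpy as np


def c3s_style_penalty(part_features):
    """Illustrative cross-part correlation penalty (hypothetical, not the paper's code).

    part_features: array of shape (N, P, D) holding, for each of N images,
    one D-dimensional feature vector per object part (P parts).
    Returns (penalty, S), where S is the P x P part-correlation matrix.
    """
    feats = np.asarray(part_features, dtype=float)
    # Assumption: L2-normalize every part feature so dot products are cosines.
    norms = np.linalg.norm(feats, axis=-1, keepdims=True)
    unit = feats / np.maximum(norms, 1e-12)

    # S[p, q] is the cosine similarity between part p and part q averaged over
    # all ordered image pairs (i, j), same-image pairs included. That average
    # equals the dot product of the per-part mean unit vectors.
    mean_part = unit.mean(axis=0)      # shape (P, D)
    S = mean_part @ mean_part.T        # shape (P, P)

    diag = np.diag(S)
    off_diag = S - np.diag(diag)
    # Assumed loss form: punish correlation between different parts and
    # reward correlation of the same part across images.
    penalty = 0.5 * (np.sum(off_diag ** 2) - np.sum(diag ** 2))
    return penalty, S


if __name__ == "__main__":
    # Two images, two parts; each part is consistent across images and the
    # two parts are orthogonal to each other.
    aligned = np.array([[[1.0, 0.0], [0.0, 1.0]],
                        [[1.0, 0.0], [0.0, 1.0]]])
    # Degenerate case: every part produces the same feature.
    collapsed = np.array([[[1.0, 0.0], [1.0, 0.0]],
                          [[1.0, 0.0], [1.0, 0.0]]])
    print(c3s_style_penalty(aligned)[0], c3s_style_penalty(collapsed)[0])
```

Under these assumptions the penalty is lower when each part is consistent across images and distinct from the other parts than when all parts collapse onto the same feature, which is the behaviour the abstract describes. In training, a term of this kind would be added to the classification loss with a weighting factor.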

Y. Chen and X. Mo contributed equally. The first author is a student.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61702197, in part by the Natural Science Foundation of Guangdong Province under Grant 2017A030310261, and in part by the program of the China Scholarship Council.

Author information

Authors and Affiliations

South China Agricultural University, Guangzhou, 510000, GD, People’s Republic of China
Yelin Chen, Xianjie Mo, Zijun Liang, Tingting Wei & Wei Luo

Corresponding author

Correspondence to Wei Luo.

Editor information

Editors and Affiliations

School of EECS, Peking University, Beijing, China
Zhouchen Lin
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Liang Wang
Nanjing University, Nanjing, Jiangsu, China
Jian Yang
Xidian University, Xi'an, China
Guangming Shi
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Institute of Artificial Intelligence, Xi'an Jiaotong University, Xi'an, Shaanxi, China
Nanning Zheng
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Northwestern Polytechnical University, Xi'an, China
Yanning Zhang

About this paper

Cite this paper

Chen, Y., Mo, X., Liang, Z., Wei, T., Luo, W. (2019). Cross-Category Cross-Semantic Regularization for Fine-Grained Image Recognition. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science, vol 11857. Springer, Cham. https://doi.org/10.1007/978-3-030-31654-9_10

DOI: https://doi.org/10.1007/978-3-030-31654-9_10
Published: 31 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31653-2
Online ISBN: 978-3-030-31654-9
eBook Packages: Computer Science, Computer Science (R0)
