Fine-Grained Image Recognition from Click-Through Logs Using Deep Siamese Network

Feng, Wu; Liu, Dong

doi:10.1007/978-3-319-51811-4_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10132)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Image recognition with deep network models has made remarkable progress in recent years. Fine-grained recognition nevertheless remains a major challenge because large-scale, well-labeled datasets for training such networks are scarce. In this paper, we study a deep-network-based method for fine-grained image recognition that exploits click-through logs from search engines. We use both click counts and probability values to filter out noise in the click-through logs. Furthermore, we propose a deep siamese network model to fine-tune the classifier, emphasizing the subtle differences between classes while tolerating variation within a class. We evaluate our method by training on the Bing Clickture-Dog dataset and testing on a well-labeled dog breed dataset. The results show a substantial improvement over naive training.
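The two ideas named in the abstract, noise filtering and pairwise fine-tuning, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `ClickRecord` fields, the thresholds `min_clicks` and `min_prob`, the rule that a sample must pass both thresholds, and the use of the standard contrastive loss as the siamese objective.

```python
import math
from dataclasses import dataclass


@dataclass
class ClickRecord:
    """One hypothetical click-through log entry (field names are assumptions)."""
    image_id: str
    label: str    # class implied by the clicked query; may be noisy
    clicks: int   # number of times the image was clicked for that query
    prob: float   # probability an initial classifier assigns to `label`


def filter_clicks(records, min_clicks=2, min_prob=0.5):
    """Keep records that pass both thresholds (illustrative values only)."""
    return [r for r in records if r.clicks >= min_clicks and r.prob >= min_prob]


def contrastive_loss(feat_a, feat_b, same_class, margin=1.0):
    """Standard contrastive loss for one pair of feature vectors.

    Same-class pairs are penalized by their squared distance; different-class
    pairs are penalized only when they lie closer than `margin`.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))
    if same_class:
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, margin - dist) ** 2


if __name__ == "__main__":
    logs = [
        ClickRecord("img_a", "pug", clicks=5, prob=0.90),
        ClickRecord("img_b", "pug", clicks=1, prob=0.95),  # too few clicks
        ClickRecord("img_c", "pug", clicks=7, prob=0.20),  # low probability
    ]
    print([r.image_id for r in filter_clicks(logs)])
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=True))
```

In this sketch the pairwise term pulls same-class features together and pushes different-class features at least `margin` apart, which is one common way to realize the abstract's goal of emphasizing between-class differences while tolerating within-class variation.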

Acknowledgment

This work was supported by the National Program on Key Basic Research Projects (973 Program) under Grant 2015CB351803, by the Natural Science Foundation of China (NSFC) under Grants 61331017 and 61390512, and by the Fundamental Research Funds for the Central Universities under Grants WK2100060011 and WK3490000001.

Author information

Authors and Affiliations

CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China
Wu Feng & Dong Liu

Corresponding author

Correspondence to Dong Liu.

Editor information

Editors and Affiliations

CNRS–IRISA, Rennes, France
Laurent Amsaleg
Reykjavík University, Reykjavik, Iceland
Gylfi Þór Guðmundsson
Dublin City University, Dublin, Ireland
Cathal Gurrin
Reykjavík University, Reykjavik, Iceland
Björn Þór Jónsson
National Institute of Informatics, Tokyo, Japan
Shin’ichi Satoh

About this paper

Cite this paper

Feng, W., Liu, D. (2017). Fine-Grained Image Recognition from Click-Through Logs Using Deep Siamese Network. In: Amsaleg, L., Guðmundsson, G., Gurrin, C., Jónsson, B., Satoh, S. (eds) MultiMedia Modeling. MMM 2017. Lecture Notes in Computer Science, vol 10132. Springer, Cham. https://doi.org/10.1007/978-3-319-51811-4_11

DOI: https://doi.org/10.1007/978-3-319-51811-4_11
Published: 31 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51810-7
Online ISBN: 978-3-319-51811-4
eBook Packages: Computer Science, Computer Science (R0)