Fine-Grained Image Recognition from Click-Through Logs Using Deep Siamese Network

  • Conference paper
MultiMedia Modeling (MMM 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10132)

Abstract

Image recognition using deep network models has achieved remarkable progress in recent years. However, fine-grained recognition remains a major challenge because large-scale, well-labeled datasets for training such networks are lacking. In this paper, we study a deep-network-based method for fine-grained image recognition that exploits the click-through logs of search engines. We use both click counts and probability values to filter out the noise in the click-through logs. Furthermore, we propose a deep siamese network model to fine-tune the classifier, emphasizing the subtle differences between classes while tolerating the variation within each class. Our method is evaluated by training on the Bing clickture-dog dataset and testing on the well-labeled dog breed dataset. The results demonstrate a substantial improvement over naive training.
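The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the filtering thresholds nor the exact loss. The sketch assumes that each log record carries a click count and a probability score (here taken to be a classifier's confidence in the clicked label), and it uses the standard contrastive loss of Hadsell, Chopra and LeCun (2006) as a stand-in for the siamese objective. The names `filter_click_samples`, `contrastive_loss`, `min_clicks`, `min_prob` and `margin` are hypothetical.

```python
import numpy as np


def filter_click_samples(samples, min_clicks=2, min_prob=0.5):
    """Keep records that pass both a click-count and a probability threshold.

    Each record is a dict with hypothetical keys "clicks" and "prob".
    The threshold values are illustrative assumptions, not from the paper.
    """
    return [
        s for s in samples
        if s["clicks"] >= min_clicks and s["prob"] >= min_prob
    ]


def contrastive_loss(feat_a, feat_b, same_class, margin=1.0):
    """Standard contrastive loss over a batch of embedding pairs.

    feat_a, feat_b: arrays of shape (batch, dim) from the two siamese branches.
    same_class: array of shape (batch,), 1 for same-class pairs, 0 otherwise.
    Same-class pairs are pulled together; different-class pairs are pushed
    apart until their distance reaches the margin.
    """
    feat_a = np.asarray(feat_a, dtype=float)
    feat_b = np.asarray(feat_b, dtype=float)
    same_class = np.asarray(same_class, dtype=float)

    dist = np.linalg.norm(feat_a - feat_b, axis=1)
    pull = same_class * dist ** 2
    push = (1.0 - same_class) * np.maximum(margin - dist, 0.0) ** 2
    return 0.5 * np.mean(pull + push)


if __name__ == "__main__":
    logs = [
        {"image": "a.jpg", "label": "beagle", "clicks": 5, "prob": 0.9},
        {"image": "b.jpg", "label": "beagle", "clicks": 1, "prob": 0.8},
        {"image": "c.jpg", "label": "pug", "clicks": 7, "prob": 0.2},
    ]
    kept = filter_click_samples(logs)
    print([s["image"] for s in kept])

    a = np.zeros((2, 2))
    b = np.array([[3.0, 4.0], [0.0, 1.0]])
    print(contrastive_loss(a, b, same_class=[1, 0], margin=2.0))
```

In this form the loss penalizes large distances within a class and small distances between classes, which matches the abstract's stated goal of tolerating intra-class variation while emphasizing inter-class differences.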

Acknowledgment

This work was supported by the National Program on Key Basic Research Projects (973 Program) under Grant 2015CB351803, by the Natural Science Foundation of China (NSFC) under Grants 61331017 and 61390512, and by the Fundamental Research Funds for the Central Universities under Grants WK2100060011 and WK3490000001.

Author information

Corresponding author

Correspondence to Dong Liu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Feng, W., Liu, D. (2017). Fine-Grained Image Recognition from Click-Through Logs Using Deep Siamese Network. In: Amsaleg, L., Guðmundsson, G., Gurrin, C., Jónsson, B., Satoh, S. (eds) MultiMedia Modeling. MMM 2017. Lecture Notes in Computer Science, vol 10132. Springer, Cham. https://doi.org/10.1007/978-3-319-51811-4_11

  • DOI: https://doi.org/10.1007/978-3-319-51811-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51810-7

  • Online ISBN: 978-3-319-51811-4

  • eBook Packages: Computer Science (R0)