Abstract
Image recognition using deep network models has achieved remarkable progress in recent years. However, fine-grained recognition remains a major challenge because large-scale, well-labeled datasets for training such networks are lacking. In this paper, we study a deep-network-based method for fine-grained image recognition that utilizes click-through logs from search engines. We use both click counts and probability values to filter out noise in the click-through logs. Furthermore, we propose a deep siamese network model to fine-tune the classifier, emphasizing the subtle differences between classes while tolerating the variation within each class. Our method is evaluated by training on the Bing Clickture-Dog dataset and testing on the well-labeled dog breed dataset. The results demonstrate a substantial improvement over naive training.
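The two ideas in the abstract, filtering noisy click-through entries and training on pairs so that same-class images are pulled together and different-class images are pushed apart, can be illustrated with a minimal sketch. It is not the authors' implementation. The `ClickRecord` fields, the thresholds `min_clicks` and `min_probability`, the rule that an entry must pass both thresholds, and the use of a standard margin-based contrastive loss are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from typing import List, NamedTuple, Sequence


class ClickRecord(NamedTuple):
    """Hypothetical click-through log entry: an image clicked for a query label."""
    image_id: str
    label: str
    clicks: int          # how many times the image was clicked for this label
    probability: float   # classifier confidence that the image shows `label`


def filter_records(records: Sequence[ClickRecord],
                   min_clicks: int = 2,
                   min_probability: float = 0.5) -> List[ClickRecord]:
    """Keep entries that pass both thresholds (assumed rule, assumed values)."""
    return [r for r in records
            if r.clicks >= min_clicks and r.probability >= min_probability]


def contrastive_loss(a: Sequence[float], b: Sequence[float],
                     same_class: bool, margin: float = 1.0) -> float:
    """Standard contrastive loss on one pair of feature vectors.

    Same-class pairs are penalized by their squared distance; different-class
    pairs are penalized only while their distance is below `margin`.
    """
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

In a siamese setup, both feature vectors would come from the same network with shared weights, and a loss of this kind would be minimized over pairs drawn from the filtered records.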
Acknowledgment
This work was supported by the National Program on Key Basic Research Projects (973 Program) under Grant 2015CB351803, by the Natural Science Foundation of China (NSFC) under Grants 61331017 and 61390512, and by the Fundamental Research Funds for the Central Universities under Grants WK2100060011 and WK3490000001.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Feng, W., Liu, D. (2017). Fine-Grained Image Recognition from Click-Through Logs Using Deep Siamese Network. In: Amsaleg, L., Guðmundsson, G., Gurrin, C., Jónsson, B., Satoh, S. (eds) MultiMedia Modeling. MMM 2017. Lecture Notes in Computer Science, vol 10132. Springer, Cham. https://doi.org/10.1007/978-3-319-51811-4_11
DOI: https://doi.org/10.1007/978-3-319-51811-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51810-7
Online ISBN: 978-3-319-51811-4
eBook Packages: Computer Science (R0)