Image Annotation Incorporating Low-Rankness, Tag and Visual Correlation and Inhomogeneous Errors

Hou, Yuqing

doi:10.1007/978-3-319-27857-5_7

Yuqing Hou²⁵

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 9474))

Included in the following conference series:

International Symposium on Visual Computing

2861 Accesses
3 Citations

Abstract

Tag-based image retrieval (TBIR) has drawn much attention in recent years due to the explosive amount of digital images and crowdsourcing tags. However, TBIR is still suffering from the incomplete and inaccurate tags provided by users, posing a great challenge for tag-based image management applications. In this work, we propose a novel method for image annotation, incorporating several priors: Low-Rankness, Tag and Visual Correlation and Inhomogeneous Errors. Highly representative CNN feature vectors are adopted to model the tag-visual correlation and narrow the semantic gap. And we extract word vectors for tags to measure similarity between tags in the semantic level, which is more accurate than traditional frequency-based or graph-based methods. We utilize the Accelerated Proximal Gradient (APG) method to solve our model efficiently. Extensive experiments conducted on multiple benchmark datasets demonstrate the effectiveness and robustness of the proposed method.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Ntalianis, K., Tsapatsoulis, N., Doulamis, A., Matsatsinis, N.: Automatic annotation of image databases based on implicit crowdsourcing, visual concept modeling and evolution. Multimedia Tools Appl. 69, 397–421 (2014)
Article Google Scholar
Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 29, 394–410 (2007)
Article Google Scholar
Li, J., Wang, J.Z.: Automatic linguistic indexing of pictures by a statistical modeling approach. IEEE Trans. Pattern Anal. Mach. Intell. 25, 1075–1088 (2003)
Article Google Scholar
Duygulu, P., Barnard, K., de Freitas, J.F.G., Forsyth, D.: Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002, Part IV. LNCS, vol. 2353, pp. 97–112. Springer, Heidelberg (2002)
Chapter Google Scholar
Guillaumin, M., Mensink, T., Verbeek, J., Schmid, C.: Tagprop: discriminative metric learning in nearest neighbor models for image auto-annotation. In: ICCV (2009)
Google Scholar
Makadia, A., Pavlovic, V., Kumar, S.: A new baseline for image annotation. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 316–329. Springer, Heidelberg (2008)
Chapter Google Scholar
Li, X., Snoek, C.G., Worring, M.: Learning social tag relevance by neighbor voting. IEEE Trans. Multimedia 11, 1310–1322 (2009)
Article Google Scholar
Zhu, G., Yan, S., Ma, Y.: Image tag refinement towards low-rank, content-tag prior and error sparsity. In: ACM MM (2010)
Google Scholar
Goldberg, A., Recht, B., Xu, J., Nowak, R., Zhu, X.: Transduction with matrix completion: three birds with one stone. In: NIPS (2010)
Google Scholar
Wu, L., Jin, R., Jain, A.K.: Tag completion for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 35, 716–727 (2013)
Article Google Scholar
Feng, Z., Feng, S., Jin, R., Jain, A.K.: Image tag completion by noisy matrix recovery. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part VII. LNCS, vol. 8695, pp. 424–438. Springer, Heidelberg (2014)
Google Scholar
Feng, Z., Jin, R., Jain, A.: Large-scale image annotation by efficient and robust kernel metric learning. In: ICCV (2013)
Google Scholar
Niu, Z., Hua, G., Gao, X., Tian, Q.: Semi-supervised relational topic model for weakly annotated image recognition in social media. In: CVPR (2014)
Google Scholar
Zhao, R., Grosky, W.I.: Narrowing the semantic gap-improved text-based web document retrieval using visual features. IEEE Trans. Multimedia 4, 189–200 (2002)
Article Google Scholar
Jin, Y., Khan, L., Wang, L., Awad, M.: Image annotations by combining multiple evidence & wordnet. In: ACM MM (2005)
Google Scholar
Cilibrasi, R.L., Vitanyi, P.: The google similarity distance. IEEE Trans. Knowl. Data Eng. 19, 370–383 (2007)
Article Google Scholar
Wu, L., Hua, X.S., Yu, N., Ma, W.Y., Li, S.: Flickr distance. In: ACM MM (2008)
Google Scholar
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 30, 1958–1970 (2008)
Article Google Scholar
Zhang, H., Berg, A.C., Maire, M., Malik, J.: SVM-KNN: discriminative nearest neighbor classification for visual category recognition. In: CVPR (2006)
Google Scholar
Huiskes, M.J., Lew, M.S.: The MIR Flickr retrieval evaluation. In: MIR 2008: Proceedings of the 2008 ACM ICMI (2008)
Google Scholar
Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58, 11 (2011)
Article MathSciNet Google Scholar
Chung, F.R.: Spectral Graph Theory. American Mathematical Society, Providence (1997)
MATH Google Scholar
Gammerman, A., Vovk, V., Vapnik, V.: Learning by transduction. In: UAI (1998)
Google Scholar
Toh, K.C., Yun, S.: An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pac. J. Optimiz. 6, 615–640 (2010)
MATH MathSciNet Google Scholar
Cai, J.F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)
Article MATH MathSciNet Google Scholar
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: Decaf: a deep convolutional activation feature for generic visual recognition. arXiv preprint arXiv:1310.1531 (2013)
Jeon, J., Lavrenko, V., Manmatha, R.: Automatic image annotation and retrieval using cross-media relevance models. In: ACM SIGIR (2003)
Google Scholar
Feng, S., Manmatha, R., Lavrenko, V.: Multiple bernoulli relevance models for image and video annotation. In: CVPR (2004)
Google Scholar
Sigurbjörnsson, B., Van Zwol, R.: Flickr tag recommendation based on collective knowledge. In: ACM WWW (2008)
Google Scholar
Lee, S., De Neve, W., Plataniotis, K.N., Ro, Y.M.: Map-based image tag recommendation using a visual folksonomy. Pattern Recogn. Lett. 31, 976–982 (2010)
Article Google Scholar
Chen, M., Zheng, A., Weinberger, K.: Fast image tagging. In: ICML (2013)
Google Scholar
Metzler, D., Manmatha, R.: An inference network approach to image retrieval. In: Enser, P.G.B., Kompatsiaris, Y., O’Connor, N.E., Smeaton, A.F., Smeulders, A.W.M. (eds.) CIVR 2004. LNCS, vol. 3115, pp. 42–50. Springer, Heidelberg (2004)
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, Beijing, 100871, China
Yuqing Hou

Authors

Yuqing Hou
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yuqing Hou .

Editor information

Editors and Affiliations

University of Nevada, Reno, Nevada, USA
George Bebis
NASA Ames Research Center, Moffett Field, California, USA
Richard Boyle
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Bahram Parvin
Desert Research Institute, Reno, Nevada, USA
Darko Koracin
University of Houston, Houston, Texas, USA
Ioannis Pavlidis
IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
Rogerio Feris
Purdue University, West Lafayette, Indiana, USA
Tim McGraw
Side Effects Software, Santa Monica, California, USA
Mark Elendt
The DiVE, Durham, North Carolina, USA
Regis Kopper
Texas A&M University, College Station, Texas, USA
Eric Ragan
Kent State University, Kent, Ohio, USA
Zhao Ye
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Gunther Weber

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hou, Y. (2015). Image Annotation Incorporating Low-Rankness, Tag and Visual Correlation and Inhomogeneous Errors. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2015. Lecture Notes in Computer Science(), vol 9474. Springer, Cham. https://doi.org/10.1007/978-3-319-27857-5_7

Download citation

DOI: https://doi.org/10.1007/978-3-319-27857-5_7
Published: 18 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27856-8
Online ISBN: 978-3-319-27857-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics