A survey on automatic image annotation

Chen, Yilu; Zeng, Xiaojun; Chen, Xing; Guo, Wenzhong

doi:10.1007/s10489-020-01696-2

Published: 09 June 2020

Volume 50, pages 3412–3428, (2020)

Applied Intelligence

Abstract

Automatic image annotation is an important area of computer vision that plays a significant role in tasks such as image retrieval and image description. As internet technology has developed, the number of images posted on the web has grown so large that annotating them by hand alone is impractical. Many computer vision researchers have therefore turned to automatic image annotation and put considerable effort into improving its performance. The task is to assign several tags, drawn from a limited vocabulary, that describe an image. Many algorithms have been proposed for this problem, and they report strong performance. In this paper, we review seven algorithms for automatic image annotation and evaluate them using different image features, such as color histograms and the Gist descriptor. Our goal is to provide insights into automatic image annotation. We design comprehensive experiments on the Corel5K, IAPR TC-12, and ESP Game datasets to compare the performance of these algorithms. We also compare the performance of the traditional algorithms when they use deep learning features. Because human annotators do not provide every relevant label, we also apply the DIA metrics on the IAPR TC-12 and ESP Game datasets.
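
To make the task definition concrete, the following is a minimal sketch of assigning tags from a limited vocabulary by nearest-neighbor voting over image features. It is an illustration only and is not claimed to be one of the seven algorithms reviewed in the paper. The function name `annotate`, the toy two-dimensional features, the four-word vocabulary, and the parameters `k` and `n_tags` are all assumptions made for this example.

```python
import numpy as np


def annotate(query, train_feats, train_tags, vocab, k=3, n_tags=2):
    """Pick n_tags vocabulary words for a query feature by k-NN voting.

    Hypothetical helper for illustration; not the paper's method.
    """
    # Euclidean distance from the query to every training feature vector.
    dists = np.linalg.norm(train_feats - query, axis=1)
    # Indices of the k closest training images.
    nearest = np.argsort(dists, kind="stable")[:k]
    # Count how often each tag occurs among those neighbours.
    votes = train_tags[nearest].sum(axis=0)
    # Highest vote counts first; the stable sort keeps vocabulary order on ties.
    ranked = np.argsort(-votes, kind="stable")[:n_tags]
    return [vocab[i] for i in ranked]


# Toy data (assumed): 4 training images, 2-D features, 4-word vocabulary.
vocab = ["sky", "sea", "tree", "car"]
train_feats = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
train_tags = np.array([
    [1, 1, 0, 0],  # image 0: sky, sea
    [1, 1, 0, 0],  # image 1: sky, sea
    [0, 0, 1, 1],  # image 2: tree, car
    [0, 0, 1, 0],  # image 3: tree
])

predicted = annotate(np.array([0.85, 0.15]), train_feats, train_tags, vocab, k=2)
print(predicted)
```

In practice the feature vectors would come from descriptors such as color histograms, Gist, or deep network activations, and the vocabulary would be that of the dataset in use.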

Acknowledgements

This work was supported in part by the Guiding Project of Fujian Province under Grant No. 2018H0017 and by the Talent Program of Fujian Province for Distinguished Young Scholars in Higher Education.

Author information

Authors and Affiliations

The College of Mathematics and Computer Science, Fuzhou University, Fujian, China
Yilu Chen, Xiaojun Zeng, Xing Chen & Wenzhong Guo

Corresponding authors

Correspondence to Xing Chen or Wenzhong Guo.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, Y., Zeng, X., Chen, X. et al. A survey on automatic image annotation. Appl Intell 50, 3412–3428 (2020). https://doi.org/10.1007/s10489-020-01696-2

Published: 09 June 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s10489-020-01696-2
