A New Baseline for Image Annotation

  • Ameesh Makadia
  • Vladimir Pavlovic
  • Sanjiv Kumar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5304)


Automatically assigning keywords to images is of great interest as it allows one to index, retrieve, and understand large collections of image data. Many techniques have been proposed for image annotation in the last decade that give reasonable performance on standard datasets. However, most of these works fail to compare their methods with simple baseline techniques to justify the need for complex models and subsequent training. In this work, we introduce a new baseline technique for image annotation that treats annotation as a retrieval problem. The proposed technique utilizes low-level image features and a simple combination of basic distances to find nearest neighbors of a given image. The keywords are then assigned using a greedy label transfer mechanism. The proposed baseline outperforms the current state-of-the-art methods on two standard and one large Web dataset. We believe that such a baseline measure will provide a strong platform to compare and better understand future annotation techniques.
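The abstract's pipeline (combine several basic per-feature distances, rank training images by the combined distance, then greedily transfer keywords from the nearest neighbors) can be sketched roughly as below. This is a hypothetical illustration, not the paper's exact method: the feature names, the use of plain L2 per feature, and the neighborhood size `k` are placeholder assumptions.

```python
import numpy as np
from collections import Counter

def combined_distance(query_feats, db_feats):
    """Combine per-feature distances into one ranking score.

    Each feature's distances are scaled to [0, 1] and averaged,
    a simple stand-in for the paper's combination of basic distances.
    L2 is used here as a placeholder per-feature distance.
    """
    n_images = len(next(iter(db_feats.values())))
    total = np.zeros(n_images)
    for name, q in query_feats.items():
        d = np.linalg.norm(db_feats[name] - q, axis=1)
        total += d / (d.max() or 1.0)  # guard against all-zero distances
    return total / len(query_feats)

def greedy_label_transfer(distances, db_labels, n_keywords=5, k=5):
    """Greedy keyword transfer, as loosely described in the abstract:
    take the single nearest neighbor's keywords first, then fill any
    remaining slots with the most frequent keywords among the next
    k-1 neighbors (frequency here is an assumed tie-breaking rule).
    """
    order = np.argsort(distances)
    keywords = list(db_labels[order[0]])[:n_keywords]
    if len(keywords) < n_keywords:
        counts = Counter(kw for i in order[1:k] for kw in db_labels[i]
                         if kw not in keywords)
        keywords += [kw for kw, _ in
                     counts.most_common(n_keywords - len(keywords))]
    return keywords
```

With a toy database of three images, a query close to the first image would inherit that image's keywords first, then borrow the most common remaining keyword from the next neighbors.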


Keywords: Image Retrieval, Baseline Method, Haar Wavelet, Image Annotation, Annotation Method



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ameesh Makadia (Google Research, New York, NY)
  • Vladimir Pavlovic (Rutgers University, Piscataway)
  • Sanjiv Kumar (Google Research, New York, NY)
