Journal of Zhejiang University-SCIENCE A

Volume 10, Issue 12, pp 1759–1768

Image interpretation: mining the visible and syntactic correlation of annotated words

  • Ding-yin Xia
  • Fei Wu
  • Wen-hao Liu
  • Han-wang Zhang


Automatic web image annotation is a practical and effective means of supporting both web image retrieval and image understanding. However, current annotation techniques do not investigate the statement-level syntactic correlation among the annotated words, which makes it very difficult to render a natural language interpretation of an image, such as “pandas eat bamboo”. In this paper, we propose an approach that interprets image semantics by mining the visible and textual information hidden in images. The approach consists of two main parts: first, the annotated words of a target image are ranked according to two factors, namely visual correlation and pairwise co-occurrence; then, the statement-level syntactic correlation among the annotated words is explored to obtain a natural language interpretation of the target image. Experiments conducted on real-world web images show the effectiveness of the proposed approach.
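The first stage described above, ranking annotated words by two factors, can be illustrated with a minimal sketch. The abstract does not say how the two factors are combined, so the weighted sum, the `alpha` parameter, the function name, and the toy scores below are all assumptions made for illustration and are not the authors' method.

```python
def rank_annotated_words(visual, cooccurrence, alpha=0.5):
    """Rank candidate words by a weighted sum of two scores.

    Hypothetical helper: `visual` maps each word to an assumed visual
    correlation score, `cooccurrence` maps a frozenset of two words to an
    assumed pairwise co-occurrence score, and `alpha` weights the two.
    """
    words = list(visual)
    scored = []
    for word in words:
        others = [o for o in words if o != word]
        # Average co-occurrence of this word with every other candidate.
        if others:
            co = sum(cooccurrence.get(frozenset((word, o)), 0.0)
                     for o in others) / len(others)
        else:
            co = 0.0
        scored.append((alpha * visual[word] + (1 - alpha) * co, word))
    # Highest combined score first; ties are broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [word for _, word in scored]


# Toy scores, invented purely for this example.
visual = {"panda": 0.9, "bamboo": 0.7, "website": 0.1}
cooccurrence = {
    frozenset(("panda", "bamboo")): 0.8,
    frozenset(("panda", "website")): 0.1,
    frozenset(("bamboo", "website")): 0.1,
}
ranking = rank_annotated_words(visual, cooccurrence)
print(ranking)
```

With these toy scores, a word that is neither visually salient nor frequently paired with the other candidates, such as "website", falls to the bottom of the ranking, which is the kind of filtering the ranking stage is meant to provide before a sentence is formed.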

Key words

Web image annotation, Visibility, Pairwise co-occurrence, Natural language interpretation

CLC number

TP37, TP391





Copyright information

© Zhejiang University and Springer Berlin Heidelberg 2009

Authors and Affiliations

  • Ding-yin Xia (1)
  • Fei Wu (1)
  • Wen-hao Liu (1)
  • Han-wang Zhang (1)

  1. School of Computer Science and Technology, Zhejiang University, Hangzhou, China
