A Novel Approach for Extracting Pertinent Keywords for Web Image Annotation Using Semantic Distance and Euclidean Distance

  • Conference paper
Software Engineering

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 731))

Abstract

The World Wide Web today comprises billions of Web documents with information on varied topics, presented through different types of media such as text, images, audio, and video. Along with textual information, the number of images on the Web is therefore growing exponentially. Compared with text, annotating images by their semantics is more complicated because there is little correlation between the user’s semantics and the low-level features available to a computer system. Moreover, Web pages generally contain content on multiple topics, and the context relevant to an image makes up only a small portion of the page’s full text, which makes it challenging for image search engines to annotate and index Web images. Existing image annotation systems annotate a Web image using contextual information from the page title, the image src attribute, the alt attribute, meta tags, and the text surrounding the image. Some more recent approaches perform page segmentation as a preprocessing step. This paper proposes a novel approach for annotating Web images. In this work, Web pages are divided into Web content blocks based on the visual structure of the page; thereafter, the textual data of the Web content blocks that are semantically closer to the blocks containing Web images is extracted. The relevant keywords from this textual information, along with the contextual information of the images, are used for annotation.
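
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the semantic distance or the Euclidean distance is computed or how the two are combined, so everything below is an assumption made for illustration: the `Block` type, the pixel and distance thresholds, the stopword list, and the use of a simple word-overlap (Jaccard) distance as a stand-in for a real semantic distance measure. It is not the authors' implementation.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "at", "is", "are", "to", "for", "with", "by"}


@dataclass
class Block:
    """Hypothetical Web content block: bounding box in pixels plus its text."""
    x: float
    y: float
    w: float
    h: float
    text: str


def center(block):
    return (block.x + block.w / 2, block.y + block.h / 2)


def euclidean(a, b):
    """Euclidean distance between the centers of two blocks."""
    (ax, ay), (bx, by) = center(a), center(b)
    return math.hypot(ax - bx, ay - by)


def tokens(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def lexical_distance(text_a, text_b):
    """1 - Jaccard overlap of token sets; a crude stand-in for semantic distance."""
    a, b = set(tokens(text_a)), set(tokens(text_b))
    if not a or not b:
        return 1.0
    return 1.0 - len(a & b) / len(a | b)


def annotate(image_context, image_block, blocks,
             max_pixels=300.0, max_distance=0.9, top_k=5):
    """Return the top_k most frequent keywords from the image's own context
    (alt text, title, and so on) and from blocks that are both spatially
    near the image and lexically close to that context."""
    counts = Counter(tokens(image_context))
    for block in blocks:
        near = euclidean(image_block, block) <= max_pixels
        related = lexical_distance(image_context, block.text) <= max_distance
        if near and related:
            counts.update(tokens(block.text))
    return [word for word, _ in counts.most_common(top_k)]
```

In this sketch a block contributes keywords only if it passes both filters, so nearby but unrelated blocks (navigation, sign-up prompts) and related but distant blocks are ignored. A WordNet-based measure could replace `lexical_distance` without changing the rest of the structure.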

Author information

Corresponding author

Correspondence to Manisha Yadav.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Gulati, P., Yadav, M. (2019). A Novel Approach for Extracting Pertinent Keywords for Web Image Annotation Using Semantic Distance and Euclidean Distance. In: Hoda, M., Chauhan, N., Quadri, S., Srivastava, P. (eds) Software Engineering. Advances in Intelligent Systems and Computing, vol 731. Springer, Singapore. https://doi.org/10.1007/978-981-10-8848-3_17

  • DOI: https://doi.org/10.1007/978-981-10-8848-3_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8847-6

  • Online ISBN: 978-981-10-8848-3

  • eBook Packages: Engineering, Engineering (R0)
