
Extracting Knowledge from Web Search Engine Using Wikipedia

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 384)

Abstract

Search engines are currently the dominant tool for finding information on the web. However, they usually return web page references in a single global ranking, which makes it difficult for users to browse the different topics captured in the result set. Recently, meta-search engine systems have appeared that discover knowledge in these search results and let the user browse the different topics contained in the result set. In this paper, we focus on the problem of identifying the different thematic groups in the results returned by existing web search engines. We propose a novel system that exploits semantic entities from Wikipedia to group the result set into topic groups corresponding to the various meanings of the submitted query. The proposed method uses several semantic annotation techniques based on knowledge bases such as WordNet and Wikipedia to identify the different senses of each query term. Finally, the method annotates the extracted topics with information derived from the clusters, which are then presented to the end user.
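The abstract does not spell out the grouping algorithm, so the following is only an illustrative sketch of the general idea (grouping results by shared Wikipedia entities and labeling each group), not the authors' method. It assumes each result has already been annotated with a set of Wikipedia entity titles (the annotation step itself, via WordNet or a Wikipedia annotator, is omitted) and substitutes a simple greedy Jaccard-threshold grouping. The names `Result`, `group_results`, `label_group`, the 0.3 threshold, and the sample data are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Result:
    """A search result whose snippet is assumed to be pre-annotated.

    `entities` is a hypothetical set of Wikipedia entity titles; how they
    were obtained (WordNet, a Wikipedia annotator, ...) is out of scope here.
    """
    url: str
    entities: frozenset[str]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard overlap of two entity sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_results(results: list[Result], threshold: float = 0.3) -> list[list[Result]]:
    """Greedy single-pass grouping (an assumed stand-in, not the paper's algorithm).

    Each result joins the first group whose accumulated entity set is at least
    `threshold`-similar to its own; otherwise it starts a new group.
    """
    groups: list[list[Result]] = []
    group_entities: list[set[str]] = []  # union of entities seen in each group
    for r in results:
        for i, ents in enumerate(group_entities):
            if jaccard(r.entities, frozenset(ents)) >= threshold:
                groups[i].append(r)
                ents.update(r.entities)
                break
        else:  # no sufficiently similar group found
            groups.append([r])
            group_entities.append(set(r.entities))
    return groups


def label_group(group: list[Result]) -> str:
    """Label a group with its most frequent entity (ties: first encountered)."""
    counts = Counter(e for r in group for e in sorted(r.entities))
    return counts.most_common(1)[0][0] if counts else "(unlabeled)"


# Hypothetical annotated results for an ambiguous query such as "jaguar".
sample = [
    Result("a", frozenset({"Jaguar", "Felidae"})),
    Result("b", frozenset({"Jaguar Cars", "Automobile"})),
    Result("c", frozenset({"Jaguar", "Rainforest"})),
    Result("d", frozenset({"Jaguar Cars", "Tata Motors"})),
]

for g in group_results(sample):
    print(label_group(g), [r.url for r in g])
# Jaguar ['a', 'c']
# Jaguar Cars ['b', 'd']
```

A greedy pass keeps the sketch short and order-dependent; a real system would more likely use a proper clustering algorithm and richer sense information than raw entity overlap.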




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kanavos, A., Makris, C., Plegas, Y., Theodoridis, E. (2013). Extracting Knowledge from Web Search Engine Using Wikipedia. In: Iliadis, L., Papadopoulos, H., Jayne, C. (eds) Engineering Applications of Neural Networks. EANN 2013. Communications in Computer and Information Science, vol 384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41016-1_11


  • DOI: https://doi.org/10.1007/978-3-642-41016-1_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41015-4

  • Online ISBN: 978-3-642-41016-1

  • eBook Packages: Computer Science, Computer Science (R0)
