Extracting Knowledge from Web Search Engine Using Wikipedia

Kanavos, Andreas; Makris, Christos; Plegas, Yannis; Theodoridis, Evangelos

doi:10.1007/978-3-642-41016-1_11

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 384)

Abstract

Search engines are now the dominant tool for finding information on the web. However, they usually return page references in a single global ranking, which makes it difficult for users to browse the different topics captured in the result set. Recently, meta-search engine systems have appeared that discover knowledge in web search results, allowing the user to browse the different topics contained in the result set. In this paper, we focus on the problem of determining the different thematic groups in the results that existing web search engines provide. We propose a novel system that exploits semantic entities of Wikipedia to group the result set into different topic groups, according to the various meanings of the provided query. The proposed method utilizes a number of semantic annotation techniques based on knowledge bases, such as WordNet and Wikipedia, in order to identify the different senses of each query term. Finally, the method annotates the extracted topics using information derived from the clusters, which are then presented to the end user.
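As a rough illustration of the grouping idea described in the abstract, the sketch below assigns each search result to one sense of an ambiguous query and collects the results per sense. It is not the authors' system: the keyword-overlap annotator, the `SENSE_KEYWORDS` table, the sense labels, and the example URLs are all hypothetical stand-ins for the Wikipedia- and WordNet-based annotation that the paper relies on.

```python
from collections import defaultdict

# Hypothetical sense inventory for the query "jaguar". A real system would
# derive senses from a knowledge base; these keyword sets are assumptions.
SENSE_KEYWORDS = {
    "Jaguar (animal)": {"cat", "species", "habitat"},
    "Jaguar Cars": {"car", "sedan", "engine"},
}


def toy_annotate(snippet):
    """Return the sense whose keywords overlap most with the snippet.

    Stand-in for a semantic annotator; falls back to "Unassigned" when no
    keyword matches.
    """
    words = set(snippet.lower().split())
    best_sense, best_overlap = "Unassigned", 0
    for sense, keywords in SENSE_KEYWORDS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


def group_by_sense(results, annotate):
    """Group result URLs by the sense that `annotate` assigns to each snippet."""
    groups = defaultdict(list)
    for result in results:
        groups[annotate(result["snippet"])].append(result["url"])
    return dict(groups)


# Made-up search results for illustration only.
results = [
    {"url": "https://example.org/a", "snippet": "The jaguar is a large cat species"},
    {"url": "https://example.org/b", "snippet": "New jaguar sedan with a powerful engine"},
    {"url": "https://example.org/c", "snippet": "Jaguar habitat in South America"},
    {"url": "https://example.org/d", "snippet": "Jaguar fan club news"},
]

groups = group_by_sense(results, toy_annotate)
for sense, urls in groups.items():
    print(sense, urls)
```

The annotator is passed in as a function so that the grouping step stays independent of how senses are detected; swapping the toy annotator for a knowledge-base-backed one would not change `group_by_sense`.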

Author information

Authors and Affiliations

Computer Engineering and Informatics Department, University of Patras, Greece, 26500
Andreas Kanavos, Christos Makris, Yannis Plegas & Evangelos Theodoridis

Editor information

Editors and Affiliations

Department of Forestry & Management of the Environment and Natural Resources, Democritus University of Thrace, GR-68200, Orestiada, Hellas
Lazaros Iliadis
Frederick University of Cyprus, Cyprus
Harris Papadopoulos
Faculty of Engineering and Computing, Coventry University, UK
Chrisina Jayne

About this paper

Cite this paper

Kanavos, A., Makris, C., Plegas, Y., Theodoridis, E. (2013). Extracting Knowledge from Web Search Engine Using Wikipedia. In: Iliadis, L., Papadopoulos, H., Jayne, C. (eds) Engineering Applications of Neural Networks. EANN 2013. Communications in Computer and Information Science, vol 384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41016-1_11

DOI: https://doi.org/10.1007/978-3-642-41016-1_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41015-4
Online ISBN: 978-3-642-41016-1
eBook Packages: Computer Science, Computer Science (R0)
