Abstract
Owing to the rapid expansion of the Web over the last decade, the research community has paid significant attention to the problem of searching effectively within the vast amount of available information. In this paper, we introduce a novel framework for improving information retrieval results. First, relevant documents are organized into clusters using several metrics combined with language modelling tools. Then, a ranked list of documents is returned to the user for a specific query. To produce this list, scores between the clusters and the query representation are computed; the internal rankings of the documents within each cluster are then combined, using these scores as weighting factors. The proposed methodology exploits inter-document similarities (lexical and/or semantic) after a sophisticated pre-processing step. Our experimental evaluation demonstrates that the proposed algorithm can improve the quality of the retrieved results.
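The ranking step described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the abstract does not specify the similarity metrics, the cluster representation, or the combination rule, so the sketch assumes bag-of-words term-frequency vectors, cosine similarity, a cluster represented by the sum of its documents' vectors, and a final score equal to the cluster-query score multiplied by the document-query score. The function names (`tf_vector`, `cosine`, `rank_with_clusters`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_with_clusters(query, clusters):
    """Rank documents by (cluster-query score) * (document-query score).

    `clusters` maps a cluster id to a list of (doc_id, text) pairs.
    """
    q = tf_vector(query)
    scored = []
    for docs in clusters.values():
        # Cluster representation: sum of its documents' vectors (assumption).
        centroid = Counter()
        for _, text in docs:
            centroid.update(tf_vector(text))
        cluster_score = cosine(q, centroid)
        # Weight each document's within-cluster score by the cluster score.
        for doc_id, text in docs:
            doc_score = cosine(q, tf_vector(text))
            scored.append((doc_id, cluster_score * doc_score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-made clusters for the ambiguous query "jaguar car".
clusters = {
    "animals": [("d1", "jaguar cat habitat jungle"),
                ("d2", "big cat species jaguar")],
    "cars": [("d3", "jaguar car engine speed"),
             ("d4", "sports car dealer")],
}
ranking = rank_with_clusters("jaguar car", clusters)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

In this toy example the "cars" cluster matches the query more closely than the "animals" cluster, so its documents are promoted even when an individual document shares only one query term. A semantic similarity measure could replace `cosine` without changing the overall structure.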
Notes
Google: https://www.google.com/search/about/.
A snippet is usually a short text summarizing the context in which the query words appear in the result page.
About this article
Cite this article
Kanavos, A., Kotoula, P., Makris, C. et al. Employing query disambiguation using clustering techniques. Evolving Systems 11, 305–315 (2020). https://doi.org/10.1007/s12530-019-09292-7
DOI: https://doi.org/10.1007/s12530-019-09292-7