Employing query disambiguation using clustering techniques

  • Andreas Kanavos
  • Panagiota Kotoula
  • Christos Makris
  • Lazaros Iliadis
Original Paper


Owing to the rapid expansion of the Web over the last decade, the research community has paid significant attention to the problem of searching effectively within the vast amount of available information. In this paper, we introduce a novel framework for improving information retrieval results. First, relevant documents are organized into clusters using several metrics combined with language modelling tools. Then, for a specific query, a ranked list of the documents is produced and returned to the user. To this end, the scores between the clusters and the query representation are extracted; the internal rankings of the documents within each cluster are then combined, using these scores as weighting factors. The proposed methodology exploits inter-document similarities (lexical and/or semantic) after a sophisticated pre-processing step. Our experimental evaluation demonstrates that the proposed algorithm can efficiently improve the quality of the retrieved results.
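The ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the clusters are already given, stands in plain bag-of-words cosine similarity for the paper's metrics and language models, and uses the hypothetical names cosine and rank_with_clusters. Each document's final score is its own similarity to the query weighted by the similarity between the query and its cluster.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_with_clusters(query: str, clusters: list[list[str]]) -> list[tuple[str, float]]:
    """Rank documents by (cluster-query score) * (document-query score).

    Assumption: `clusters` is a precomputed grouping of documents; the
    clustering itself is outside the scope of this sketch.
    """
    q = Counter(query.lower().split())
    scored = []
    for docs in clusters:
        vectors = [Counter(d.lower().split()) for d in docs]
        # Represent the cluster by the sum of its document vectors.
        centroid = sum(vectors, Counter())
        cluster_score = cosine(q, centroid)
        for doc, vec in zip(docs, vectors):
            # The cluster score acts as the weighting factor for the
            # document's own score inside that cluster.
            scored.append((doc, cluster_score * cosine(q, vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    example = [
        ["jaguar car speed", "jaguar car dealer"],
        ["jaguar cat habitat", "jaguar big cat"],
    ]
    for doc, score in rank_with_clusters("jaguar cat", example):
        print(f"{score:.3f}  {doc}")
```

For the ambiguous query "jaguar cat", documents from the animal-related cluster are ranked above those from the car-related cluster, because the cluster-level score amplifies documents whose whole cluster matches the query.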


Query disambiguation, Information retrieval, Query reformulation, Clustering, Containment, Semantics




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Andreas Kanavos (1, corresponding author)
  • Panagiota Kotoula (1)
  • Christos Makris (1)
  • Lazaros Iliadis (2)

  1. Computer Engineering and Informatics Department, University of Patras, Patras, Greece
  2. Department of Civil Engineering, Democritus University of Thrace, Xanthi, Greece
