Employing query disambiguation using clustering techniques

  • Original Paper
Evolving Systems

Abstract

Due to the rapid expansion of the Web over the last decade, the research community has paid significant attention to the problem of searching effectively through the vast amount of available information. In this paper, we introduce a novel framework for improving information retrieval results. Initially, relevant documents are organized into clusters using several metrics combined with language modelling tools. Subsequently, a ranked list of documents is returned to the user for a specific query. To produce this list, scores between the clusters and the query representation are computed; the internal rankings of the documents within each cluster are then combined, using these scores as weighting factors. The proposed methodology exploits inter-document similarities (lexical and/or semantic) after a sophisticated pre-processing step. Our experimental evaluation demonstrates that the proposed algorithm can efficiently improve the quality of the retrieved results.
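The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the clusters are already given, uses cosine similarity over raw term counts as a stand-in for the paper's metrics and language models, and combines scores by a simple product. The function names (`bag_of_words`, `cosine`, `rank_with_clusters`) and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased, whitespace-tokenized term counts (stand-in for real pre-processing)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_with_clusters(query, clusters):
    """Return (doc_id, score) pairs sorted by descending score.

    clusters: dict mapping cluster id -> dict mapping doc id -> text.
    Assumption: a document's final score is its own similarity to the
    query multiplied by the similarity of its cluster to the query.
    """
    q = bag_of_words(query)
    scored = []
    for docs in clusters.values():
        vectors = {doc_id: bag_of_words(text) for doc_id, text in docs.items()}
        # Cluster representation: the sum of its members' term counts.
        centroid = Counter()
        for vec in vectors.values():
            centroid.update(vec)
        cluster_score = cosine(q, centroid)
        # Weight each document's in-cluster score by the cluster score.
        for doc_id, vec in vectors.items():
            scored.append((doc_id, cluster_score * cosine(q, vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example: the ambiguous query term "jaguar" appears in two clusters.
clusters = {
    "animal": {"d1": "jaguar big cat habitat", "d2": "jaguar cat prey jungle"},
    "car": {"d3": "jaguar car engine price", "d4": "luxury car dealer"},
}
ranking = rank_with_clusters("jaguar cat", clusters)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

In this toy setting, the documents from the cluster that best matches the query are promoted above a document that shares a query term but belongs to a weaker cluster, which is the disambiguation effect the framework aims for.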

Notes

  1. Google: https://www.google.com/search/about/.

  2. A snippet is usually a short text summarizing the context in which the query words appear in the result page.

  3. http://www.nltk.org/howto/wordnet.html.

  4. http://opennlp.sourceforge.net/models-1.5/.

  5. http://sourceforge.net/projects/jWordNet/.

  6. https://wordnet.princeton.edu/.

  7. http://lemurproject.org/clueweb09/.

Author information

Corresponding author

Correspondence to Andreas Kanavos.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Kanavos, A., Kotoula, P., Makris, C. et al. Employing query disambiguation using clustering techniques. Evolving Systems 11, 305–315 (2020). https://doi.org/10.1007/s12530-019-09292-7

  • DOI: https://doi.org/10.1007/s12530-019-09292-7
