Employing query disambiguation using clustering techniques

Kanavos, Andreas; Kotoula, Panagiota; Makris, Christos; Iliadis, Lazaros

doi:10.1007/s12530-019-09292-7


Original Paper
Published: 11 July 2019

Volume 11, pages 305–315 (2020)

Evolving Systems


Abstract

Owing to the massive expansion of the Web over the last decade, the research community has paid significant attention to the problem of searching effectively through the vast amount of information available. In this paper, we introduce a novel framework for improving information retrieval results. First, relevant documents are organized into clusters using several similarity metrics combined with language modelling tools. Then, for a specific query, a ranked list of the documents is produced and returned to the user. This is achieved by computing scores between each cluster and the query representation and then combining the per-cluster internal rankings of the documents, using these scores as weighting factors. Our proposed methodology exploits inter-document similarities (lexical and/or semantic), computed after a sophisticated pre-processing step. Our experimental evaluation demonstrates that the proposed algorithm can effectively improve the quality of the retrieved results.
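
The abstract describes the pipeline only at a high level, so the following Python code is a hypothetical sketch of cluster-weighted rank fusion, not the authors' implementation. It assumes Jaccard token overlap as the lexical similarity metric, a single-pass threshold clustering, cosine similarity between the query and a cluster's pooled term counts as the cluster-query score, and the reciprocal of a document's internal rank as its per-cluster ranking score. The function names (`cluster_documents`, `fuse`, and so on) and the `threshold` parameter are illustrative assumptions.

```python
"""Hypothetical sketch of cluster-weighted rank fusion (not the authors' code)."""
from collections import Counter
import math


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer; the paper's pre-processing is richer.
    return text.lower().split()


def jaccard(a: set[str], b: set[str]) -> float:
    # Lexical similarity between two token sets (an assumed stand-in metric).
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_documents(docs: list[str], threshold: float = 0.2) -> list[list[int]]:
    # Single-pass clustering: a document joins the first cluster whose seed
    # (first member) is at least `threshold` similar, otherwise it starts a new cluster.
    token_sets = [set(tokenize(d)) for d in docs]
    clusters: list[list[int]] = []
    for i, toks in enumerate(token_sets):
        for cluster in clusters:
            if jaccard(toks, token_sets[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fuse(query: str, docs: list[str], clusters: list[list[int]]) -> list[int]:
    # Return document indices ordered by their fused, cluster-weighted score.
    q = Counter(tokenize(query))
    scores: dict[int, float] = {}
    for cluster in clusters:
        # Cluster-query score: cosine between the query and the cluster's pooled terms.
        pooled = Counter()
        for i in cluster:
            pooled.update(tokenize(docs[i]))
        weight = cosine(q, pooled)
        # Internal ranking of the cluster's documents by their own query similarity.
        ranked = sorted(cluster, key=lambda i: cosine(q, Counter(tokenize(docs[i]))), reverse=True)
        for rank, i in enumerate(ranked, start=1):
            # Reciprocal internal rank, weighted by the cluster score.
            scores[i] = scores.get(i, 0.0) + weight / rank
    return sorted(scores, key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    docs = [
        "jaguar car speed engine",
        "jaguar car dealer price",
        "jaguar cat jungle habitat",
        "jaguar cat prey jungle",
    ]
    clusters = cluster_documents(docs)
    print(clusters)  # [[0, 1], [2, 3]]
    print(fuse("jaguar car engine", docs, clusters))  # [0, 1, 2, 3]
```

For the ambiguous query "jaguar car engine", the two "car" snippets form one cluster that receives the higher query weight, so both are ranked ahead of the "cat" cluster. Semantic measures (for example, WordNet-based similarity) or language-model scores could replace the lexical ones in this sketch without changing the fusion step.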


Notes

Google: https://www.google.com/search/about/.
A snippet is usually a short text summarizing the context in which the query words appear in the result page.
http://www.nltk.org/howto/wordnet.html.
http://opennlp.sourceforge.net/models-1.5/.
http://sourceforge.net/projects/jWordNet/.
https://wordnet.princeton.edu/.
http://lemurproject.org/clueweb09/.


Author information

Authors and Affiliations

Computer Engineering and Informatics Department, University of Patras, 26504, Patras, Greece
Andreas Kanavos, Panagiota Kotoula & Christos Makris
Department of Civil Engineering, Democritus University of Thrace, 67100, Xanthi, Greece
Lazaros Iliadis


Corresponding author

Correspondence to Andreas Kanavos.


About this article

Cite this article

Kanavos, A., Kotoula, P., Makris, C. et al. Employing query disambiguation using clustering techniques. Evolving Systems 11, 305–315 (2020). https://doi.org/10.1007/s12530-019-09292-7


Received: 27 November 2018
Accepted: 03 July 2019
Published: 11 July 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s12530-019-09292-7
