Probabilistic Reuse of Past Search Results

Gutiérrez-Soto, Claudio; Hubert, Gilles

doi:10.1007/978-3-319-10073-9_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8644)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

This paper presents a new Monte Carlo algorithm that uses past search results to improve the precision of information retrieval. Experiments compare the proposed algorithm with traditional retrieval on a simulated dataset in which the documents, the queries, and the user judgments were all simulated. Exponential and Zipf distributions were used to build the document collections, a uniform distribution was used to build the queries, and a zeta distribution was used to simulate Bradford's law, which represents the user judgments. Empirical results show that the proposed algorithm performs better than traditional retrieval.
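The abstract names the distributions behind the simulated dataset but not how they were applied. The sketch below is one possible reading, not the authors' procedure: the function names, parameter values, the truncation of the Zipf and zeta distributions to a finite range, and the way judgments are attached to a ranked list are all assumptions made for illustration. It uses only the Python standard library.

```python
import random


def zipf_weights(n, s=1.0):
    """Unnormalised weights 1 / rank**s for ranks 1..n (a truncated Zipf/zeta law)."""
    return [1.0 / (rank ** s) for rank in range(1, n + 1)]


def sample_zipf_document(vocab, length, rng, s=1.0):
    """Draw `length` terms; the term at rank r is chosen with probability ~ 1 / r**s."""
    return rng.choices(vocab, weights=zipf_weights(len(vocab), s), k=length)


def sample_exponential_document(vocab, length, rng, rate=0.05):
    """Draw `length` terms whose rank is exponentially distributed (clipped to the vocabulary)."""
    doc = []
    for _ in range(length):
        idx = min(int(rng.expovariate(rate)), len(vocab) - 1)
        doc.append(vocab[idx])
    return doc


def sample_uniform_query(vocab, size, rng):
    """Pick `size` distinct query terms uniformly at random."""
    return rng.sample(vocab, size)


def sample_zeta_judgments(ranked_doc_ids, rng, s=2.0, n_relevant=3):
    """Assumed judgment model: relevant documents concentrate at the top ranks."""
    weights = zipf_weights(len(ranked_doc_ids), s)
    target = min(n_relevant, len(ranked_doc_ids))
    relevant = set()
    while len(relevant) < target:
        relevant.add(rng.choices(ranked_doc_ids, weights=weights, k=1)[0])
    return relevant


def precision_at_k(ranked_doc_ids, relevant, k):
    """Fraction of the first k ranked documents that are judged relevant."""
    return sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant) / k


# Small demonstration with a hypothetical 200-term vocabulary.
rng = random.Random(42)
vocab = [f"term{i}" for i in range(200)]
zipf_doc = sample_zipf_document(vocab, 50, rng)
exp_doc = sample_exponential_document(vocab, 50, rng)
query = sample_uniform_query(vocab, 4, rng)
ranking = list(range(20))  # placeholder ranked list of document ids
judged = sample_zeta_judgments(ranking, rng)
print(query, sorted(judged), precision_at_k(ranking, judged, 10))
```

Passing a seeded `random.Random` instance keeps each simulated collection reproducible, which matters when two retrieval strategies are compared on the same synthetic data.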

Author information

Authors and Affiliations

IRIT UMR 5505 CNRS, Université de Toulouse, 118 route de Narbonne, F-31062, Toulouse cedex 9, France
Claudio Gutiérrez-Soto & Gilles Hubert
Departamento de Sistemas de Información, Universidad del Bío-Bío, Chile
Claudio Gutiérrez-Soto

Editor information

Editors and Affiliations

Instituto Tecnológico de Informática, 46022, Valencia, Spain
Hendrik Decker
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, 166 27, Prague 6, Czech Republic
Lenka Lhotská
Department of Computer Science, The University of Auckland, 1010, Auckland, New Zealand
Sebastian Link
Knowledge Management, LMU University of Munich, Leopoldstraße 13, 80802, Munich, Germany
Marcus Spies
University of Linz, FAW, Altenbergerstrasse 69, 4040, Linz, Austria
Roland R. Wagner

About this paper

Cite this paper

Gutiérrez-Soto, C., Hubert, G. (2014). Probabilistic Reuse of Past Search Results. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8644. Springer, Cham. https://doi.org/10.1007/978-3-319-10073-9_21

DOI: https://doi.org/10.1007/978-3-319-10073-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10072-2
Online ISBN: 978-3-319-10073-9
eBook Packages: Computer Science, Computer Science (R0)
