Abstract
This paper presents a new Monte Carlo algorithm that improves the precision of information retrieval by reusing past search results. Experiments compared the proposed algorithm with traditional retrieval on a simulated dataset in which documents, queries, and user judgments were all simulated. Exponential and Zipf distributions were used to build the document collections, and a uniform distribution was used to build the queries. A zeta distribution was used to simulate Bradford's law, which represents the user judgments. Empirical results show that our algorithm outperforms traditional retrieval.
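To make the simulated setup more concrete, the following is a minimal Python sketch of how such a dataset might be generated. It is not the authors' implementation and it does not include the Monte Carlo algorithm itself. The abstract gives no parameter values, so every function name, size, exponent, and rate below is a hypothetical choice made for illustration. The zeta distribution is approximated by a truncated version over a finite ranked list, and the ranking used for the judgments is a placeholder.

```python
import math
import random


def zipf_weights(n, s=1.0):
    """Unnormalised Zipf weights for ranks 1..n: weight(r) = 1 / r**s."""
    return [1.0 / (r ** s) for r in range(1, n + 1)]


def exponential_weights(n, rate=0.05):
    """Unnormalised exponential weights for ranks 1..n: exp(-rate * r)."""
    return [math.exp(-rate * r) for r in range(1, n + 1)]


def make_documents(rng, vocab, weights, n_docs, doc_len):
    """Build documents by drawing terms from the vocabulary with the given weights."""
    return [rng.choices(vocab, weights=weights, k=doc_len) for _ in range(n_docs)]


def make_queries(rng, vocab, n_queries, query_len):
    """Build queries by drawing distinct terms uniformly from the vocabulary."""
    return [rng.sample(vocab, query_len) for _ in range(n_queries)]


def make_judgments(rng, ranked_doc_ids, n_relevant, s=2.0):
    """Mark documents as relevant using truncated zeta weights over rank positions.

    Assumption: relevance concentrates near the top of the ranked list,
    which is one possible reading of Bradford's law for this setting.
    """
    weights = zipf_weights(len(ranked_doc_ids), s)
    target = min(n_relevant, len(ranked_doc_ids))
    relevant = set()
    while len(relevant) < target:
        relevant.add(rng.choices(ranked_doc_ids, weights=weights, k=1)[0])
    return relevant


def precision_at_k(ranked_doc_ids, relevant, k):
    """Fraction of the first k ranked documents that are judged relevant."""
    return sum(1 for d in ranked_doc_ids[:k] if d in relevant) / k


# Hypothetical sizes, chosen only to keep the example small.
rng = random.Random(0)
vocab = [f"t{i}" for i in range(1, 201)]
zipf_docs = make_documents(rng, vocab, zipf_weights(len(vocab)), n_docs=50, doc_len=30)
exp_docs = make_documents(rng, vocab, exponential_weights(len(vocab)), n_docs=50, doc_len=30)
queries = make_queries(rng, vocab, n_queries=5, query_len=3)

ranking = list(range(len(zipf_docs)))  # placeholder ranking, not a real retrieval run
relevant = make_judgments(rng, ranking, n_relevant=10)
p_at_10 = precision_at_k(ranking, relevant, k=10)
print(p_at_10)
```

A fixed seed keeps the simulation reproducible, which matters when two retrieval strategies are compared on the same generated collection.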
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Gutiérrez-Soto, C., Hubert, G. (2014). Probabilistic Reuse of Past Search Results. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8644. Springer, Cham. https://doi.org/10.1007/978-3-319-10073-9_21
DOI: https://doi.org/10.1007/978-3-319-10073-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10072-2
Online ISBN: 978-3-319-10073-9
eBook Packages: Computer Science, Computer Science (R0)