Abstract
It is becoming increasingly common in information retrieval to combine evidence from multiple sources to compute the retrieval status value of documents. Although this has led to considerable improvements in several retrieval tasks, one outstanding issue is how to estimate the weights that should be assigned to the different sources of evidence. In this paper we propose to use maximum entropy in combination with the limited-memory BFGS (L-BFGS) algorithm to estimate feature weights. An evaluation on the known-item finding task of the TREC Enterprise track shows that our approach significantly outperforms a standard retrieval baseline and leads to competitive performance.
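To make the idea of estimating feature weights concrete, the following is a minimal sketch of a binary maximum entropy (logistic) model over per-source retrieval scores. It is not the authors' implementation. The feature layout (body score, subject score, bias), the toy data, the function names, and the hyperparameters are all hypothetical, and plain gradient ascent is used as a simple stand-in for L-BFGS so that the example needs only the standard library.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, x):
    # Probability of relevance under the binary maximum entropy model.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train_maxent(features, labels, l2=0.01, lr=0.5, steps=2000):
    """Estimate feature weights by maximising the L2-regularised
    log-likelihood. Gradient ascent replaces L-BFGS here (assumption)."""
    n = len(features)
    weights = [0.0] * len(features[0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(features, labels):
            p = score(weights, x)
            for j, xi in enumerate(x):
                # Gradient of the log-likelihood: (observed - expected) * feature.
                grad[j] += (y - p) * xi
        weights = [w + lr * (g / n - l2 * w) for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy data: one row per query-document pair,
# columns are (body score, subject score, bias term).
FEATURES = [
    [0.9, 0.8, 1.0],
    [0.7, 0.9, 1.0],
    [0.8, 0.1, 1.0],
    [0.2, 0.7, 1.0],
    [0.3, 0.2, 1.0],
    [0.1, 0.1, 1.0],
]
LABELS = [1, 1, 0, 1, 0, 0]  # 1 = the known item, 0 = any other document

if __name__ == "__main__":
    learned = train_maxent(FEATURES, LABELS)
    print("weights (body, subject, bias):", learned)
```

In this toy data the subject score separates relevant from non-relevant pairs better than the body score, so the learned subject weight should come out larger. In practice the gradient and objective could be handed to an off-the-shelf L-BFGS optimiser instead of the fixed-step loop.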
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yahyaei, S., Monz, C. (2008). Applying Maximum Entropy to Known-Item Email Retrieval. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds) Advances in Information Retrieval. ECIR 2008. Lecture Notes in Computer Science, vol 4956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78646-7_37
DOI: https://doi.org/10.1007/978-3-540-78646-7_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78645-0
Online ISBN: 978-3-540-78646-7
eBook Packages: Computer Science, Computer Science (R0)