Information Retrieval, Volume 14, Issue 1, pp 26–46

Modeling score distributions in information retrieval

  • Avi Arampatzis
  • Stephen Robertson
The Second International Conference on the Theory of Information Retrieval (ICTIR 2009)


We review the history of modeling score distributions, focusing on the normal-exponential mixture and investigating both the theoretical and the empirical evidence supporting its use. We discuss previously suggested conditions that valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses concerning the component distributions, individually as well as in pairs, under some limiting conditions of parameter values. Of all the mixtures suggested in the past, the current theoretical argument points to the two-gamma mixture as the most likely universal model, with the normal-exponential being a usable approximation. Beyond the theoretical contribution, we provide new experimental evidence showing that vector space or geometric models, and BM25, are ‘friendly’ to the normal-exponential, and that the mixture's non-convexity problem is not severe in practice. Furthermore, we review recent non-binary mixture models, speculate on graded relevance, and consider methods such as logistic regression for score calibration.
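
The abstract centres on the normal-exponential mixture. As a minimal sketch (not the authors' code), the following fits such a mixture to one query's scores with expectation-maximization, assuming that the normal component models relevant documents, the exponential models non-relevant ones, and the scores have already been shifted so that the lowest is 0. The names `NormExpMixture` and `fit_norm_exp`, the top-10% initialisation and the fixed iteration count are illustrative choices, not taken from the paper.

```python
import math
import random  # used only for the synthetic usage example below
from dataclasses import dataclass


@dataclass
class NormExpMixture:
    """Normal (relevant) + exponential (non-relevant) mixture over scores."""
    pi: float     # prior weight of the relevant (normal) component
    mu: float     # mean of the normal component
    sigma: float  # standard deviation of the normal component
    lam: float    # rate of the exponential component

    def _densities(self, s: float) -> tuple[float, float]:
        # Prior-weighted component densities at score s.
        z = (s - self.mu) / self.sigma
        rel = self.pi * math.exp(-0.5 * z * z) / (self.sigma * math.sqrt(2 * math.pi))
        non = (1 - self.pi) * self.lam * math.exp(-self.lam * s)
        return rel, non

    def p_relevant(self, s: float) -> float:
        """Posterior probability that a document with score s is relevant."""
        rel, non = self._densities(s)
        return rel / (rel + non)


def fit_norm_exp(scores: list[float], iters: int = 200) -> NormExpMixture:
    """Fit the mixture by EM; assumes scores were shifted so that min >= 0."""
    if min(scores) < 0:
        raise ValueError("shift scores so that the minimum is 0")
    ranked = sorted(scores, reverse=True)
    top = ranked[: max(2, len(ranked) // 10)]  # crude init: treat top 10% as relevant
    mu0 = sum(top) / len(top)
    sd0 = math.sqrt(sum((x - mu0) ** 2 for x in top) / len(top)) or 1.0
    m = NormExpMixture(pi=0.1, mu=mu0, sigma=sd0, lam=len(scores) / sum(scores))
    for _ in range(iters):
        # E-step: responsibility of the relevant component for each score.
        resp = [m.p_relevant(s) for s in scores]
        w_rel = sum(resp)
        w_non = len(scores) - w_rel
        # M-step: responsibility-weighted maximum-likelihood updates.
        mu = sum(r * s for r, s in zip(resp, scores)) / w_rel
        var = sum(r * (s - mu) ** 2 for r, s in zip(resp, scores)) / w_rel
        lam = w_non / sum((1 - r) * s for r, s in zip(resp, scores))
        m = NormExpMixture(pi=w_rel / len(scores), mu=mu, sigma=math.sqrt(var), lam=lam)
    return m


# Usage on synthetic scores: 900 non-relevant ~ Exp(1), 100 relevant ~ N(6, 1).
rng = random.Random(0)
scores = [rng.expovariate(1.0) for _ in range(900)] + [rng.gauss(6.0, 1.0) for _ in range(100)]
model = fit_norm_exp(scores)
print(model)
print(model.p_relevant(0.5), model.p_relevant(6.5), model.p_relevant(20.0))
```

The fitted posterior `p_relevant` is not monotone in the score: because the Gaussian tail decays faster than the exponential one, the log-likelihood ratio of the two components is a downward-opening quadratic in s that peaks at s = μ + λσ², and beyond that point the posterior falls again. Since the slope of the recall-fallout curve at a threshold equals this likelihood ratio, this is one way to see the non-convexity of the normal-exponential mixture that the abstract refers to.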


Keywords: Score distribution · Normalization · Distributed retrieval · Fusion · Filtering



We thank Jaap Kamps (Archives and Information Science, Media Studies, University of Amsterdam, the Netherlands) for his contribution during earlier stages of this work.


  1. Arampatzis, A. (2001). Unbiased s-d threshold optimization, initial query degradation, decay, and incrementality, for adaptive document filtering. In Proceedings TREC 2001, NIST.Google Scholar
  2. Arampatzis, A., & van Hameren, A. (2001). The score-distributional threshold optimization for adaptive binary classification tasks. In Proceedings SIGIR’01 (pp. 285–293). ACM Press.Google Scholar
  3. Arampatzis, A., & Kamps, J. (2008). Where to stop reading a ranked list? In: Proceedings TREC 2008, NIST.Google Scholar
  4. Arampatzis, A., & Kamps, J. (2009). A signal-to-noise approach to score normalization. In Proceedings CIKM (pp. 797–806). ACM Press.Google Scholar
  5. Arampatzis, A., Beney, J., Koster, C. H. A., & van der Weide, T. P. (2000). Incrementality, half-life, and threshold optimization for adaptive document filtering. In Proceedings TREC 2000, NIST.Google Scholar
  6. Arampatzis, A., Kamps, J., & Robertson, S. (2009). Where to stop reading a ranked list? Threshold optimization using truncated score distributions. In: Proceedings SIGIR’09 (pp. 524–531). ACM Press.Google Scholar
  7. Baumgarten, C. (1999). A probabilitstic solution to the selection and fusion problem in distributed information retrieval. In Proceedings SIGIR’99 (pp 246–253). ACM PressGoogle Scholar
  8. Bookstein, A. (1977). When the most “pertinent” document should not be retrieved—An analysis of the Swets model. Information Processing and Management 13(6), 377–383.MATHCrossRefGoogle Scholar
  9. Callan, J. (2000). Distributed information retrieval. In Advances information retrieval: Recent research from the CIIR (ir 5, pp. 127–150). Kluwer.Google Scholar
  10. Collins-Thompson, K., Ogilvie, P., Zhang, Y., & Callan, J. (2002). Information filtering, novelty detection, and named-page finding. In Proceedings TREC 2002, NIST.Google Scholar
  11. Cooper, W. S. (1991). Some inconsistencies and misnomers in probabilistic information retrieval. In Proceedings SIGIR’91 (pp. 57–61). ACM Press.Google Scholar
  12. Cooper, W. S., Gey, F. C., & Dabney, D. P. (1992). Probabilistic retrieval based on staged logistic regression. In Proceedings SIGIR’92 (pp. 198–210). ACM Press.Google Scholar
  13. Cooper, W. S., Chen, A., & Gey, F. C. (1994). Experiments in the probabilistic retrieval of full text documents. In Proceedings TREC 1994, NIST.Google Scholar
  14. Cormack, G. V., Lhoták, O., & Palmer, C. R. (1999). Estimating precision by random sampling (poster abstract). In Proceedings SIGIR’99 (pp 273–274). ACM Press.Google Scholar
  15. Cox, D. R. (1970). The analysis of binary data. London: Chapman & Hall.MATHGoogle Scholar
  16. Craswell, N., Robertson, S., Zaragoza, H., & Taylor, M. (2005). Relevance weighting for query-independent evidence. In Proceedings SIGIR’05 (pp. 416–423). ACM Press.Google Scholar
  17. Fernández, M., Vallet, D., & Castells, P. (2006). Probabilistic score normalization for rank aggregation. In ECIR, Lecture notes in computer science (Vol. 3936, pp. 553–556). Springer.Google Scholar
  18. Fernández, M., Vallet, D., & Castells, P. (2006). Using historical data to enhance rank aggregation. In Proceedings SIGIR’06 (pp. 643–644). ACM Press.Google Scholar
  19. Fuhr, N., Pfeifer, U., Bremkamp, C., Pollmann, M., & Buckley, C. (1993). Probabilistic learning approaches for indexing and retrieval with the trec-2 collection. In Proceedings TREC 1993, NIST.Google Scholar
  20. Hawking, D., & Robertson, S. (2003). On collection size and retrieval effectiveness. Information Retrieval 6(1), 99–105.CrossRefGoogle Scholar
  21. Kamps, J., de Rijke, M., & Sigurbjörnsson, B. (2005). Combination methods for crosslingual web retrieval. In CLEF, Lecture notes in computer science (Vol. 4022, pp. 856–864). Springer.Google Scholar
  22. Kanoulas, E., Pavlu, V., Dai, K., & Aslam, J. A. (2009). Modeling the score distributions of relevant and non-relevant documents. In ICTIR, Lecture notes in computer science (Vol. 5766, pp. 152–163). Springer.Google Scholar
  23. Lee, J. H. (1997). Analyses of multiple evidence combination. In Proceedings SIGIR’97 (pp. 267–276). ACM Press.Google Scholar
  24. Lewis, D. D. (1995). Evaluating and optimizing autonomous text classification systems. In Proceedings SIGIR’95 (pp. 246–254). ACM Press.Google Scholar
  25. Manmatha, R., Rath, T. M., & Feng, F. (2001). Modeling score distributions for combining the outputs of search engines. In Proceedings SIGIR’01 (pp. 267–275). ACM Press.Google Scholar
  26. Nottelmann, H., & Fuhr, N. (2003). From uncertain inference to probability of relevance for advanced IR applications. In ECIR, Lecture notes in computer science (Vol. 2633, pp. 235–250). Springer.Google Scholar
  27. Oard, D. W., Hedin, B., Tomlinson, S., & Baron, J. R. (2009). Overview of the TREC 2008 legal track. In Proceedings TREC 2008, NIST.Google Scholar
  28. van Rijsbergen, C. J. (1979). Information retrieval. ButterworthGoogle Scholar
  29. van Rijsbergen, C. J. (1992). Probabilistic retrieval revisited. The Computer Journal 35(3), 291–298.MATHCrossRefGoogle Scholar
  30. Ripley, B. D., & Hjort N. L. (1995). Pattern recognition and neural networks. New York, NY: Cambridge University Press.Google Scholar
  31. Robertson, S. E. (1969). The parametric description of retrieval tests. Part 1: The basic parameters. Journal of Documentation 25(1), 1–27.CrossRefGoogle Scholar
  32. Robertson, S. E. (1977). The probabilistic character of relevance. Information Processing Management 13(4), 247–251.CrossRefGoogle Scholar
  33. Robertson, S. E. (2007). On score distributions and relevance. In ECIR, Lecture notes in computer science (Vol. 4425, pp. 40–51). Springer.Google Scholar
  34. Robertson, S. E., & Bovey, J. D. (1982). Statistical problems in the application of probabilistic models to information retrieval. Technical report, Report No. 5739, BLR&DDGoogle Scholar
  35. Robertson, S. E., & Walker, S. (2000). Threshold setting in adaptive filtering. Journal of Documentation 56, 312–331.CrossRefGoogle Scholar
  36. Savoy, J. (2003). Report on CLEF-2003 multilingual tracks. In CLEF, Lecture notes in computer science (Vol. 3237, pp. 64–73). Springer.Google Scholar
  37. Swets, J. A. (1963). Information retrieval systems. Science 141(3577), 245–250.CrossRefGoogle Scholar
  38. Swets, J. A. (1969). Effectiveness of information retrieval methods. American Documentation 20, 72–89.CrossRefGoogle Scholar
  39. Zhang, Y., & Callan, J. (2001). Maximum likelihood estimation for filtering thresholds. In Proceedings SIGIR’01 (pp. 294–302). ACM Press.Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
  2. Microsoft Research, Cambridge, UK