Abstract
This research evaluates a model for probabilistic text and document retrieval. The model uses logistic regression to obtain equations that rank documents by probability of relevance as a function of document and query properties. Because the model infers probability of relevance from statistical clues present in the texts of documents and queries, we call it logistic inference. By transforming the distribution of each statistical clue into its standardized distribution (one with mean μ = 0 and standard deviation σ = 1), the method allows logistic coefficients derived from a training collection to be applied to other document collections with little loss of predictive power. The model is applied to three well-known information retrieval test collections, and the results are compared directly with the vector space model of retrieval that uses term-frequency/inverse-document-frequency (tfidf) weighting and the cosine similarity measure. In this comparison, the logistic inference method performs significantly better than the tfidf/cosine vector space model on two collections and equally well on the third. The performance differences between the two models were subjected to statistical tests to determine whether they are statistically significant or could have occurred by chance.
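The idea of standardizing each clue and then ranking by a fitted logistic equation can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the two clues (a count of matching query terms and a document length), the toy numbers, the relevance labels, and the use of plain gradient ascent to fit the coefficients.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Rescale one clue to mean 0 and standard deviation 1 within its own collection."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


def standardize_rows(rows):
    """Standardize every clue (column) of a list of feature rows."""
    columns = [standardize(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def probability_of_relevance(coeffs, row):
    """Logistic equation: log-odds are linear in the standardized clues."""
    log_odds = coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))
    return 1.0 / (1.0 + math.exp(-log_odds))


def fit_logistic(rows, labels, steps=2000, rate=0.1):
    """Fit intercept and coefficients by gradient ascent on the mean log-likelihood.

    Gradient ascent is an illustrative choice; any logistic regression
    fitting routine could be substituted.
    """
    coeffs = [0.0] * (len(rows[0]) + 1)  # coeffs[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * len(coeffs)
        for row, y in zip(rows, labels):
            err = y - probability_of_relevance(coeffs, row)
            grad[0] += err
            for i, x in enumerate(row, start=1):
                grad[i] += err * x
        coeffs = [c + rate * g / len(rows) for c, g in zip(coeffs, grad)]
    return coeffs


def rank(coeffs, rows):
    """Indices of rows, sorted by decreasing estimated probability of relevance."""
    scores = [probability_of_relevance(coeffs, r) for r in rows]
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)


# Hypothetical training collection: [matching query terms, document length]
# for six query-document pairs, with made-up relevance judgments.
train_rows = [[1, 120], [2, 300], [5, 150], [6, 400], [3, 200], [7, 250]]
train_labels = [0, 0, 1, 1, 0, 1]

# Hypothetical second collection whose clues live on a different raw scale.
other_rows = [[10, 2000], [40, 2500], [25, 1800], [5, 3000]]

train_z = standardize_rows(train_rows)
coeffs = fit_logistic(train_z, train_labels)

# Standardize the second collection against its own mean and standard
# deviation, then reuse the coefficients fitted on the training collection.
other_z = standardize_rows(other_rows)
ranking = rank(coeffs, other_z)

for i in ranking:
    print(other_rows[i], round(probability_of_relevance(coeffs, other_z[i]), 3))
```

In this sketch each collection is standardized against its own statistics, so the coefficients only ever see clues on a common scale. That is the property the abstract relies on when it applies coefficients from a training collection to other collections.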
© 1994 Springer-Verlag London Limited
About this paper
Cite this paper
Gey, F.C. (1994). Inferring Probability of Relevance Using the Method of Logistic Regression. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_23
DOI: https://doi.org/10.1007/978-1-4471-2099-5_23
Publisher Name: Springer, London
Print ISBN: 978-3-540-19889-5
Online ISBN: 978-1-4471-2099-5