SIGIR ’94, pp 222–231

Inferring Probability of Relevance Using the Method of Logistic Regression

  • Fredric C. Gey

Abstract

This research evaluates a model for probabilistic text and document retrieval; the model utilizes the technique of logistic regression to obtain equations which rank documents by probability of relevance as a function of document and query properties. Since the model infers probability of relevance from statistical clues present in the texts of documents and queries, we call it logistic inference. By transforming the distribution of each statistical clue into its standardized distribution (one with mean μ = 0 and standard deviation σ = 1), the method allows one to apply logistic coefficients derived from a training collection to other document collections, with little loss of predictive power. The model is applied to three well-known information retrieval test collections, and the results are compared directly to the particular vector space model of retrieval which uses term-frequency/inverse-document-frequency (tfidf) weighting and the cosine similarity measure. In the comparison, the logistic inference method performs significantly better than (in two collections) or equally well as (in the third collection) the tfidf/cosine vector space model. The differences in performances of the two models were subjected to statistical tests to see if the differences are statistically significant or could have occurred by chance.
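The ranking procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the clue values, the intercept, and the coefficients are hypothetical placeholders (the abstract does not list the actual clues or fitted values), and the function names are invented for this example. Each clue is standardized to mean 0 and standard deviation 1 within the collection being searched, and previously fitted logistic coefficients are then applied to obtain a probability of relevance used for ranking.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Rescale one clue to mean 0 and standard deviation 1 within a collection."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant clue carries no ranking information; map it to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def probability_of_relevance(z_clues, intercept, coefficients):
    """Logistic model: log-odds are a linear function of standardized clues."""
    log_odds = intercept + sum(c * z for c, z in zip(coefficients, z_clues))
    return 1.0 / (1.0 + math.exp(-log_odds))


def rank_documents(raw_clues, intercept, coefficients):
    """Rank documents for one query by estimated probability of relevance.

    raw_clues maps a document id to its list of raw clue values
    (hypothetical clues, one value per coefficient).
    """
    doc_ids = list(raw_clues)
    n_clues = len(coefficients)
    # Standardize each clue column over the documents of this collection.
    columns = [
        standardize([raw_clues[d][j] for d in doc_ids]) for j in range(n_clues)
    ]
    scored = []
    for i, d in enumerate(doc_ids):
        z = [columns[j][i] for j in range(n_clues)]
        scored.append((d, probability_of_relevance(z, intercept, coefficients)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical clue values and coefficients, for illustration only.
example_clues = {"d1": [1.0, 2.0], "d2": [3.0, 2.0], "d3": [5.0, 2.0]}
ranking = rank_documents(example_clues, intercept=0.0, coefficients=[1.0, 0.5])
```

Because the coefficients act on standardized rather than raw clue values, the same fitted equation can in principle be reused on a collection whose raw clue distributions differ from those of the training collection, which is the transfer property the abstract describes.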

Keywords

Logistic Regression, Information Retrieval, Vector Space Model, Logistic Inference, Test Collection
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag London Limited 1994

Authors and Affiliations

  • Fredric C. Gey, UC Data Archive and Technical Assistance, University of California, Berkeley, USA