Term Dependence Statistical Measures for Information Retrieval Tasks

Conference paper in Advances in Artificial Intelligence and Soft Computing (MICAI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9413)

Abstract

In the information retrieval (IR) research community, it is commonly accepted that the independence assumptions in probabilistic IR models are inaccurate, and the literature has stressed the need to model term dependencies. However, little has been said about the statistical nature of these dependencies. Using several test collections, we investigate statistical measures of dependence for term-to-query pairs and for document term-to-term pairs. We show that document entropy is highly correlated with dependence, but that a high proportion of linearly uncorrelated pairs does not necessarily mean that those pairs are independent. A robust IR model should therefore account for both dependence and independence phenomena.
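
The claim that linearly uncorrelated pairs need not be independent can be illustrated with a toy example. The sketch below is not the authors' procedure (the abstract does not specify their measures); it assumes a hypothetical discrete joint distribution for a pair of variables X and Y and contrasts Pearson's correlation coefficient, which only detects linear association, with mutual information, which is zero only when the joint distribution factorizes into its marginals. The names `dependent`, `independent`, `pearson_r` and `mutual_information` are illustrative only.

```python
import math
from collections import defaultdict

def marginals(joint):
    """Marginal pmfs P(X) and P(Y) obtained by summing the joint pmf."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return px, py

def pearson_r(joint):
    """Pearson's linear correlation coefficient of X and Y under the joint pmf."""
    px, py = marginals(joint)
    ex = sum(x * p for x, p in px.items())
    ey = sum(y * p for y, p in py.items())
    cov = sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())
    vx = sum((x - ex) ** 2 * p for x, p in px.items())
    vy = sum((y - ey) ** 2 * p for y, p in py.items())
    return cov / math.sqrt(vx * vy)

def mutual_information(joint):
    """Mutual information I(X;Y) in bits; it is 0 exactly when P(x, y) = P(x)P(y)."""
    px, py = marginals(joint)
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Hypothetical dependent pair: X uniform on {-1, 0, 1} and Y = X**2,
# so Y is fully determined by X, but the relation has no linear trend.
dependent = {(x, x * x): 1 / 3 for x in (-1, 0, 1)}

# Hypothetical independent pair with the same marginals: P(x, y) = P(x)P(y).
p_x = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}
p_y = {0: 1 / 3, 1: 2 / 3}
independent = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

for name, joint in [("dependent", dependent), ("independent", independent)]:
    print(f"{name:>11}: r = {pearson_r(joint):.3f}, "
          f"I(X;Y) = {mutual_information(joint):.3f} bits")
```

Both pairs have a correlation of zero (up to floating-point rounding), yet only the independent pair has zero mutual information; for the dependent pair, I(X;Y) equals the entropy of Y (about 0.918 bits) because Y is a function of X. The same caveat applies to rank correlations such as Spearman's rho [19], which detect monotonic rather than arbitrary association.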

Notes

  1.

    This independence assumption is not a marginal one, since the probability of a term, given knowledge of relevance and the query, is not obtained by summing over the marginal terms of the joint distribution (see [11] for details). It is unclear, however, whether the assumption refers to pairwise or to mutual independence.

  2.

    http://ir.dcs.gla.ac.uk/resources/test_collections/.
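
Note 1 leaves open whether the independence assumption is pairwise or mutual. The distinction matters because pairwise independence does not imply mutual independence. The classic XOR construction below is a textbook example rather than anything taken from the paper or from [11]. It uses three hypothetical binary term-occurrence variables in which every pair is independent while the triple is not; the helper names `prob`, `pairwise_independent` and `mutually_independent` are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical binary term-occurrence variables: X1 and X2 are independent
# fair bits and X3 = X1 XOR X2. Fractions keep the probabilities exact.
joint = {(x1, x2, x1 ^ x2): Fraction(1, 4) for x1, x2 in product((0, 1), repeat=2)}

def prob(assignment):
    """P(X_i = v for every (i, v) in assignment), marginalizing the others."""
    return sum((p for outcome, p in joint.items()
                if all(outcome[i] == v for i, v in assignment)), Fraction(0))

def pairwise_independent():
    """Check P(Xi=a, Xj=b) = P(Xi=a) P(Xj=b) for every pair and every value."""
    for i, j in combinations(range(3), 2):
        for a, b in product((0, 1), repeat=2):
            if prob([(i, a), (j, b)]) != prob([(i, a)]) * prob([(j, b)]):
                return False
    return True

def mutually_independent():
    """Check the full three-way factorization; over all value combinations it
    implies every pairwise factorization by marginalization."""
    for values in product((0, 1), repeat=3):
        product_of_marginals = Fraction(1)
        for i, v in enumerate(values):
            product_of_marginals *= prob([(i, v)])
        if prob(list(enumerate(values))) != product_of_marginals:
            return False
    return True

print("pairwise independent:", pairwise_independent())  # True
print("mutually independent:", mutually_independent())  # False
```

Every pair factorizes, but P(X1=0, X2=0, X3=0) = 1/4 while the product of the three marginals is 1/8, so a model that assumes only pairwise independence can still miss higher-order dependence.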

References

  1. Bendersky, M., Croft, W.B.: Modeling higher-order term dependencies in information retrieval using query hypergraphs. In: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2012, pp. 941–950. ACM, New York (2012). http://doi.acm.org/10.1145/2348283.2348408

  2. Choi, S., Choi, J., Yoo, S., Kim, H., Lee, Y.: Semantic concept-enriched dependence model for medical information retrieval. J. Biomed. Inform. 47, 18–27 (2014)

  3. Galton, F.: Regression towards mediocrity in hereditary stature. J. Anthropol. Inst. G. B. Irel. 15, 246–263 (1886). http://dx.doi.org/10.2307/2841583

  4. Huston, S., Culpepper, J.S., Croft, W.B.: Indexing word sequences for ranked retrieval. ACM Trans. Inf. Syst. (TOIS) 32(1), 3 (2014)

  5. Jones, K.S., Walker, S., Robertson, S.E.: A probabilistic model of information retrieval: development and comparative experiments. Inf. Process. Manage. 36(6), 779–808 (2000). http://dx.doi.org/10.1016/S0306-4573(00)00015-7

  6. Lu, W., Robertson, S., MacFarlane, A.: Field-weighted XML retrieval based on BM25. In: Fuhr, N., Lalmas, M., Malik, S., Kazai, G. (eds.) INEX 2005. LNCS, vol. 3977, pp. 161–171. Springer, Heidelberg (2006)

  7. Margulis, E.L.: N-Poisson document modelling. In: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1992, pp. 177–189. ACM, New York (1992). http://doi.acm.org/10.1145/133160.133195

  8. Metzler, D., Croft, W.B.: A Markov random field model for term dependencies. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2005, pp. 472–479. ACM, New York (2005). http://doi.acm.org/10.1145/1076034.1076115

  9. Mittendorf, E., Mateev, B., Schäuble, P.: Using the co-occurrence of words for retrieval weighting. Inf. Retr. 3(3), 243–251 (2000). http://dx.doi.org/10.1023/A:1026520926673

  10. Rijsbergen, C.V.: A theoretical basis for the use of cooccurrence data in information retrieval. J. Documentation 33(2), 106–119 (1977). http://dx.doi.org/10.1108/eb026637

  11. Robertson, S., Zaragoza, H.: The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). http://dx.doi.org/10.1561/1500000019

  12. Roelleke, T.: Information Retrieval Models: Foundations & Relationships. Synthesis Lectures on Information Concepts, Retrieval, and Services, Morgan & Claypool Publishers (2013). http://dx.doi.org/10.2200/S00494ED1V01Y201304ICR027

  13. Roelleke, T., Wang, J., Robertson, S.: Probabilistic retrieval models and binary independence retrieval (BIR) model. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, pp. 2156–2160. Springer, US (2009)

  14. Saini, B., Singh, V., Kumar, S.: Information retrieval models and searching methodologies: Survey. Information Retrieval 1(2) (2014)

  15. Salton, G., Buckley, C., Yu, C.T.: An evaluation of term dependence models in information retrieval. In: Salton, G., Schneider, H.-J. (eds.) SIGIR 1982. LNCS, vol. 146, pp. 151–173. Springer, Heidelberg (1982)

  16. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manage. 24(5), 513–523 (1988). http://dx.doi.org/10.1016/0306-4573(88)90021-0

  17. Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures, 4th edn. Chapman & Hall/CRC, New York (2007)

  18. Song, R., Yu, L., Wen, J.R., Hon, H.W.: A proximity probabilistic model for information retrieval. Technical report, Citeseer (2011)

  19. Spearman, C.: The proof and measurement of association between two things. Am. J. Psychol. 15, 88–103 (1904)

Acknowledgement

This research was partially supported by the Consejo Nacional de Ciencia y Tecnología (CONACYT) through scholarship grant No. 296232.

Author information

Correspondence to Jorge Hermosillo Valadez.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Fernández-Reyes, F.C., Valadez, J.H., Suárez, Y.G. (2015). Term Dependence Statistical Measures for Information Retrieval Tasks. In: Sidorov, G., Galicia-Haro, S. (eds) Advances in Artificial Intelligence and Soft Computing. MICAI 2015. Lecture Notes in Computer Science, vol 9413. Springer, Cham. https://doi.org/10.1007/978-3-319-27060-9_7

  • DOI: https://doi.org/10.1007/978-3-319-27060-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27059-3

  • Online ISBN: 978-3-319-27060-9

  • eBook Packages: Computer Science (R0)