Why is Information Retrieval a Scientific Discipline?

Abstract

It is relatively easy to state that information retrieval (IR) is a scientific discipline, but it is harder to explain why, because what counts as science is still under debate in the philosophy of science. Our ability to explain why is crucial if we are to convince others that IR is a science. To do so, we use a recently proposed theory and model of scientific study. The explanation involves mapping the knowledge structure of IR to that of well-known scientific disciplines such as physics. It also involves identifying the aim, principles and assumptions that IR shares with such disciplines, so that they constrain scientific investigation in IR in a similar way as in physics. There are therefore strong similarities between IR and scientific disciplines like physics, both in the knowledge structure and in the constraints on scientific investigation. On the basis of these similarities, IR is considered a scientific discipline.

Acknowledgements

I thank Dr. Edward Dang for running the random search model. I also thank the anonymous reviewers for their constructive, insightful comments.

Author information

Corresponding author

Correspondence to Robert W. P. Luk.

Ethics declarations

Conflict of interest

The corresponding author states that there is no conflict of interest.

About this article

Cite this article

Luk, R. W. P. Why is Information Retrieval a Scientific Discipline? Found Sci (2020). https://doi.org/10.1007/s10699-020-09685-x

Keywords

  • Science
  • Information retrieval
  • Physics
  • Correspondence
  • Similarity