It is easy to assert that information retrieval (IR) is a scientific discipline, but harder to explain why, because what counts as science is still debated in the philosophy of science. Convincing others that IR is a science requires being able to give that explanation. To do so, we use a recently proposed theory and model of scientific study. The explanation maps the knowledge structure of IR to that of well-known scientific disciplines such as physics. It also identifies the aim, principles and assumptions that IR shares with such disciplines, so that they constrain scientific investigation in IR in much the same way as they do in physics. IR therefore closely resembles disciplines like physics in both its knowledge structure and the constraints on its scientific investigations. On the basis of these similarities, IR is considered a scientific discipline.
I thank Dr. Edward Dang for running the random search model. I also thank the anonymous reviewers for their constructive, insightful comments.
Conflict of interest
The corresponding author states that there is no conflict of interest.
About this article
Cite this article
Luk, R.W.P. Why is Information Retrieval a Scientific Discipline? Found Sci (2020). https://doi.org/10.1007/s10699-020-09685-x
Keywords: Information retrieval