Skip to main content
Log in

Between the Devil and the Deep Blue Sea: Tensions Between Scientific Judgement and Statistical Model Selection

  • Published:
Computational Brain & Behavior Aims and scope Submit manuscript

Abstract

Discussions of model selection in the psychological literature typically frame the issues as a question of statistical inference, with the goal being to determine which model makes the best predictions about data. Within this setting, advocates of leave-one-out cross-validation and Bayes factors disagree on precisely which prediction problem model selection questions should aim to answer. In this comment, I discuss some of these issues from a scientific perspective. What goal does model selection serve when all models are known to be systematically wrong? How might “toy problems” tell a misleading story? How does the scientific goal of explanation align with (or differ from) traditional statistical concerns? I do not offer answers to these questions, but hope to highlight the reasons why psychological researchers cannot avoid asking them.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2

Similar content being viewed by others

Notes

  1. While there are many people who assert that “a single failure is enough to falsify a theory”, I confess I have not yet encountered anyone willing to truly follow this principle in real life.

  2. For instance, Gelman et al. (2003, pp. 586–587) present an analogous convergence result for the posterior distribution P(𝜃|x) within a single model . The result generalises to the Bayes factor by noting that the Bayes factor identifies a model with the prior predictive distribution P(x|). Substituting P(x|) for the role of P(x|𝜃) in their derivation produces the necessary result.

  3. For the purposes of full disclosure, I should note that the precise situation from Lee and Navarro (2002) is quite a bit more complex than this description implies, and there are several details about how we had to adapt a model from one context to be applicable to the other have been omitted.

References

  • Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B.N., Petrov, & F., Csaki (Eds.) Second international symposium on information theory (pp. 267–281). Budapest: Akademiai Kiado.

  • Bernardo, J.M., & Smith, A.F.M. (2000). Bayesian theory, 2nd Edn. New York: John Wiley & Sons.

    Google Scholar 

  • Box, G.E.P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799.

    Article  Google Scholar 

  • Browne, M. (2000). Cross-validation methods. Journal of Mathematical Psychology, 44, 108–132.

    Article  Google Scholar 

  • Devezer, B., Nardin, L.G., Baumgaertner, B., Buzbas, E. (under review). Discovery of truth is not implied by reproducibility but facilitated by innovation and epistemic diversity in a model-centric framework. Manuscript submitted for publication. arXiv:1803.10118.

  • Edwards, W., Lindman, H., Savage, L.J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70, 193–242.

    Article  Google Scholar 

  • Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B. (2003). Bayesian Data Analysis, 2nd Edn. Boca Raton: Chapman & Hall/CRC.

    Google Scholar 

  • Grünwald, P. (2007). The minimum description length principle. Cambridge: MIT Press.

    Book  Google Scholar 

  • Gronau, Q., & Wagenmakers, E.J. (2018). Limitations of Bayesian leave-one-out cross-validation for model selection. Computational Brain and Behavior.

  • Hayes, B.K., Banner, S., Forrester, S., Navarro, D.J. (under review). Sampling frames and inductive inference with censored evidence. Manuscript submitted for publication. https://doi.org/10.17605/OSF.IO/2M83V.

  • Kamin, L.J. (1969). Predictability, surprise, attention, and conditioning. In Campbell, B.A., & Church, R.M. (Eds.) Punishment and Aversive Behavior (pp. 279–296). New York: Appleton-Century-Crofts.

  • Kruschke, J.K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99(1), 22–44.

    Article  Google Scholar 

  • Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332–1338.

    Article  Google Scholar 

  • Lattal, K.M., & Nakajima, S. (1998). Overexpectation in appetitive Pavlovian and instrumental conditioning. Animal Learning & Behavior, 26(3), 351–360.

    Article  Google Scholar 

  • Lee, M.D. (2001a). On the complexity of additive clustering models. Journal of Mathematical Psychology, 45, 131–148.

  • Lee, M.D. (2001b). Determining the dimensionality of multidimensional scaling models for cognitive modeling. Journal of Mathematical Psychology, 45, 149–166.

  • Lee, M.D., & Navarro, D.J. (2002). Extending the ALCOVE model of category learning to featural stimulus domains. Psychonomic Bulletin & Review, 9, 43–58.

    Article  Google Scholar 

  • Navarro, D.J. (2004). A note on the applied use of MDL approximations. Neural Computation, 16, 1763–1768.

    Article  Google Scholar 

  • Navarro, D.J., Dry, M.J., Lee, M.D. (2012). Sampling assumptions in inductive generalization. Cognitive Science, 36, 187–223.

    Article  Google Scholar 

  • Navarro, D.J., Pitt, M.A., Myung, I.J. (2004). Assessing the distinguishability of models and the informativeness of data. Cognitive Psychology, 49, 47–84.

    Article  Google Scholar 

  • Pavlov, I. (1927). Conditioned reflexes. London: Oxford University Press.

    Google Scholar 

  • Pitt, M.A., Myung, I.J., Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.

    Article  Google Scholar 

  • Pitt, M.A., Kim, W., Navarro, D.J., Myung, J.I. (2006). Global model analysis by parameter space partitioning. Psychological Review, 113, 57–83.

    Article  Google Scholar 

  • Rescorla, R.A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66, 1–5.

    Article  Google Scholar 

  • Rescorla, R.A. (1969). Conditioned inhibition of fear resulting from negative CS-US contingencies. Journal of Comparative and Physiological Psychology, 67, 504–509.

    Article  Google Scholar 

  • Rescorla, R.A. (1971). Variations in the effectiveness of reinforcement following prior inhibitory conditioning. Learning and Motivation, 2, 113–123.

    Article  Google Scholar 

  • Rescorla, R.A., & Wagner, A.R. (1972). A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In A.H., Black, & W.F., Prokasy (Eds.) Classical conditioning II: current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.

  • Ransom, K., Perfors, A., Navarro, D.J. (2016). Leaping to conclusions: why premise relevance affects argument strength. Cognitive Science, 40, 1775–1796.

    Article  Google Scholar 

  • Rissanen, J. (1996). Fisher information and stochastic complexity. IEEE Transactions on Information Theory, 42, 40–47.

    Article  Google Scholar 

  • Schultz, W., Dayan, P., Montague, P.R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599.

    Article  Google Scholar 

  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.

    Article  Google Scholar 

  • Shao, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association, 88, 486–494.

    Article  Google Scholar 

  • Shiffrin, R. M., Borner, K., Stigler, S.M. (2018). Scientific progress despite irreproducibility: a seeming paradox. Proceedings of the National Academy of Sciences, USA, 115, 2632–2639.

    Article  Google Scholar 

  • Tenenbaum, J.B., & Griffiths, T.L. (2001). Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24, 629–640.

    PubMed  Google Scholar 

  • Vehtari, A., Simpson, D., Yao, Y., Gelman, A. (2018). Limitations of Bayesian leave-one-out cross-validation. Computational Brain and Behavior.

  • Vehtari, A., & Ojanen, J. (2012). A survey of Bayesian predictive methods for model assessment, selection and comparison. Statistics Surveys, 6, 142–228.

    Article  Google Scholar 

  • Voorspoels, W., Navarro, D.J., Perfors, A., Ransom, K., Storms, G. (2015). How do people learn from negative evidence? Non-monotonic generalizations and sampling assumptions in inductive reasoning. Cognitive Psychology, 81, 1–25.

    Article  Google Scholar 

  • Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804.

    Article  Google Scholar 

  • Wickelgren, W.A. (1972). Trace resistance and decay of long-term memory. Journal of Mathematical Psychology, 9, 418–455.

    Article  Google Scholar 

Download references

Acknowledgements

I am grateful to many people for helpful conversations and comments that shaped this paper, most notably Nancy Briggs, Berna Devezer, Chris Donkin, Olivia Guest, Daniel Simpson, Iris van Rooij and Fred Westbrook.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Danielle J. Navarro.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Navarro, D.J. Between the Devil and the Deep Blue Sea: Tensions Between Scientific Judgement and Statistical Model Selection. Comput Brain Behav 2, 28–34 (2019). https://doi.org/10.1007/s42113-018-0019-z

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s42113-018-0019-z

Keywords

Navigation