Measuring Scientific Reasoning Competencies

Multiple Aspects of Validity
  • D. Krüger (email author)
  • S. Hartmann
  • V. Nordmeier
  • A. Upmeier zu Belzen


In this chapter, we investigate multiple aspects of the validity of test score interpretations from a scientific reasoning competence test, as well as aspects of reliability. Scientific reasoning competencies are defined as the disposition to solve scientific problems in certain situations by conducting scientific investigations or using scientific models. For the purpose of measurement, the first phase of our project focused on the construction of a paper-and-pencil assessment instrument – the KoWADiS competence test – for the longitudinal assessment of pre-service science teachers’ scientific reasoning competencies over the course of their academic studies. In the second phase of our project, we investigated the reliability of the test scores and the validity of their interpretations. We used a multimethod approach, addressing several sources of validity evidence. Overall, the results are coherent and support the validity assumptions to a satisfactory degree. The long-term goal is to use this test to provide empirically sound suggestions for pre-service science teacher education at the university level.


Keywords: scientific reasoning competencies · teacher education · validity



Copyright information

© Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2020

Authors and Affiliations

  • D. Krüger (email author)¹
  • S. Hartmann¹
  • V. Nordmeier¹
  • A. Upmeier zu Belzen¹

  1. Freie Universität Berlin, Berlin, Germany
