Validity-Versus-Reliability Tradeoffs and the Ethics of Educational Research

Fendler, Lynn

doi:10.1007/978-3-319-73921-2_10

Part of the book series: Educational Research (EDRE, volume 10)

Abstract

In educational research that calls itself empirical, the relationship between validity and reliability is one of tradeoff: the stronger the bases for validity, the weaker the bases for reliability (and vice versa). Validity and reliability are widely regarded as basic criteria for evaluating research; however, the tradeoff between the two has ethical implications. The paper traces a brief history of the concepts and then describes four ethical issues associated with the validity-reliability tradeoff in educational research: bootstrapping, stereotyping, dehumanization, and determinism. The paper closes by describing emerging trends in social science research that have the potential to displace the validity-reliability tradeoff as a central concern for the evaluation of educational research: the introduction of translational sciences, a shift from significance to replicability, a move from inference to Big Data, and the increasing importance of consequential validity.

Notes

1. There is some conceptual fuzziness in this paper between educational research and educational testing. For purposes of this paper, I think the distinction is not very important; much empirical educational research is conducted on the basis of educational test results, and testing instruments constitute the data-collection instruments of much empirical research in education. The validity-reliability tradeoff pertains in empirical educational research whether or not tests are involved.
2. Thanks to Jeff Bale for pointing this out.
3. I don’t know why scare quotes appear around the term “truth value” but not around the other terms on the list.
4. I have never understood how research methods or findings could be extrapolated from animals to humans. I just don’t get how it could have occurred to researchers (such as Thorndike) to imagine that findings from experiments on lab rats could be applied to teaching and learning for the people in Teachers College. But we humans can be taught, and apparently we have learned to behave like rats when we are treated as such.
5. The other purposes specified by Biesta (2010) are qualification and socialization. Biesta uses the term subjectification very differently from the way Foucault uses it.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Baker, E. L. (2013). The chimera of validity. Teachers College Record, 115(9), 1–26. http://www.tcrecord.org. ID Number: 17106. Date Accessed: 4/23/2015 3:06:16 PM.
Biesta, G. (2010). Good education in an age of measurement: Ethics, politics, democracy. Boulder: Paradigm Publishers.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Castel, R. (1991). From dangerousness to risk. In G. Burchell, C. Gordon, & P. Miller (Eds.), The Foucault effect: Studies in governmentality (pp. 281–298). Chicago: University of Chicago Press.
Cherryholmes, C. H. (1988). Power and criticism: Poststructural investigations in education. New York: Teachers College Press.
Cizek, G. J. (2007, August). Introduction to modern validity theory and practice. Invited presentation to the National Assessment Governing Board, McLean, VA. Available: https://www.nagb.gov/content/nagb/assets/documents/naep/cizek-introduction-validity.pdf
Cochran-Smith, M., & Lytle, S. L. (1999). The teacher research movement: A decade later. Educational Researcher, 28, 15–25. https://doi.org/10.3102/0013189X028007015.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Orlando: Harcourt Brace.
Cronbach, L. J. (1969). Validation of educational measures. In P. H. H. DuBois (Ed.), Proceedings of the invitational conference on testing problems (pp. 35–52). Princeton: Educational Testing Service.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 443–507). Washington, DC: American Council on Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957. PMID 13245896.
Eubanks, D. (2012, June 8). Bad reliability, part two. http://highered.blogspot.com/2012/06/bad-reliability-part-two.html
Fendler, L. (2006). Why generalisability is not generalisable. Journal of Philosophy of Education, 40(4), 437–449.
Fendler, L., & Muzaffar, I. (2008). The history of the bell curve: Sorting and the idea of normal. Educational Theory, 58(1), 63–82.
Fenstermacher, G. (1994). The knower and the known: The nature of knowledge in research on teaching. In L. Darling-Hammond (Ed.), Review of research in education (Vol. 20, pp. 3–56). Washington, DC: American Educational Research Association.
Fiske, D. W. (2002). Validity for what? In H. I. Braun, N. Jackson, & D. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp. 169–178). Hillsdale: Lawrence Erlbaum.
Gigerenzer, G., & Marewski, J. N. (2015, February). Surrogate science: The idol of a universal method for scientific inference. Journal of Management, 41(2), 421–440. https://doi.org/10.1177/0149206314547522.
Gould, S. J. (1981). The mismeasure of man. New York: W.W. Norton.
Hacking, I. (1995). The looping effects of human kinds. In D. Sperber, D. Premack, & A. J. Premack (Eds.), Causal cognition: A multidisciplinary debate (pp. 351–394). Oxford: Clarendon Press.
Heilbron, J., Magnusson, L., & Wittrock, B. (Eds.). (1998). The rise of the social sciences and the formation of modernity: Conceptual change in context, 1750–1850. Boston: Kluwer Academic Publishers.
Jenkins, J. G. (1946). Validity for what? Journal of Consulting Psychology, 10, 93–98.
Kadir, K. A. (2008). Framing a validity argument for test use and impact: The Malaysian public service experience (esp. chapter 2 on history of validity p. 29). Dissertation.
Karson, M. (2007). Nomothetic versus idiographic. In N. J. Salkind & K. Rasmussen (Eds.), Encyclopedia of Measurement and statistics. New York: Sage. https://doi.org/10.4135/9781412952644.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Newbury Park: Sage.
Lippmann, W. (1922, November 8). The reliability of intelligence tests. The New Republic (pp. 275–277).
MacKenzie, S. B. (2003). The dangers of poor construct conceptualization. Journal of the Academy of Marketing Science, 31(3), 323–326.
Matters, G., & Pitman, J. A. (1994). The validity–reliability trade-off. 20th annual conference of the International Association for Educational Assessment (IAEA). Wellington.
Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35(11), 1012–1027.
Messick, S. (1998). Test validity: A matter of consequence. Social Indicators Research, 45(1–3), 35–44.
Moss, P. A. (1992). Shifting conceptions of validity in educational measurement: Implications for performance assessment. Review of Educational Research, 62(3), 229. Retrieved from http://ezproxy.msu.edu.proxy1.cl.msu.edu/login?url=http://search.proquest.com.proxy1.cl.msu.edu/docview/1290947129?accountid=12598.
NIH [National Institutes of Health]. (2007, October 15). National center for advancing translational sciences. Available: https://ncats.nih.gov/. Accessed 31 Oct 2015.
Nuzzo, R. (2014, February 13). Scientific method: Statistical errors. Nature, 506, 150–152. https://doi.org/10.1038/506150a. http://www.nature.com/news/scientific-method-statistical-errors-1.14700
Palomba, C. A., & Banta, T. W. (1999). Assessment essentials: Planning, implementing, improving. New York: Jossey-Bass.
Reliability vs. validity. (2005, September 26). Bloomberg business. Online version available: http://www.bloomberg.com/bw/stories/2005-09-28/reliability-vs-dot-validity
Schwartz, D. L., & Arena, D. (2013). Measuring what matters most: Choice-based assessments for the digital age. Cambridge, MA: MIT Press.
Shepard, L. A. (2013). Validity for what purpose? Teachers College Record, 115(9), 1–12. http://www.tcrecord.org ID Number: 17116, Date Accessed: 10/14/2015 8:12:55 AM.
Shultz, M. M., & Zedeck, S. (2008). Identification, development, and validation of predictors for successful lawyering. Berkeley Law School Research Grant Report. https://www.law.berkeley.edu/files/LSACREPORTfinal-12.pdf
Siegfried, T. (2015, July 2). Science is heroic, with a tragic (statistical) flaw. Science News Online. https://www.sciencenews.org/blog/context/science-heroic-tragic-statistical-flaw
Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African-Americans. Journal of Personality and Social Psychology, 69(5), 797–811.
Terman, L. M. (1916). The measurement of intelligence: An explanation of and a complete guide for the use of the Stanford revision and extension of the Binet-Simon intelligence scale. Boston: Houghton Mifflin.
Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. The Psychological Review: Monograph Supplements, 2(4), i–109. https://doi.org/10.1037/h0092987.
Westen, D., & Rosenthal, R. (2003). Quantifying construct validity: Two simple measures. Journal of Personality and Social Psychology, 84(3), 608–618. https://doi.org/10.1037/0022-3514.84.3.608. Accessed 23 Oct 2015 4:30:19 PM EDT.

Author information

Authors and Affiliations

Department of Teacher Education, Michigan State University, East Lansing, MI, USA
Lynn Fendler

Editor information

Editors and Affiliations

Faculty of Psychology and Educational Sciences, Ghent University and K.U. Leuven, Belgium
Paul Smeyers
Subfaculteit Psychologie en Pedagogische Wetenschappen, K.U. Leuven, Belgium
Marc Depaepe

Cite this chapter

Fendler, L. (2018). Validity-Versus-Reliability Tradeoffs and the Ethics of Educational Research. In: Smeyers, P., Depaepe, M. (eds) Educational Research: Ethics, Social Justice, and Funding Dynamics. Educational Research, vol 10. Springer, Cham. https://doi.org/10.1007/978-3-319-73921-2_10

DOI: https://doi.org/10.1007/978-3-319-73921-2_10
Published: 23 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73920-5
Online ISBN: 978-3-319-73921-2
eBook Packages: Education, Education (R0)
