In determining the effectiveness of educational interventions, the Gold Standard requires the use of tests and assessments of proven validity. Messick (1989) defined validity as “an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores” (p. 13). Education researchers wishing to evaluate the effectiveness of educational interventions and programs under the Gold Standard must either develop and validate their own tests and assessments or use ones developed and validated by others. As a result of the No Child Left Behind federal legislative mandate for Grades K-12 in the United States (NCLB, 2002), research on intervention programs designed to improve student learning in mathematics, reading, and science in Grades K-12 has one natural test of interest: the standardized examination used in each state to determine student proficiency status and school and district proficiency rates. Local school personnel and state education professionals are particularly interested in research showing improvements in student performance on these high-stakes tests. Other standardized assessments that can be used to show the effectiveness of an educational program or intervention are the National Assessment of Educational Progress (NAEP; US National Center for Education Statistics, n.d.), the ACT® (ACT, n.d.), and the SAT® (College Board, n.d.).
However, the use of state NCLB tests and these other assessments is precluded in many situations. For example, the educational program or intervention may target a subject area not covered by these assessments, such as history or foreign language study. Even if the subject area is mathematics, reading, or science, the goal of the intervention may not align with the curriculum and goals underlying the NCLB tests in that subject area. For example, programs focusing on the development of problem-solving skills in mathematics may have goals different from those of the curriculum tested on the NAEP or the NCLB state assessment, so these assessments would not be good measures of the effectiveness of such an intervention program.
References
Ackerman, T. (1996). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement, 20(4), 311–329.
ACT. (n.d.). Homepage. Retrieved July 11, 2008, from http://www.act.org/
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading, MA: Addison-Wesley.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.
Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71(3), 425–440.
College Board. (n.d.). About the SAT. Retrieved May 15, 2008, from http://www.collegeboard.com/student/testing/sat/about.html
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334.
Froelich, A. G. (2008). A new bias correction method for the DIMTEST procedure. Unpublished manuscript, Iowa State University at Ames.
Froelich, A. G., & Habing, B. (2008). Conditional covariance-based subtest selection for DIMTEST. Applied Psychological Measurement, 32(2), 138–155.
Haertel, E. H. (2006). Reliability. In R. L. Brennan (Ed.), Educational measurement (4th edn., pp. 65–110). Westport, CT: American Council on Education & Praeger.
Humphreys, L. G. (1985). General intelligence: An integration of factor, test, and simplex theory. In B. B. Wolman (Ed.), Handbook of intelligence: Theories, measurements, and applications (pp. 201–224). New York: John Wiley & Sons.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th edn., pp. 17–64). Westport, CT: American Council on Education & Praeger.
Kim, H. R. (1994). New techniques for the dimensionality assessment of standardized test data. Unpublished doctoral dissertation, University of Illinois, Urbana-Champaign.
Linden, W. J., van der, & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. New York: Springer.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Lord, F. M. (1984). Standard errors of measurement at different ability levels. Journal of Educational Measurement, 21(3), 239–243.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd edn., pp. 13–103). New York: American Council on Education.
Mislevy, R. J., & Bock, R. D. (1984). Item operating characteristics of the Armed Services Aptitude Battery, Form 8A (Technical Report N00014-83-C-0283). Washington, DC: Office of Naval Research.
Mokken, R. J. (1971). A theory and procedure of scale analysis with applications in political research. The Hague, The Netherlands: Mouton.
Molenaar, I. W., & Sijtsma, K. (2000). User's manual MSP5 for Windows. Groningen, The Netherlands: iec ProGAMMA.
Nandakumar, R., & Stout, W. F. (1993). Refinements of Stout's procedure for assessing latent trait unidimensionality. Journal of Educational Statistics, 18(1), 41–68.
No Child Left Behind Act of 2001. Pub. L. No. 107–110, 115 Stat. 1425. (2002).
Reckase, M. D. (1997). A linear logistic model for dichotomous item response data. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 271–286). New York: Springer.
Roussos, L. A., Stout, W. F., & Marden, J. I. (1998). Using new proximity measures with hierarchical cluster analysis to detect multidimensionality. Journal of Educational Measurement, 35(1), 1–30.
Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory. Thousand Oaks, CA: Sage.
Stout, W. F. (1987). A nonparametric approach for assessing latent trait unidimensionality. Psychometrika, 52(4), 589–617.
Stout, W. F. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55(2), 293–325.
Stout, W. F., Froelich, A. G., & Gao, F. (2001). Using resampling to produce an improved DIMTEST procedure. In A. Boomsma, M. A. J. van Duijn, & T. A. B. Snijders (Eds.), Essays on item response theory (pp. 357–375). Dordrecht, The Netherlands: Springer.
Stout, W. F., Habing, B., Douglas, J., Kim, H. R., Roussos, L., & Zhang, J. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20(4), 331–354.
Traub, R. E. (1994). Reliability for the social sciences: Theory and applications. Thousand Oaks, CA: Sage.
United States National Center for Education Statistics. (n.d.). NAEP: The nation's report card. Retrieved July 11, 2008, from http://nces.ed.gov/nationsreportcard/
Zhang, J., & Stout, W. F. (1999a). Conditional covariance structure of generalized compensatory multidimensional items. Psychometrika, 64(2), 129–152.
Zhang, J., & Stout, W. F. (1999b). The theoretical DETECT index of dimensionality and its application to approximate simple structure. Psychometrika, 64(2), 213–249.
Zimowski, M., Muraki, E., Mislevy, R. J., & Bock, R. D. (2007). BILOG-MG3 [Computer software]. Mooresville, IN: Scientific Software International. Available from http://www.ssicentral.com/irt/index.html
© 2009 Springer Science + Business Media B.V.
Froelich, A.G. (2009). Methods from Item Response Theory: Going Beyond Traditional Validity and Reliability in Standardizing Assessments. In: Shelley, M.C., Yore, L.D., Hand, B. (eds) Quality Research in Literacy and Science Education. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-8427-0_14