
Methods from Item Response Theory: Going Beyond Traditional Validity and Reliability in Standardizing Assessments


In determining the effectiveness of educational interventions, the Gold Standard requires the use of tests and assessments of proven validity. Messick (1989) defined validity as “an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores” (p. 13). Education researchers wishing to evaluate the effectiveness of educational interventions and programs under the Gold Standard must either develop and validate their own tests and assessments or use ones developed and validated by others. As a result of the No Child Left Behind federal legislative mandate for Grades K-12 in the United States (NCLB, 2002), research on intervention programs that improve student learning in mathematics, reading, and science in Grades K-12 has one natural test of interest: the standardized examination used in the state to determine student proficiency status and school and district proficiency rates. Local school personnel and state education professionals are particularly interested in research showing improvements in student performance on these high-stakes tests. Other standardized assessments that can be used to show the effectiveness of an educational program or intervention are the National Assessment of Educational Progress (NAEP; US National Center for Education Statistics, n.d.), the ACT® (ACT, n.d.), and the SAT® (College Board, n.d.).

However, the use of state NCLB tests and these other assessments is precluded in many situations. For example, the educational program or intervention may target a subject area not covered by these assessments, such as history or foreign language study. Even if the subject area is mathematics, reading, or science, the goals of the intervention may not align with the underlying curriculum and goals of the NCLB tests in that subject area. For example, programs focusing on the development of problem-solving skills in mathematics may have different goals than the curriculum tested on the NAEP or the NCLB state assessment. These assessments would not be good measures of the effectiveness of this type of intervention program.
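The item response theory (IRT) methods referenced below (e.g., Birnbaum, 1968; Lord, 1980) model the probability of a correct response to a test item as a function of examinee ability and item parameters. As a minimal illustrative sketch, assuming the standard two-parameter logistic (2PL) form with hypothetical item parameters (neither the code nor the numbers come from the chapter), the following Python snippet computes response probabilities and Fisher information:

import math

def p_correct_2pl(theta, a, b):
    """Two-parameter logistic (2PL) item response function (Birnbaum, 1968):
    probability of a correct response given ability theta, item
    discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta:
    I(theta) = a^2 * P(theta) * (1 - P(theta))."""
    p = p_correct_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical (a, b) parameters for a three-item test.
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 1.0)]

for theta in (-1.0, 0.0, 1.0):
    probs = [p_correct_2pl(theta, a, b) for a, b in items]
    test_info = sum(item_information_2pl(theta, a, b) for a, b in items)
    print(f"theta = {theta:+.1f}: P(correct) = {[round(p, 3) for p in probs]}, "
          f"test information = {test_info:.3f}")

Unlike a single test-wide reliability coefficient such as Cronbach's alpha (Cronbach, 1951), the test information function varies with ability: its reciprocal square root gives the standard error of measurement at each ability level (Lord, 1984), which is one sense in which IRT goes beyond traditional reliability.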


References

  • Ackerman, T. (1996). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement, 20(4), 311–329.

  • ACT. (n.d.). Homepage. Retrieved July 11, 2008, from http://www.act.org/

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading, MA: Addison-Wesley.

  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.

  • Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71(3), 425–440.

  • College Board. (n.d.). About the SAT. Retrieved May 15, 2008, from http://www.collegeboard.com/student/testing/sat/about.html

  • Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334.

  • Froelich, A. G. (2008). A new bias correction method for the DIMTEST procedure. Unpublished manuscript, Iowa State University, Ames.

  • Froelich, A. G., & Habing, B. (2008). Conditional covariance-based subtest selection for DIMTEST. Applied Psychological Measurement, 32(2), 138–155.

  • Haertel, E. H. (2006). Reliability. In R. L. Brennan (Ed.), Educational measurement (4th edn., pp. 65–110). Westport, CT: American Council on Education & Praeger.

  • Humphreys, L. G. (1985). General intelligence: An integration of factor, test, and simplex theory. In B. B. Wolman (Ed.), Handbook of intelligence: Theories, measurements, and applications (pp. 201–224). New York: John Wiley & Sons.

  • Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th edn., pp. 17–64). Westport, CT: American Council on Education & Praeger.

  • Kim, H. R. (1994). New techniques for the dimensionality assessment of standardized test data. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.

  • van der Linden, W. J., & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. New York: Springer.

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.

  • Lord, F. M. (1984). Standard errors of measurement at different ability levels. Journal of Educational Measurement, 21(3), 239–243.

  • Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

  • Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd edn., pp. 13–103). New York: American Council on Education.

  • Mislevy, R. J., & Bock, R. D. (1984). Item operating characteristics of the Armed Services Aptitude Battery, Form 8A (Technical Report N00014-83-C-0283). Washington, DC: Office of Naval Research.

  • Mokken, R. J. (1971). A theory and procedure of scale analysis with applications in political research. The Hague, The Netherlands: Mouton.

  • Molenaar, I. W., & Sijtsma, K. (2000). User's manual MSP5 for Windows. Groningen, The Netherlands: iec ProGAMMA.

  • Nandakumar, R., & Stout, W. F. (1993). Refinements of Stout's procedure for assessing latent trait unidimensionality. Journal of Educational Statistics, 18(1), 41–68.

  • No Child Left Behind Act of 2001. Pub. L. No. 107–110, 115 Stat. 1425. (2002).

  • Reckase, M. D. (1997). A linear logistic multidimensional model for dichotomous item response data. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 271–286). New York: Springer.

  • Roussos, L. A., Stout, W. F., & Marden, J. I. (1998). Using new proximity measures with hierarchical cluster analysis to detect multidimensionality. Journal of Educational Measurement, 35(1), 1–30.

  • Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory. Thousand Oaks, CA: Sage.

  • Stout, W. F. (1987). A nonparametric approach for assessing latent trait unidimensionality. Psychometrika, 52(4), 589–617.

  • Stout, W. F. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55(2), 293–325.

  • Stout, W. F., Froelich, A. G., & Gao, F. (2001). Using resampling to produce an improved DIMTEST procedure. In A. Boomsma, M. A. J. van Duijn, & T. A. B. Snijders (Eds.), Essays on item response theory (pp. 357–375). Dordrecht, The Netherlands: Springer.

  • Stout, W. F., Habing, B., Douglas, J., Kim, H. R., Roussos, L., & Zhang, J. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20(4), 331–354.

  • Traub, R. E. (1994). Reliability for the social sciences: Theory and applications. Thousand Oaks, CA: Sage.

  • United States National Center for Education Statistics. (n.d.). NAEP: The nation's report card. Retrieved July 11, 2008, from http://nces.ed.gov/nationsreportcard/

  • Zhang, J., & Stout, W. F. (1999a). Conditional covariance structure of generalized compensatory multidimensional items. Psychometrika, 64(2), 129–152.

  • Zhang, J., & Stout, W. F. (1999b). The theoretical DETECT index of dimensionality and its application to approximate simple structure. Psychometrika, 64(2), 213–249.

  • Zimowski, M., Muraki, E., Mislevy, R. J., & Bock, R. D. (2007). BILOG-MG 3 [Computer software]. Mooresville, IN: Scientific Software International. Available from http://www.ssicentral.com/irt/index.html


Author information

Correspondence to Amy G. Froelich.


Copyright information

© 2009 Springer Science+Business Media B.V.

Cite this chapter

Froelich, A.G. (2009). Methods from Item Response Theory: Going Beyond Traditional Validity and Reliability in Standardizing Assessments. In: Shelley, M.C., Yore, L.D., Hand, B. (eds) Quality Research in Literacy and Science Education. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-8427-0_14
