Standard Errors of Equating

Part of the book series: Statistics for Social and Behavioral Sciences ((SSBS))

Abstract

This chapter focuses on standard errors of equating; both bootstrap and analytic procedures are described. We illustrate the use of standard errors to choose sample sizes for equating and to compare the precision of estimated equating relationships across designs and methods.
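
As a rough illustration of the bootstrap idea mentioned in the abstract, the sketch below estimates the standard error of a linear equating at a single score point. It is not taken from the chapter: it assumes a random groups design, resamples each form's examinees with replacement, and uses hypothetical function names (`linear_equate`, `bootstrap_se`) and simulated scores.

```python
import random
import statistics


def linear_equate(x, form_x, form_y):
    """Express Form X score x on the Form Y scale by linear equating."""
    mu_x, sd_x = statistics.mean(form_x), statistics.stdev(form_x)
    mu_y, sd_y = statistics.mean(form_y), statistics.stdev(form_y)
    return mu_y + (sd_y / sd_x) * (x - mu_x)


def bootstrap_se(x, form_x, form_y, replications=500, seed=0):
    """Bootstrap standard error of the linear equating at score x.

    Assumes independent random groups, so each form's scores are
    resampled separately with replacement (an illustrative choice).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        boot_x = rng.choices(form_x, k=len(form_x))
        boot_y = rng.choices(form_y, k=len(form_y))
        estimates.append(linear_equate(x, boot_x, boot_y))
    # The standard deviation over replications estimates the standard error.
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Simulated scores for illustration only; not data from the chapter.
    rng = random.Random(1)
    scores_x = [rng.gauss(25, 5) for _ in range(200)]
    scores_y = [rng.gauss(27, 6) for _ in range(200)]
    for n in (50, 200):
        se = bootstrap_se(25, scores_x[:n], scores_y[:n])
        print(f"N per form = {n}: bootstrap SE at x = 25 is {se:.3f}")
```

Running the sketch with different numbers of examinees per form gives a simple way to see how the estimated standard error responds to sample size, which is the kind of comparison the abstract refers to.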

Author information

Correspondence to Michael J. Kolen.

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kolen, M.J., Brennan, R.L. (2014). Standard Errors of Equating. In: Test Equating, Scaling, and Linking. Statistics for Social and Behavioral Sciences. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0317-7_7
