Abstract
This chapter focuses on standard errors of equating; both bootstrap and analytic procedures are described. We illustrate the use of standard errors to choose sample sizes for equating and to compare the precision in estimating equating relationships for different designs and methods.
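As a rough illustration of the bootstrap idea mentioned in the abstract, the sketch below estimates the standard error of a linear equating at a single score point by resampling examinees with replacement. It is only a minimal sketch under stated assumptions, not the chapter's own procedure: it assumes a random groups design, linear equating based on sample means and standard deviations, and simulated normal scores. The names `linear_equate` and `bootstrap_se` are hypothetical and chosen for illustration.

```python
import numpy as np


def linear_equate(x, scores_x, scores_y):
    """Linear equating of a Form X score x to the Form Y scale.

    Assumes a random groups design: the two score samples come from
    randomly equivalent groups, so means and SDs are matched directly.
    """
    mu_x, sd_x = scores_x.mean(), scores_x.std(ddof=1)
    mu_y, sd_y = scores_y.mean(), scores_y.std(ddof=1)
    return mu_y + (sd_y / sd_x) * (x - mu_x)


def bootstrap_se(x, scores_x, scores_y, n_boot=1000, seed=0):
    """Bootstrap standard error of the equated score at x.

    Each replication resamples both groups with replacement, re-estimates
    the equating, and the SE is the SD of the replicated equated scores.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_boot)
    for r in range(n_boot):
        boot_x = rng.choice(scores_x, size=scores_x.size, replace=True)
        boot_y = rng.choice(scores_y, size=scores_y.size, replace=True)
        estimates[r] = linear_equate(x, boot_x, boot_y)
    return estimates.std(ddof=1)


if __name__ == "__main__":
    # Simulated scores (an assumption for illustration, not real test data).
    rng = np.random.default_rng(42)
    for n in (250, 1000, 4000):
        form_x = rng.normal(30.0, 6.0, n)
        form_y = rng.normal(32.0, 7.0, n)
        se = bootstrap_se(30.0, form_x, form_y)
        print(f"N per form = {n:5d}  bootstrap SE at x=30: {se:.3f}")
```

Running the loop for several candidate sample sizes gives a simple way to see how precision improves as N grows (the standard error shrinks roughly in proportion to 1/sqrt(N)), which is the kind of comparison one might use when choosing a sample size for a target level of precision.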
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Kolen, M.J., Brennan, R.L. (2014). Standard Errors of Equating. In: Test Equating, Scaling, and Linking. Statistics for Social and Behavioral Sciences. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0317-7_7
DOI: https://doi.org/10.1007/978-1-4939-0317-7_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0316-0
Online ISBN: 978-1-4939-0317-7
eBook Packages: Mathematics and Statistics (R0)