Standard Errors of Equating

Kolen, Michael J.; Brennan, Robert L.

doi:10.1007/978-1-4939-0317-7_7

Michael J. Kolen & Robert L. Brennan

Part of the book series: Statistics for Social and Behavioral Sciences (SSBS)

Abstract

This chapter focuses on standard errors of equating and describes both bootstrap and analytic procedures for estimating them. We illustrate how standard errors can be used to choose sample sizes for equating and to compare the precision of estimated equating relationships across designs and methods.
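The bootstrap and analytic routes to a standard error can be compared in the simplest setting. The sketch below is an illustration, not the chapter's own procedure or code. It assumes the random groups design and linear equating, l_Y(x) = μ(Y) + [σ(Y)/σ(X)][x − μ(X)]. It estimates the standard error at a score x with the nonparametric bootstrap (resampling each group independently) and compares that estimate with the normal-theory delta-method approximation var ≈ σ²(Y)(1/N_X + 1/N_Y)(1 + z²/2), where z = (x − μ(X))/σ(X). The function names, the simulated normal scores, and the sample-size helper, which simply inverts that formula for equal group sizes, are hypothetical.

```python
import math
import random


def mean_sd(scores):
    """Mean and population SD (divisor N), the moments used in linear equating."""
    n = len(scores)
    mu = math.fsum(scores) / n
    return mu, math.sqrt(math.fsum((s - mu) ** 2 for s in scores) / n)


def linear_equate(x, x_scores, y_scores):
    """Random groups linear equating: l_Y(x) = mu_Y + (sd_Y / sd_X) * (x - mu_X)."""
    mu_x, sd_x = mean_sd(x_scores)
    mu_y, sd_y = mean_sd(y_scores)
    return mu_y + (sd_y / sd_x) * (x - mu_x)


def bootstrap_se(x, x_scores, y_scores, reps=1000, seed=0):
    """Nonparametric bootstrap SE of l_Y(x); the two groups are resampled independently."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        bx = rng.choices(x_scores, k=len(x_scores))  # N_X draws with replacement
        by = rng.choices(y_scores, k=len(y_scores))  # N_Y draws with replacement
        estimates.append(linear_equate(x, bx, by))
    # Bootstrap SE = SD of the replicates, with divisor (reps - 1)
    m = math.fsum(estimates) / reps
    return math.sqrt(math.fsum((e - m) ** 2 for e in estimates) / (reps - 1))


def analytic_se_normal(x, x_scores, y_scores):
    """Delta-method SE of l_Y(x), assuming normally distributed scores."""
    mu_x, sd_x = mean_sd(x_scores)
    _, sd_y = mean_sd(y_scores)
    z = (x - mu_x) / sd_x
    n_x, n_y = len(x_scores), len(y_scores)
    return math.sqrt(sd_y**2 * (1 / n_x + 1 / n_y) * (1 + z**2 / 2))


def sample_size_per_group(target_se, sd_y, z):
    """Smallest equal group size N whose normal-theory SE at standardized score z is <= target_se."""
    # Solve target_se^2 = sd_y^2 * (2 / N) * (1 + z^2 / 2) for N, then round up
    return math.ceil(2 * sd_y**2 * (1 + z**2 / 2) / target_se**2)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: normal draws stand in for observed test scores
    x_scores = [rng.gauss(25, 6) for _ in range(1000)]
    y_scores = [rng.gauss(27, 5) for _ in range(1000)]
    for x in (10, 25, 40):
        print(x, round(bootstrap_se(x, x_scores, y_scores), 3),
              round(analytic_se_normal(x, x_scores, y_scores), 3))
    print(sample_size_per_group(0.25, 5.0, 2.0))
```

Because the normal-theory variance grows with z², the standard error is smallest near the mean of X. With equal group sizes it shrinks in proportion to 1/√N, which is why the formula can be inverted to choose a sample size for a target precision. For equipercentile or IRT methods, closed forms are harder to derive, and the bootstrap loop carries over unchanged once linear_equate is swapped for the method of interest.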

References

ACT. (2007). The ACT technical manual. Iowa City, IA: Author.
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508–600). Washington, DC: American Council on Education.
Baker, F. B. (1996). An investigation of the sampling distributions of equating coefficients. Applied Psychological Measurement, 20, 45–57.
Baker, F. B. (1997). Empirical sampling distributions of equating coefficients for graded and nominal response instruments. Applied Psychological Measurement, 21, 157–172.
Baldwin, P. (2011). A strategy for developing a common metric in item response theory when parameter posterior distributions are known. Journal of Educational Measurement, 48, 1–11.
Braun, H. I., & Holland, P. W. (1982). Observed-score test equating: A mathematical analysis of some ETS equating procedures. In P. W. Holland & D. B. Rubin (Eds.), Test equating (pp. 9–49). New York: Academic.
Brennan, R. L., Wang, T., Kim, S., & Seol, J. (2009). Equating recipes. Iowa City, IA: Center for Advanced Studies in Measurement and Assessment, University of Iowa.
Crouse, J. D. (1991). Comparing the equating accuracy from three data collection designs using bootstrap estimation methods. Unpublished doctoral dissertation, The University of Iowa, Iowa City, IA.
Cui, Z., & Kolen, M. J. (2008). Comparison of parametric and nonparametric bootstrap methods for estimating random error in equipercentile equating. Applied Psychological Measurement, 32, 334–347.
Efron, B. (1982). The jackknife, the bootstrap, and other resampling plans. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap (Monographs on Statistics and Applied Probability 57). New York: Chapman & Hall.
Guo, H. (2010). Accumulative equating error after a chain of linear equatings. Psychometrika, 75, 438–453.
Haberman, S. J., Lee, Y., & Qian, J. (2009). Jackknifing techniques for evaluation of equating accuracy (Research Report 09–39). Princeton, NJ: Educational Testing Service.
Hagge, S. L., & Kolen, M. J. (2011). Equating mixed-format tests with format representative and non-representative common items. In M. J. Kolen & W. Lee (Eds.), Mixed-format tests: Psychometric properties with a primary focus on equating (volume 1) (CASMA Monograph Number 2.1) (pp. 95–135). Iowa City, IA: CASMA, The University of Iowa.
Hagge, S. L., Liu, C., He, Y., Powers, S. J., Wang, W., & Kolen, M. J. (2011). A comparison of IRT and traditional equating methods in mixed-format equating. In M. J. Kolen & W. Lee (Eds.), Mixed-format tests: Psychometric properties with a primary focus on equating (volume 1) (CASMA Monograph Number 2.1) (pp. 19–50). Iowa City, IA: CASMA, The University of Iowa.
Hanson, B. A., Zeng, L., & Kolen, M. J. (1993). Standard errors of Levine linear equating. Applied Psychological Measurement, 17, 225–237.
Holland, P. W., King, B. F., & Thayer, D. T. (1989). The standard error of equating for the kernel method of equating score distributions (Technical Report 89–83). Princeton, NJ: Educational Testing Service.
Jarjoura, D., & Kolen, M. J. (1985). Standard errors of equipercentile equating for the common item nonequivalent populations design. Journal of Educational Statistics, 10, 143–160.
Kendall, M., & Stuart, A. (1977). The advanced theory of statistics (4th ed., Vol. 1). New York: Macmillan.
Kolen, M. J. (1985). Standard errors of Tucker equating. Applied Psychological Measurement, 9, 209–223.
Li, D., Jiang, Y., & von Davier, A. A. (2012). The accuracy and consistency of a series of IRT true score equatings. Journal of Educational Measurement, 49, 167–189.
Li, D., Li, S., & von Davier, A. A. (2011). Applying time-series analysis to detect scale drift. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 327–346). New York: Springer.
Liou, M., & Cheng, P. E. (1995). Asymptotic standard error of equipercentile equating. Journal of Educational and Behavioral Statistics, 20, 259–286.
Liou, M., Cheng, P. E., & Johnson, E. G. (1997). Standard errors of the kernel equating methods under the common-item design. Applied Psychological Measurement, 21, 349–369.
Liu, C., & Kolen, M. J. (2011). A comparison among IRT equating methods and traditional equating methods for mixed-format tests. In M. J. Kolen & W. Lee (Eds.), Mixed-format tests: Psychometric properties with a primary focus on equating (volume 1) (CASMA Monograph Number 2.1) (pp. 75–94). Iowa City, IA: CASMA, The University of Iowa.
Liu, Y., Schulz, E. M., & Yu, L. (2007). Standard error estimation of 3PL IRT true score equating with an MCMC method. Journal of Educational and Behavioral Statistics, 33, 257–278.
Lord, F. M. (1950). Notes on comparable scales for test scores (Research Bulletin 50–48). Princeton, NJ: Educational Testing Service.
Lord, F. M. (1975). Automated hypothesis tests and standard errors for nonstandard problems. The American Statistician, 29, 56–59.
Lord, F. M. (1982a). The standard error of equipercentile equating. Journal of Educational Statistics, 7, 165–174.
Lord, F. M. (1982b). Standard error of an equating by item response theory. Applied Psychological Measurement, 6, 463–471.
Moses, T., & Zhang, W. (2011). Standard errors of equating differences. Journal of Educational and Behavioral Statistics, 36, 779–803.
Ogasawara, H. (2000). Asymptotic standard errors of IRT equating coefficients using moments. Economic Review, Otaru University of Commerce, 51(1), 1–23.
Ogasawara, H. (2001a). Item response theory true score equatings and their standard errors. Journal of Educational and Behavioral Statistics, 26, 31–50.
Ogasawara, H. (2001b). Least squares estimation of item response theory linking coefficients. Applied Psychological Measurement, 25, 3–24.
Ogasawara, H. (2001c). Marginal maximum likelihood estimation of item response theory (IRT) equating coefficients for the common-examinee design. Japanese Psychological Research, 43, 72–82.
Ogasawara, H. (2001d). Standard errors of item response theory equating/linking by response function methods. Applied Psychological Measurement, 25, 53–67.
Ogasawara, H. (2003a). Asymptotic standard errors of IRT observed-score equating methods. Psychometrika, 68, 193–211.
Ogasawara, H. (2003b, May). EL 1.0. Unpublished computer subroutines. (http://www.res.otaru-uc.ac.jp/%7Ehogasa/)
Ogasawara, H. (2011). Applications of asymptotic expansion in item response theory linking. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 261–280). New York: Springer.
Parshall, C. G., Houghton, P. D., & Kromrey, J. D. (1995). Equating error and statistical bias in small sample linear equating. Journal of Educational Measurement, 32, 37–54.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221–262). New York: Macmillan.
Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. (1989). Numerical recipes: The art of scientific computing (Fortran version). Cambridge, UK: Cambridge University Press.
Rijmen, F., Qu, Y., & von Davier, A. A. (2011). Hypothesis testing of equating differences in the kernel equating framework. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 317–326). New York: Springer.
Tsai, T.-H., Hanson, B. A., Kolen, M. J., & Forsyth, R. A. (2001). A comparison of bootstrap standard errors of IRT equating methods for the common-item nonequivalent groups design. Applied Measurement in Education, 14, 17–30.
von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. New York: Springer.
Wang, T. (2009). Standard errors of equating for the percentile rank-based equipercentile equating with log-linear presmoothing. Journal of Educational and Behavioral Statistics, 34, 7–23.
Zeng, L. (1993). A numerical approach for computing standard errors of linear equating. Applied Psychological Measurement, 17, 177–186.
Zeng, L., & Cope, R. T. (1995). Standard errors of linear equating for the counterbalanced design. Journal of Educational and Behavioral Statistics, 4, 337–348.
Zeng, L., Hanson, B. A., & Kolen, M. J. (1994). Standard errors of a chain of linear equatings. Applied Psychological Measurement, 18, 369–378.
Zu, J., & Yuan, K.-H. (2012). Standard error of linear observed-score equating for the NEAT design with nonnormally distributed data. Journal of Educational Measurement, 49, 190–213.

Author information

Authors and Affiliations

Iowa Testing Programs, University of Iowa, Iowa City, IA, USA
Michael J. Kolen
CASMA, University of Iowa, Iowa City, IA, USA
Robert L. Brennan

Corresponding author

Correspondence to Michael J. Kolen.

About this chapter

Cite this chapter

Kolen, M.J., Brennan, R.L. (2014). Standard Errors of Equating. In: Test Equating, Scaling, and Linking. Statistics for Social and Behavioral Sciences. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0317-7_7

DOI: https://doi.org/10.1007/978-1-4939-0317-7_7
Published: 14 January 2014
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0316-0
Online ISBN: 978-1-4939-0317-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
