Penalized Best Linear Prediction of True Test Scores

Yao, Lili; Haberman, Shelby J.; Zhang, Mo

doi:10.1007/s11336-018-9636-7

Published: 21 September 2018

Psychometrika, Volume 84, pages 186–211 (2019)

Abstract

In best linear prediction (BLP), a true test score is predicted from observed item scores and ancillary test data. If using BLP rather than a more direct estimate of a true score has disparate impact on different demographic groups, a fairness issue arises. To improve population invariance while preserving much of the efficiency of BLP, a modified approach, penalized best linear prediction, is proposed that weights both the mean squared error of prediction and a quadratic measure of subgroup biases. The proposed methodology is applied to three high-stakes writing assessments.
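
The trade-off described in the abstract can be illustrated with a small numerical sketch. This is not the authors' estimator: the preview does not give their objective, so the code assumes one plausible form, the mean squared error plus `lam` times a group-proportion-weighted sum of squared subgroup mean residuals. The function names (`penalized_blp`, `mse_and_bias`), the tuning parameter `lam`, and the simulated data are all hypothetical. The true score is also treated as known, which is possible only in simulation; in practice it is latent and the required moments would have to be estimated.

```python
import numpy as np

def penalized_blp(X, tau, groups, lam):
    """Minimize mean((tau - Z b)^2) + lam * sum_g p_g * (mean residual in g)^2.

    Assumed objective for illustration only; Z is X with an intercept column.
    """
    X = np.asarray(X, dtype=float)
    tau = np.asarray(tau, dtype=float)
    groups = np.asarray(groups)
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])
    # Normal equations of the unpenalized least-squares problem.
    A = Z.T @ Z / n
    b = Z.T @ tau / n
    # Each subgroup adds a rank-one term from its squared mean residual.
    for g in np.unique(groups):
        mask = groups == g
        p_g = mask.mean()            # subgroup proportion
        c_g = Z[mask].mean(axis=0)   # subgroup mean of the predictors
        d_g = tau[mask].mean()       # subgroup mean of the target
        A += lam * p_g * np.outer(c_g, c_g)
        b += lam * p_g * c_g * d_g
    return np.linalg.solve(A, b)

def mse_and_bias(X, tau, groups, beta):
    """Return (mean squared error, weighted sum of squared subgroup biases)."""
    Z = np.column_stack([np.ones(len(tau)), np.asarray(X, dtype=float)])
    resid = np.asarray(tau, dtype=float) - Z @ beta
    groups = np.asarray(groups)
    mse = float(np.mean(resid ** 2))
    bias = 0.0
    for g in np.unique(groups):
        mask = groups == g
        bias += float(mask.mean()) * float(resid[mask].mean()) ** 2
    return mse, bias

# Simulated data (hypothetical): two subgroups, two observed scores,
# the second of which is shifted for one subgroup.
rng = np.random.default_rng(0)
n = 2000
groups = rng.integers(0, 2, size=n)
tau = rng.normal(0.0, 1.0, size=n) + 0.5 * groups
x1 = tau + rng.normal(0.0, 0.7, size=n)
x2 = tau - 0.4 * groups + rng.normal(0.0, 0.5, size=n)
X = np.column_stack([x1, x2])

beta_blp = penalized_blp(X, tau, groups, lam=0.0)    # ordinary BLP
beta_pen = penalized_blp(X, tau, groups, lam=50.0)   # heavily penalized
mse_blp, bias_blp = mse_and_bias(X, tau, groups, beta_blp)
mse_pen, bias_pen = mse_and_bias(X, tau, groups, beta_pen)
print("BLP:       MSE", mse_blp, "bias measure", bias_blp)
print("Penalized: MSE", mse_pen, "bias measure", bias_pen)
```

With `lam=0` the sketch reduces to ordinary least squares. Raising `lam` cannot increase the subgroup-bias measure and cannot decrease the mean squared error, which is the efficiency-versus-invariance trade-off the abstract refers to.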

Acknowledgements

Lili Yao was partially supported by the National Natural Science Foundation of China (61863012, 61263010) and partially by the Research Project of Science and Technology Department of Jiangxi Province, China (20181BBE50020, 20161BBE50082, 20161BAB202067).

Author information

Authors and Affiliations

Educational Testing Service, 660 Rosedale Road, Princeton, NJ, 08540, USA
Lili Yao & Mo Zhang
Edusoft, Barak 3, Apt. 1, 9350276, Jerusalem, Israel
Shelby J. Haberman

Corresponding author

Correspondence to Lili Yao.

About this article

Cite this article

Yao, L., Haberman, S.J. & Zhang, M. Penalized Best Linear Prediction of True Test Scores. Psychometrika 84, 186–211 (2019). https://doi.org/10.1007/s11336-018-9636-7

Received: 01 September 2017
Revised: 14 August 2018
Published: 21 September 2018
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s11336-018-9636-7
