Abstract
When tests for different populations are compared, vertical item response theory (IRT) linking procedures can be used. However, the validity of the linking may be compromised when items in the procedure show differential item functioning (DIF), which violates the procedure's assumption that item parameters are stable across populations. This article presents a procedure that is robust against DIF while still exploiting the advantages of IRT linking. This procedure, called comparisons using reference sets, is a variation of the scaling test design. With reference sets, an anchor test is administered in all populations of interest, and a separate IRT scale is then estimated for each population. To link an operational test to the reference sets, a sample of items from the reference set is administered together with the operational test. A simulation study compares a linking method using reference sets with a linking method using a direct anchor. The simulation study indicates that the procedure using reference sets has an advantage over other vertical linking procedures.
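The DIF problem described in the abstract can be illustrated with a small numerical sketch. It is not the authors' simulation study and does not implement the reference-set procedure. It only shows, under simplifying assumptions, how DIF in a direct anchor can bias a linking constant. The assumptions are: a Rasch-type model in which the two scales differ only by a shift, mean/mean linking on the anchor items, and known (not estimated) item difficulties. All names and numbers (`mean_mean_shift`, `true_shift`, `dif`, the difficulty values) are hypothetical.

```python
def mean_mean_shift(b_reference, b_focal):
    """Mean/mean linking constant for a shift-only (Rasch-type) model.

    Returns the constant that, added to focal-scale difficulties,
    places them on the reference scale.
    """
    if len(b_reference) != len(b_focal):
        raise ValueError("anchor item lists must have the same length")
    n = len(b_reference)
    return sum(b_reference) / n - sum(b_focal) / n


# Hypothetical anchor-item difficulties on the reference population's scale.
b_reference = [-1.0, -0.5, 0.0, 0.5, 1.0]

# Assumed true difference between the origins of the two scales.
true_shift = 0.8

# Assumed DIF: the last two anchor items are 0.5 logits harder
# in the focal population than the common scale would predict.
dif = [0.0, 0.0, 0.0, 0.5, 0.5]

# The same items as they would appear on the focal population's own scale.
b_focal_no_dif = [b - true_shift for b in b_reference]
b_focal_dif = [b - true_shift + d for b, d in zip(b_reference, dif)]

shift_no_dif = mean_mean_shift(b_reference, b_focal_no_dif)
shift_dif = mean_mean_shift(b_reference, b_focal_dif)

# With DIF, the linking constant is off by minus the mean DIF in the anchor.
bias = shift_dif - true_shift

print(f"shift without DIF: {shift_no_dif:.3f}")
print(f"shift with DIF:    {shift_dif:.3f}")
print(f"bias due to DIF:   {bias:.3f}")
```

In this sketch the linking constant is recovered as about 0.8 when the anchor is free of DIF and as about 0.6 when two of the five anchor items show DIF, so every score linked through that anchor would be shifted by the mean DIF of the anchor items. This kind of bias is the motivation the abstract gives for estimating a separate scale per population instead of relying on a direct anchor.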
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Béguin, A.A., Wools, S. (2015). Vertical Comparison Using Reference Sets. In: Millsap, R., Bolt, D., van der Ark, L., Wang, WC. (eds) Quantitative Psychology Research. Springer Proceedings in Mathematics & Statistics, vol 89. Springer, Cham. https://doi.org/10.1007/978-3-319-07503-7_12
DOI: https://doi.org/10.1007/978-3-319-07503-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07502-0
Online ISBN: 978-3-319-07503-7
eBook Packages: Mathematics and Statistics (R0)