
Selecting a Data Collection Design for Linking in Educational Measurement: Taking Differential Motivation into Account

  • Marie-Anne Mittelhaëuser
  • Anton A. Béguin
  • Klaas Sijtsma
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 89)

Abstract

In educational measurement, multiple test forms are often constructed to measure the same construct. Linking procedures can be used to disentangle differences in test form difficulty and differences in the proficiency of examinees so that scores for different test forms can be used interchangeably. Various data collection designs can be used to collect the data required for linking. Differential motivation refers to the difference in test-taking motivation that exists between high-stakes and low-stakes administration conditions. In a high-stakes administration condition, an examinee is expected to work harder and strive for maximum performance, whereas a low-stakes administration condition elicits typical, rather than maximum, performance. Differential motivation can therefore be considered a confounding variable when choosing a data collection design. We discuss the suitability of different data collection designs, and the way they are typically implemented in practice, with respect to the effect of differential motivation. An example using data from the Eindtoets Basisonderwijs (End of Primary School Test) highlights the need to consider differential motivation.
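As a rough numeric illustration of the linking idea described in the abstract, the sketch below applies mean-mean linking under the Rasch model, using items common to both forms (an anchor) to identify the scale shift between two separately calibrated forms. All item difficulties and names here are hypothetical and chosen purely for illustration; they are not taken from the chapter.

```python
# Illustrative sketch (hypothetical numbers): mean-mean linking of two
# separately calibrated test forms under the Rasch model, using the
# anchor items shared by both forms to identify the scale shift.

# Hypothetical Rasch difficulty estimates of the anchor items, as
# obtained from separate calibrations of Form X and Form Y.
anchor_form_x = [-0.5, 0.0, 0.8, 1.2]
anchor_form_y = [-0.2, 0.3, 1.1, 1.5]

def mean(values):
    return sum(values) / len(values)

# Under the Rasch model the two provisional scales differ only by a
# constant shift; the mean-mean method estimates that shift from the
# anchor item difficulties.
shift = mean(anchor_form_y) - mean(anchor_form_x)

def to_form_x_scale(b_y):
    """Express a Form Y item difficulty on the Form X scale."""
    return b_y - shift

print(round(shift, 3))                  # 0.3
print(round(to_form_x_scale(1.5), 3))  # 1.2
```

Once both forms are on a common scale, observed score differences can be attributed to form difficulty rather than examinee proficiency. Differential motivation threatens exactly this step: if the anchor is administered under low-stakes conditions, responses may reflect typical rather than maximum performance, biasing the estimated shift.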

Keywords

Data collection design · Differential motivation · Linking


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Marie-Anne Mittelhaëuser (1)
  • Anton A. Béguin (1)
  • Klaas Sijtsma (2)
  1. Cito, Arnhem, The Netherlands
  2. Tilburg University, Tilburg, The Netherlands
