Item Response Models in Computerized Adaptive Testing: A Simulation Study

  • Maria Eugénia Ferrão
  • Paula Prata
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8581)


In the digital world, any conceptual assessment framework faces two main challenges: (a) the complexity of the knowledge, capacities, and skills to be assessed; and (b) the increasing use of web-based assessments, which requires innovative approaches to the development, delivery, and scoring of tests. Statistical methods play a central role in such a framework. Item response models have been the most common statistical methods used to address these measurement challenges, and they underpin computerized adaptive tests, in which items are selected adaptively from an item pool according to the examinee's ability during test administration; the test is thus tailored to each student. In this paper we conduct a simulation study based on the minimum error-variance criterion, varying the item exposure rate (0.1, 0.3, 0.5) and the maximum test length (18, 27, 36). The comparison examines the absolute bias, the root mean square error, and the correlation between true and estimated abilities. Hypothesis tests are applied to compare the true and estimated distributions. The results suggest a considerable reduction in bias as the number of items administered increases, a ceiling effect in very short tests, and full agreement between the true and empirical distributions for computerized tests shorter than the corresponding paper-and-pencil tests.
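The evaluation criteria named in the abstract can be illustrated with a minimal sketch. The snippet below, which assumes simulated true abilities and noisy ability estimates (the names `theta_true` and `theta_hat` and the error standard deviation 0.3 are illustrative, not taken from the paper), computes the absolute bias, the root mean square error, and the Pearson correlation between true and estimated values.

```python
import math
import random

random.seed(42)
n = 1000
# Illustrative simulated data: true abilities from a standard normal,
# estimates equal to the true ability plus measurement error.
theta_true = [random.gauss(0, 1) for _ in range(n)]
theta_hat = [t + random.gauss(0, 0.3) for t in theta_true]

def abs_bias(true, est):
    """Absolute value of the mean estimation error."""
    return abs(sum(e - t for t, e in zip(true, est)) / len(true))

def rmse(true, est):
    """Root mean square error of the estimates."""
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(true, est)) / len(true))

def correlation(true, est):
    """Pearson correlation between true values and estimates."""
    mt, me = sum(true) / len(true), sum(est) / len(est)
    cov = sum((t - mt) * (e - me) for t, e in zip(true, est))
    vt = math.sqrt(sum((t - mt) ** 2 for t in true))
    ve = math.sqrt(sum((e - me) ** 2 for e in est))
    return cov / (vt * ve)

print(round(abs_bias(theta_true, theta_hat), 3))
print(round(rmse(theta_true, theta_hat), 3))
print(round(correlation(theta_true, theta_hat), 3))
```

In a study like the one described, these statistics would be computed per condition (each combination of exposure rate and maximum test length) to compare recovery of the true ability distribution.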


Keywords: item response model, computerized adaptive testing, measurement error





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Maria Eugénia Ferrão (1, 2)
  • Paula Prata (1, 3)
  1. University of Beira Interior, Portugal
  2. Centre for Applied Mathematics and Economics (CEMAPRE), Portugal
  3. Instituto de Telecomunicações (IT), Portugal
