
A Cost–Benefit Analysis for Developing Item Banks in Higher Education

  • Silvester Draaijer
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1014)

Abstract

Item banks in higher education can be regarded as important assets for increasing the quality of education and assessment. An item bank allows for the flexible administration of computer-based achievement tests for summative purposes, as well as quizzes for formative purposes. Developing item banks, however, can require a considerable investment. A well-worked-out business case can help convince stakeholders to start an item bank development project. An important part of such a business case should be the increase in item quality and the estimated reduction in costs, particularly for the collaborative development of an item bank. However, a theoretical underpinning of such a business case, incorporating considerations based on classical test theory, is lacking in the literature. Therefore, a model is described for estimating reductions in misclassifications and in per-unit costs. Examples of the likelihood of reducing misclassifications and of the cost per unit are presented, based on findings in the literature. Implications for research and practice are discussed.
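To make the kind of estimation the abstract refers to more concrete, the following minimal sketch illustrates two standard classical-test-theory ingredients such a business case can build on: the Spearman-Brown projection of reliability for a longer (or better-screened) test, and the expected pass/fail misclassification rate implied by the standard error of measurement around a cut score, combined with a simple per-unit cost calculation for collaborative development. This is an illustrative sketch only, not the paper's actual model; the function names, score scale, and all numeric values are assumptions chosen for demonstration.

# Minimal, illustrative sketch (not the paper's model): how higher reliability
# reduces misclassification around a cut score under classical test theory,
# and how pooling development across partners lowers per-unit costs.
# All names and numbers below are assumptions for demonstration purposes.

import math

def spearman_brown(reliability, length_factor):
    # Projected reliability when effective test length is multiplied by length_factor.
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def misclassification_rate(reliability, mean_true, sd_true, cut_score, grid=2001):
    # Expected pass/fail misclassification rate, assuming normally distributed
    # true scores and observed scores that vary around the true score with
    # SEM = sd_true * sqrt((1 - reliability) / reliability).
    sem = sd_true * math.sqrt((1 - reliability) / reliability)
    lo, hi = mean_true - 4 * sd_true, mean_true + 4 * sd_true
    step = (hi - lo) / (grid - 1)
    total = 0.0
    for i in range(grid):
        t = lo + i * step
        density = math.exp(-0.5 * ((t - mean_true) / sd_true) ** 2) / (sd_true * math.sqrt(2 * math.pi))
        total += density * normal_cdf(-abs(t - cut_score) / sem) * step
    return total

def cost_per_administration(items, cost_per_item, uses_per_item, partners=1):
    # Per-unit cost of one administered item when development costs are shared.
    return (items * cost_per_item / partners) / (items * uses_per_item)

r_short = 0.75                               # assumed reliability of a 40-item form
r_long = spearman_brown(r_short, 2.0)        # projected reliability at double length
print(misclassification_rate(r_short, mean_true=65, sd_true=12, cut_score=55))
print(misclassification_rate(r_long, mean_true=65, sd_true=12, cut_score=55))
print(cost_per_administration(items=400, cost_per_item=100, uses_per_item=10, partners=4))

Under these assumed numbers, doubling effective test length raises the projected reliability from 0.75 to about 0.86, which shrinks the standard error of measurement and with it the share of candidates near the cut score who are misclassified, while sharing the fixed development investment across four partners divides the per-item cost accordingly.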

Keywords

Item banking · Question development · Test development · Educational measurement · Economics · Multiple-choice questions · MCQs · Higher education


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Faculty of Behavioural and Movement Sciences, Department of Research and Theory in Education, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
