A Cost–Benefit Analysis for Developing Item Banks in Higher Education

  • Conference paper
  • In: Technology Enhanced Assessment (TEA 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1014)


Abstract

Item banks in higher education can be regarded as important assets for increasing the quality of education and assessment. An item bank allows for the flexible administration of computer-based achievement tests for summative purposes, as well as quizzes for formative purposes. Developing item banks, however, can require a substantial investment. A well-developed business case can help convince stakeholders to start an item bank development project. An important part of such a business case should be the expected increase in item quality and the estimated reduction in costs, particularly for the collaborative development of an item bank. However, a theoretical underpinning of such a business case, incorporating considerations based on classical test theory, is lacking in the literature. Therefore, a model is described for estimating reductions in misclassifications and per-unit costs. Examples of the likelihood of reducing misclassifications and per-unit costs are presented, based on findings in the literature. Implications for research and practice are discussed.
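
A minimal sketch of the kind of classical-test-theory reasoning the abstract alludes to is given below. It is not the paper's model itself: it only combines the Spearman–Brown formula [31, 32] with a normal error model to approximate how a gain in test reliability changes the probability of misclassifying an examinee whose true score lies near the cut score. All function names and parameter values are hypothetical.

    # Illustrative sketch only (Python); not the model described in the paper.
    # Assumes classical test theory: SEM = SD_observed * sqrt(1 - reliability).
    from math import sqrt
    from statistics import NormalDist

    def spearman_brown(reliability: float, k: float) -> float:
        # Projected reliability when a test is lengthened (or its item pool
        # improved) by a factor k (Spearman-Brown prophecy formula).
        return k * reliability / (1 + (k - 1) * reliability)

    def p_misclassify(true_score: float, cut_score: float,
                      sd_observed: float, reliability: float) -> float:
        # Probability that the observed score falls on the wrong side of the
        # cut score, assuming normally distributed measurement error.
        sem = sd_observed * sqrt(1.0 - reliability)
        error = NormalDist(mu=true_score, sigma=sem)
        below_cut = error.cdf(cut_score)
        return below_cut if true_score >= cut_score else 1.0 - below_cut

    # Hypothetical test: observed-score SD 5, cut score 27, examinee true score 28.
    rho_before = 0.70                      # reliability of the current test
    rho_after = spearman_brown(0.70, 1.5)  # e.g. after drawing better/more items from the bank
    for label, rho in (("before", rho_before), ("after", rho_after)):
        p = p_misclassify(true_score=28, cut_score=27, sd_observed=5, reliability=rho)
        print(f"{label}: reliability = {rho:.2f}, P(misclassification) = {p:.3f}")

Under these assumed numbers the per-examinee reduction is modest, but aggregated over a cohort and over the many tests an item bank serves, such reductions in misclassifications, together with lower per-item development costs, are the kind of benefit a business case would quantify.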


Notes

  1.

    This is unsurprising: in adaptive testing programmes using item response theory (IRT), the main arguments for this complex assessment technique are the ability to develop tests that are up to 25% shorter for test takers and better control over reliability; it also helps to prevent item overexposure [8]. It does not, however, aim to calculate the percentage of misclassifications [5, 8, 9]. It must also be noted that developing multi-stage tests (MSTs) and computer-adaptive tests using IRT methods should be regarded as feasible only for the professional testing industry [10] or, in selected cases, for the domain of medicine [11].

References

  1. Anderson, S.B.: The role of the teacher-made test in higher education. New Dir. Community Coll. 1987, 39–44 (1987). https://doi.org/10.1002/cc.36819875907

  2. Jozefowicz, R.F., Koeppen, B.M., Case, S.M., Galbraith, R., Swanson, D., Glew, R.H.: The quality of in-house medical school examinations. Acad. Med. 77, 156–161 (2002)

  3. Jugar, R.R.: An inquiry on the roles of personal test item banking (PTIB) and table of specifications (TOS) in the construction and utilization of classroom tests. Int. J. Educ. Res. 1, 1–8 (2013)

  4. Vale, C.D.: Computerized item banking. In: Downing, S.M., Haladyna, T.M. (eds.) Handbook of Test Development. Lawrence Erlbaum Associates, Mahwah (2006)

  5. Lane, S., Raymond, M.R., Haladyna, T.M.: Handbook of Test Development. Routledge, New York (2015)

  6. Draaijer, S., De Werk, J.: Handboek In 5 stappen naar een itembank [Handbook In 5 steps to an item bank]. SURF (2018)

  7. Downing, S.M., Haladyna, T.M.: Test item development: validity evidence from quality assurance procedures. Appl. Meas. Educ. 10, 61–82 (1997). https://doi.org/10.1207/s15324818ame1001_4

  8. Davey, T.: Practical Considerations in Computer-Based Testing. ETS Research and Development Division (2011)

  9. Van der Linden, W.J., Glas, C.A.W.: Computerized Adaptive Testing: Theory and Practice. Springer, Dordrecht (2000). https://doi.org/10.1007/0-306-47531-6

  10. Rudner, L.M., Guo, F.: Computer adaptive testing for small scale programs and instructional systems. J. Appl. Test. Technol. 12, 1–12 (2011)

  11. Baumeister, H., Abberger, B., Haschke, A., Boecker, M., Bengel, J., Wirtz, M.: Development and calibration of an item bank for the assessment of activities of daily living in cardiovascular patients using Rasch analysis. Health Qual. Life Outcomes 11, 133 (2013). https://doi.org/10.1186/1477-7525-11-133

  12. Attali, Y.: Automatic item generation unleashed: an evaluation of a large-scale deployment of item models. In: Penstein Rosé, C., et al. (eds.) AIED 2018. LNCS (LNAI), vol. 10947, pp. 17–29. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93843-1_2

  13. Gierl, M.J., Haladyna, T.M.: Automatic Item Generation: Theory and Practice. Routledge, New York (2012)

  14. Gierl, M.J., Lai, H.: Instructional topics in educational measurement (ITEMS) module: using automated processes to generate test items. Educ. Meas. Issues Pract. 32, 36–50 (2013)

  15. Glas, C.A.W., Van der Linden, W.J.: Computerized adaptive testing with item cloning. Appl. Psychol. Meas. 27, 247–261 (2003). https://doi.org/10.1177/0146621603027004001

  16. Gierl, M.J., Lai, H.: Evaluating the quality of medical multiple-choice items created with automated processes. Med. Educ. 47, 726–733 (2013). https://doi.org/10.1111/medu.12202

  17. Draaijer, S.: Supporting teachers in higher education in designing test items (2016). http://dare.ubvu.vu.nl/handle/1871/54397

  18. Hartog, R., Draaijer, S., Rietveld, L.C.: Practical aspects of task allocation in design and development of digital closed questions in higher education. Pract. Assess. Res. Eval. 13, 2–15 (2008)

  19. ETS: How ETS creates test questions. http://www.ets.org/s/understanding_testing/flash/how_ets_creates_test_questions.html

  20. Osterlind, S.J.: Constructing Test Items: Multiple-Choice, Constructed-Response, Performance, and Other Formats. Kluwer Academic Publisher, Norwell (1998)

  21. Cizek, G.J.: More unintended consequences of high-stakes testing. Educ. Meas. Issues Pract. 20, 19–27 (2001). https://doi.org/10.1111/j.1745-3992.2001.tb00072.x

  22. Haladyna, T.M., Downing, S.M., Rodriguez, M.C.: A review of multiple-choice item-writing guidelines for classroom assessment. Appl. Meas. Educ. 15, 309–333 (2002). https://doi.org/10.1207/S15324818AME1503_5

  23. Gerritsen-van Leeuwenkamp, K.: Het relatieve belang van vijftig kwaliteitskenmerken van toetsing voor studententevredenheid in het hoger beroepsonderwijs [The relative importance of fifty quality indicators for measurement of student satisfaction in higher education] (2012). http://hdl.handle.net/1820/4295

  24. Kano, N., Seraku, N., Takahashi, F., Tsuji, S.: Attractive quality and must-be quality. J. Jpn. Soc. Qual. Control 14, 39–48 (1984)

  25. Bloom, B.S.: Taxonomy of Educational Objectives, the Classification of Educational Goals – Handbook I: Cognitive Domain. McKay, New York (1956)

  26. Haladyna, T.M.: Writing Test Items to Evaluate Higher Order Thinking. Allyn & Bacon, Needham Heights (1997)

  27. Haladyna, T.M.: Developing and Validating Multiple-Choice Test Items. Lawrence Erlbaum Associates, London (2004)

  28. Ebel, R.L.: Essentials of Educational Measurement. Prentice-Hall, Englewood Cliffs (1979)

  29. De Gruijter, D.N.M.: Toetsing en toetsanalyse [Testing and test analysis]. ICLON, Sectie Onderwijsontwikkeling Universiteit Leiden, Leiden (2008)

  30. Olsen, J.B., Bunderson, B.: How to write good test questions [PowerPoint presentation] (2004)

  31. Spearman, C.: Correlation calculated from faulty data. Br. J. Psychol. 1904–1920(3), 271–295 (1910). https://doi.org/10.1111/j.2044-8295.1910.tb00206.x

  32. Brown, W.: Some experimental results in the correlation of mental abilities. Br. J. Psychol. 1904–1920(3), 296–322 (1910)

  33. Draaijer, S.: Rule of thumb: 40 questions in a 4-choice multiple-choice test. Why? – Draaijer on Assessment and Testing. https://draaijeronassessmentandtesting.wordpress.com/2014/10/23/rule-of-thumb-40-questions-in-a-4-choice-multiple-choice-test-why/

  34. Gibson, W.M., Weiner, J.A.: Generating random parallel test forms using CTT in a computer-based environment. J. Educ. Meas. 35, 297–310 (1998). https://doi.org/10.1111/j.1745-3984.1998.tb00540.x

  35. Douglas, K.M.: A general method for estimating the classification reliability of complex decisions based on configural combinations of multiple assessment scores (2007)

  36. Eggen, T., Sanders, P.: Psychometrie in de praktijk [Psychometrics in Practice]. Cito Instituut voor Toetsontwikkeling, Arnhem (1993)

  37. Rush, B.R., Rankin, D.C., White, B.J.: The impact of item-writing flaws and item complexity on examination item difficulty and discrimination value. BMC Med. Educ. 16, 250 (2016)

  38. Fitzgerald, C.: Risk management: calculating the bottom line of developing a certification or licensure exam (2005). https://www2.caveon.com/2005/02/08/risk-management-calculating-the-bottom-line-of-developing-a-certification-or-licensure-exam/

  39. Parshall, C.G., Spray, J.A., Kalohn, J.C., Davey, T.: Practical Considerations in Computer-Based Testing. Springer, New York (2002). https://doi.org/10.1007/978-1-4613-0083-0

  40. Downing, S.M.: Construct-irrelevant variance and flawed test questions: do multiple-choice item-writing principles make any difference? Acad. Med. 77, S103–S104 (2002)

  41. Mayenga, C.: Mapping item writing tasks on the item writing ability scale. In: XXXVIIth Annual Conference on Canadian Society of Safety Engineering, Carleton University, Ottawa, Canada (2009)

  42. Rodriguez, M.C.: Three options are optimal for multiple-choice items: a meta-analysis of 80 years of research. Educ. Meas. Issues Pract. 24, 3–13 (2005). https://doi.org/10.1111/j.1745-3992.2005.00006.x

  43. Case, S.M., Holtzman, K., Ripkey, D.R.: Developing an item pool for CBT: a practical comparison of three models of item writing. Acad. Med. 76, S111–S113 (2001)

  44. Draaijer, S., Van Gastel, L., Peeters, V., Frinking, P., Reumer, C.: Flexibilisering van Toetsing. [Flexibility in Testing and Assessment]. Digitale Universiteit, Utrecht (2004)

  45. Downing, S.M.: Threats to the validity of locally developed multiple-choice tests in medical education: construct-irrelevant variance and construct underrepresentation. Adv. Health Sci. Educ. Theory Pract. 7, 235–241 (2002). https://doi.org/10.1023/A:1021112514626

  46. Tarrant, M., Knierim, A., Hayes, S.K., Ware, J.: The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments. Nurse Educ. Pract. 6, 354–363 (2006). https://doi.org/10.1016/j.nepr.2006.07.002

  47. Tarrant, M., Ware, J.: Impact of item-writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments. Med. Educ. 42, 198–206 (2008)

  48. Downing, S.M.: The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education. Adv. Health Sci. Educ. 10, 133–143 (2005). https://doi.org/10.1007/s10459-004-4019-5

  49. Wadi, M.M., Abdul Rahim, A.F., Yusoff, M.S.B., Baharuddin, K.A.: The effect of MCQ vetting on students’ examination performance. Educ. Med. J. 6 (2014). https://doi.org/10.5959/eimj.v6i2.216

  50. Hassan, S., Simbak, N., Yussof, H.: Structured vetting procedure of examination questions in medical education in Faculty of Medicine at Universiti Sultan Zainal Abidin. Malays. J. Public Health Med. 16, 29–37 (2016)

  51. Nabil Demaidi, M.: Why is the threshold of Point Biserial correlation (item discrimination) in item analysis 0.2? https://www.researchgate.net/post/Why_is_the_threshold_of_Point_biserial_correlation_item_discrimination_in_item_analysis_02

  52. Crocker, L., Algina, J.: Introduction to Classical and Modern Test Theory. Holt, Rinehart and Winston, Orlando (1986)

  53. Leahy, J.M., Smith, A.: Economics of item development: key cost factors impacting program profitability. Asia ATP (2014)

  54. Wainer, H., Thissen, D.: Combining multiple-choice and constructed-response test scores: toward a Marxist theory of test construction. Appl. Meas. Educ. 6, 103 (1993)

  55. Karpicke, J.D., Roediger, H.L.: The critical importance of retrieval for learning. Science 319, 966–968 (2008). https://doi.org/10.1126/science.1152408

  56. Roediger III, H.L., Agarwal, P.K., McDaniel, M.A., McDermott, K.B.: Test-enhanced learning in the classroom: long-term improvements from quizzing. J. Exp. Psychol. Appl. 17, 382–395 (2011). https://doi.org/10.1037/a0026252

  57. Slusser, S.R., Erickson, R.J.: Group quizzes: an extension of the collaborative learning process. Teach. Sociol. 34, 249–262 (2006). https://doi.org/10.1177/0092055X0603400304

  58. Davey, T., Nering, M.: Controlling item exposure and maintaining item security. In: Mills, C.N., Potenza, M.T., Fremer, J.J., Ward, W.C. (eds.) Computer-Based Testing, Building the Foundation for Future Assessments. Lawrence Erlbaum Associates, Mahwah (2002)

  59. Hattie, J., Timperley, H.: The power of feedback. Rev. Educ. Res. 77, 81–112 (2007). https://doi.org/10.3102/003465430298487

  60. Butler, M., Pyzdrowski, L., Goodykoontz, A., Walker, V.: The effects of feedback on online quizzes. Int. J. Technol. Math. Educ. 15, 131–136 (2008)

Author information

Corresponding author

Correspondence to Silvester Draaijer.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Draaijer, S. (2019). A Cost–Benefit Analysis for Developing Item Banks in Higher Education. In: Draaijer, S., Joosten-ten Brinke, D., Ras, E. (eds) Technology Enhanced Assessment. TEA 2018. Communications in Computer and Information Science, vol 1014. Springer, Cham. https://doi.org/10.1007/978-3-030-25264-9_11

  • DOI: https://doi.org/10.1007/978-3-030-25264-9_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-25263-2

  • Online ISBN: 978-3-030-25264-9

  • eBook Packages: Computer Science, Computer Science (R0)
