Abstract
Item banks in higher education can be regarded as important assets for increasing the quality of education and assessment. An item bank allows for the flexible administration of computer-based achievement tests for summative purposes, as well as quizzes for formative purposes. Developing an item bank, however, can require a considerable investment. A well-worked-out business case can help convince stakeholders to start an item bank development project. An important part of such a business case should be the increase in item quality and the estimated reduction in costs, particularly for the collaborative development of an item bank. However, a theoretical underpinning of a business case that incorporates considerations based on classical test theory is lacking in the literature. Therefore, a model is described for estimating reductions in misclassifications and per-unit costs. Examples of the likelihood of reducing misclassifications and cost per unit are presented, based on findings in the literature. Implications for research and practice are discussed.
Notes
1. This is unsurprising because, for adaptive testing programmes using item response theory (IRT), the main argument for using this complex assessment technique relates largely to being able to develop tests that are up to 25% shorter for test takers and whose reliability is easier to control. It also relates to preventing item exposure [8]. However, it does not aim to calculate the percentage of misclassifications [5, 8, 9]. It must also be noted that developing multi-stage tests (MSTs) and computer-adaptive tests using IRT methods should be regarded as feasible only for the professional testing industry [10] or, in some selected cases, for the domain of medicine [11].
References
Anderson, S.B.: The role of the teacher-made test in higher education. New Dir. Community Coll. 1987, 39–44 (1987). https://doi.org/10.1002/cc.36819875907
Jozefowicz, R.F., Koeppen, B.M., Case, S.M., Galbraith, R., Swanson, D., Glew, R.H.: The quality of in-house medical school examinations. Acad. Med. 77, 156–161 (2002)
Jugar, R.R.: An inquiry on the roles of personal test item banking (PTIB) and table of specifications (TOS) in the construction and utilization of classroom tests. Int. J. Educ. Res. 1, 1–8 (2013)
Vale, C.D.: Computerized item banking. In: Downing, S.M., Haladyna, T.M. (eds.) Handbook of Test Development. Lawrence Erlbaum Associates, Mahwah (2006)
Lane, S., Raymond, M.R., Haladyna, T.M.: Handbook of Test Development. Routledge, New York (2015)
Draaijer, S., De Werk, J.: Handboek In 5 stappen naar een itembank [Handbook In 5 steps to an item bank]. SURF (2018)
Downing, S.M., Haladyna, T.M.: Test item development: validity evidence from quality assurance procedures. Appl. Meas. Educ. 10, 61–82 (1997). https://doi.org/10.1207/s15324818ame1001_4
Davey, T.: Practical Considerations in Computer-Based Testing. ETS Research and Development Division (2011)
Van der Linden, W.J., Glas, C.A.W.: Computerized Adaptive Testing: Theory and Practice. Springer, Dordrecht (2000). https://doi.org/10.1007/0-306-47531-6
Rudner, L.M., Guo, F.: Computer adaptive testing for small scale programs and instructional systems. J. Appl. Test. Technol. 12, 1–12 (2011)
Baumeister, H., Abberger, B., Haschke, A., Boecker, M., Bengel, J., Wirtz, M.: Development and calibration of an item bank for the assessment of activities of daily living in cardiovascular patients using Rasch analysis. Health Qual. Life Outcomes 11, 133 (2013). https://doi.org/10.1186/1477-7525-11-133
Attali, Y.: Automatic item generation unleashed: an evaluation of a large-scale deployment of item models. In: Penstein Rosé, C., et al. (eds.) AIED 2018. LNCS (LNAI), vol. 10947, pp. 17–29. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93843-1_2
Gierl, M.J., Haladyna, T.M.: Automatic Item Generation: Theory and Practice. Routledge, New York (2012)
Gierl, M.J., Lai, H.: Instructional topics in educational measurement (ITEMS) module: using automated processes to generate test items. Educ. Meas. Issues Pract. 32, 36–50 (2013)
Glas, C.A.W., Van der Linden, W.J.: Computerized adaptive testing with item cloning. Appl. Psychol. Meas. 27, 247–261 (2003). https://doi.org/10.1177/0146621603027004001
Gierl, M.J., Lai, H.: Evaluating the quality of medical multiple-choice items created with automated processes. Med. Educ. 47, 726–733 (2013). https://doi.org/10.1111/medu.12202
Draaijer, S.: Supporting teachers in higher education in designing test items (2016). http://dare.ubvu.vu.nl/handle/1871/54397
Hartog, R., Draaijer, S., Rietveld, L.C.: Practical aspects of task allocation in design and development of digital closed questions in higher education. Pract. Assess. Res. Eval. 13, 2–15 (2008)
ETS: How ETS creates test questions. http://www.ets.org/s/understanding_testing/flash/how_ets_creates_test_questions.html
Osterlind, S.J.: Constructing Test Items: Multiple-Choice, Constructed-Response, Performance, and Other Formats. Kluwer Academic Publisher, Norwell (1998)
Cizek, G.J.: More unintended consequences of high-stakes testing. Educ. Meas. Issues Pract. 20, 19–27 (2001). https://doi.org/10.1111/j.1745-3992.2001.tb00072.x
Haladyna, T.M., Downing, S.M., Rodriguez, M.C.: A review of multiple-choice item-writing guidelines for classroom assessment. Appl. Meas. Educ. 15, 309–333 (2002). https://doi.org/10.1207/S15324818AME1503_5
Gerritsen-van Leeuwenkamp, K.: Het relatieve belang van vijftig kwaliteitskenmerken van toetsing voor studententevredenheid in het hoger beroepsonderwijs [The relative importance of fifty quality indicators for measurement of student satisfaction in higher education] (2012). http://hdl.handle.net/1820/4295
Kano, N., Seraku, N., Takahashi, F., Tsuji, S.: Attractive quality and must-be quality. J. Jpn. Soc. Qual. Control 14, 39–48 (1984)
Bloom, B.S.: Taxonomy of Educational Objectives, the Classification of Educational Goals – Handbook I: Cognitive Domain. McKay, New York (1956)
Haladyna, T.M.: Writing Test Items to Evaluate Higher Order Thinking. Allyn & Bacon, Needham Heights (1997)
Haladyna, T.M.: Developing and Validating Multiple-Choice Test Items. Lawrence Erlbaum Associates, London (2004)
Ebel, R.L.: Essentials of Educational Measurement. Prentice-Hall, Englewood Cliffs (1979)
De Gruijter, D.N.M.: Toetsing en toetsanalyse [Testing and test analysis]. ICLON, Sectie Onderwijsontwikkeling Universiteit Leiden, Leiden (2008)
Olsen, J.B., Bunderson, B.: How to write good test questions [powerpoint presentation] (2004)
Spearman, C.: Correlation calculated from faulty data. Br. J. Psychol. 1904–1920 3, 271–295 (1910). https://doi.org/10.1111/j.2044-8295.1910.tb00206.x
Brown, W.: Some experimental results in the correlation of mental abilities. Br. J. Psychol. 1904–1920 3, 296–322 (1910)
Draaijer, S.: Rule of thumb: 40 questions in a 4-choice multiple-choice test. Why? – Draaijer on Assessment and Testing. https://draaijeronassessmentandtesting.wordpress.com/2014/10/23/rule-of-thumb-40-questions-in-a-4-choice-multiple-choice-test-why/
Gibson, W.M., Weiner, J.A.: Generating random parallel test forms using CTT in a computer-based environment. J. Educ. Meas. 35, 297–310 (1998). https://doi.org/10.1111/j.1745-3984.1998.tb00540.x
Douglas, K.M.: A general method for estimating the classification reliability of complex decisions based on configural combinations of multiple assessment scores (2007)
Eggen, T., Sanders, P.: Psychometrie in de praktijk [Psychometrics in Practice]. Cito Instituut voor Toetsontwikkeling, Arnhem (1993)
Rush, B.R., Rankin, D.C., White, B.J.: The impact of item-writing flaws and item complexity on examination item difficulty and discrimination value. BMC Med. Educ. 16, 250 (2016)
Fitzgerald, C.: Risk management: calculating the bottom line of developing a certification or licensure exam (2005). https://www2.caveon.com/2005/02/08/risk-management-calculating-the-bottom-line-of-developing-a-certification-or-licensure-exam/
Parshall, C.G., Spray, J.A., Kalohn, J.C., Davey, T.: Practical Considerations in Computer-Based Testing. Springer, New York (2002). https://doi.org/10.1007/978-1-4613-0083-0
Downing, S.M.: Construct-irrelevant variance and flawed test questions: do multiple-choice item-writing principles make any difference? Acad. Med. 77, S103–S104 (2002)
Mayenga, C.: Mapping item writing tasks on the item writing ability scale. In: XXXVIIth Annual Conference on Canadian Society of Safety Engineering, Carleton University, Ottawa, Canada (2009)
Rodriguez, M.C.: Three options are optimal for multiple-choice items: a meta-analysis of 80 years of research. Educ. Meas. Issues Pract. 24, 3–13 (2005). https://doi.org/10.1111/j.1745-3992.2005.00006.x
Case, S.M., Holtzman, K., Ripkey, D.R.: Developing an item pool for CBT: a practical comparison of three models of item writing. Acad. Med. 76, S111–S113 (2001)
Draaijer, S., Van Gastel, L., Peeters, V., Frinking, P., Reumer, C.: Flexibilisering van Toetsing. [Flexibility in Testing and Assessment]. Digitale Universiteit, Utrecht (2004)
Downing, S.M.: Threats to the validity of locally developed multiple-choice tests in medical education: construct-irrelevant variance and construct underrepresentation. Adv. Health Sci. Educ. Theory Pract. 7, 235–241 (2002). https://doi.org/10.1023/A:1021112514626
Tarrant, M., Knierim, A., Hayes, S.K., Ware, J.: The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments. Nurse Educ. Pract. 6, 354–363 (2006). https://doi.org/10.1016/j.nepr.2006.07.002
Tarrant, M., Ware, J.: Impact of item-writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments. Med. Educ. 42, 198–206 (2008)
Downing, S.M.: The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education. Adv. Health Sci. Educ. 10, 133–143 (2005). https://doi.org/10.1007/s10459-004-4019-5
Wadi, M.M., Abdul Rahim, A.F., Yusoff, M.S.B., Baharuddin, K.A.: The effect of MCQ vetting on students’ examination performance. Educ. Med. J. 6 (2014). https://doi.org/10.5959/eimj.v6i2.216
Hassan, S., Simbak, N., Yussof, H.: Structured vetting procedure of examination questions in medical education in Faculty of Medicine at Universiti Sultan Zainal Abidin. Malays. J. Public Health Med. 16, 29–37 (2016)
Nabil Demaidi, M.: Why is the threshold of Point Biserial correlation (item discrimination) in item analysis 0.2? https://www.researchgate.net/post/Why_is_the_threshold_of_Point_biserial_correlation_item_discrimination_in_item_analysis_02
Crocker, L., Algina, J.: Introduction to Classical and Modern Test Theory. Holt, Rinehart and Winston, Orlando (1986)
Leahy, J.M., Smith, A.: Economics of item development: key cost factors impacting program profitability. Asia ATP (2014)
Wainer, H., Thissen, D.: Combining multiple-choice and constructed-response test scores: toward a Marxist theory of test construction. Appl. Meas. Educ. 6, 103 (1993)
Karpicke, J.D., Roediger, H.L.: The critical importance of retrieval for learning. Science 319, 966–968 (2008). https://doi.org/10.1126/science.1152408
Roediger, H.L.I., Agarwal, P.K., McDaniel, M.A., McDermott, K.B.: Test-enhanced learning in the classroom: long-term improvements from quizzing. J. Exp. Psychol. Appl. 17, 382–395 (2011). https://doi.org/10.1037/a0026252
Slusser, S.R., Erickson, R.J.: Group quizzes: an extension of the collaborative learning process. Teach. Sociol. 34, 249–262 (2006). https://doi.org/10.1177/0092055X0603400304
Davey, T., Nering, M.: Controlling item exposure and maintaining item security. In: Mills, C.N., Potenza, M.T., Fremer, J.J., Ward, W.C. (eds.) Computer-Based Testing, Building the Foundation for Future Assessments. Lawrence Erlbaum Associates, Mahwah (2002)
Hattie, J., Timperley, H.: The power of feedback. Rev. Educ. Res. 77, 81–112 (2007). https://doi.org/10.3102/003465430298487
Butler, M., Pyzdrowski, L., Goodykoontz, A., Walker, V.: The effects of feedback on online quizzes. Int. J. Technol. Math. Educ. 15, 131–136 (2008)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Draaijer, S. (2019). A Cost–Benefit Analysis for Developing Item Banks in Higher Education. In: Draaijer, S., Joosten-ten Brinke, D., Ras, E. (eds) Technology Enhanced Assessment. TEA 2018. Communications in Computer and Information Science, vol 1014. Springer, Cham. https://doi.org/10.1007/978-3-030-25264-9_11
DOI: https://doi.org/10.1007/978-3-030-25264-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25263-2
Online ISBN: 978-3-030-25264-9
eBook Packages: Computer Science, Computer Science (R0)