A Cost–Benefit Analysis for Developing Item Banks in Higher Education

Draaijer, Silvester

doi:10.1007/978-3-030-25264-9_11

Silvester Draaijer (ORCID: orcid.org/0000-0003-4230-7740)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1014)

Included in the following conference series:

International Conference on Technology Enhanced Assessment

Abstract

Item banks in higher education can be regarded as important assets for increasing the quality of education and assessment. An item bank allows for the flexible administration of computer-based achievement tests for summative purposes, as well as quizzes for formative purposes. Developing item banks, however, can require a substantial investment. A well-worked-out business case can help convince stakeholders to start an item bank development project. An important part of such a business case should be the increase in item quality and the estimated reduction in costs, particularly for the collaborative development of an item bank. However, a theoretical underpinning for such a business case that incorporates considerations based on classical test theory is lacking in the literature. Therefore, a model is described for estimating reductions in misclassifications and per-unit costs. Examples, based on findings in the literature, are presented of the likelihood of reducing misclassifications and cost per unit. Implications for research and practice are discussed.

Notes

1. This is unsurprising because, for adaptive testing programmes that use item response theory (IRT), the main argument for this complex assessment technique is largely that it allows tests up to 25% shorter for test takers, with more controllable reliability. It also relates to preventing item exposure [8]. However, it does not aim to calculate the percentage of misclassifications [5, 8, 9]. It must also be noted that developing multi-stage tests (MSTs) and computer-adaptive tests using IRT methods should be regarded as feasible only for the professional testing industry [10] or, in selected cases, for the domain of medicine [11].

References

Anderson, S.B.: The role of the teacher-made test in higher education. New Dir. Community Coll. 1987, 39–44 (1987). https://doi.org/10.1002/cc.36819875907
Jozefowicz, R.F., Koeppen, B.M., Case, S.M., Galbraith, R., Swanson, D., Glew, R.H.: The quality of in-house medical school examinations. Acad. Med. 77, 156–161 (2002)
Jugar, R.R.: An inquiry on the roles of personal test item banking (PTIB) and table of specifications (TOS) in the construction and utilization of classroom tests. Int. J. Educ. Res. 1, 1–8 (2013)
Vale, C.D.: Computerized item banking. In: Downing, S.M., Haladyna, T.M. (eds.) Handbook of Test Development. Lawrence Erlbaum Associates, Mahwah (2006)
Lane, S., Raymond, M.R., Haladyna, T.M.: Handbook of Test Development. Routledge, New York (2015)
Draaijer, S., De Werk, J.: Handboek In 5 stappen naar een itembank [Handbook In 5 steps to an item bank]. SURF (2018)
Downing, S.M., Haladyna, T.M.: Test item development: validity evidence from quality assurance procedures. Appl. Meas. Educ. 10, 61–82 (1997). https://doi.org/10.1207/s15324818ame1001_4
Davey, T.: Practical Considerations in Computer-Based Testing. ETS Research and Development Division (2011)
Van der Linden, W.J., Glas, C.A.W.: Computerized Adaptive Testing: Theory and Practice. Springer, Dordrecht (2000). https://doi.org/10.1007/0-306-47531-6
Rudner, L.M., Guo, F.: Computer adaptive testing for small scale programs and instructional systems. J. Appl. Test. Technol. 12, 1–12 (2011)
Baumeister, H., Abberger, B., Haschke, A., Boecker, M., Bengel, J., Wirtz, M.: Development and calibration of an item bank for the assessment of activities of daily living in cardiovascular patients using Rasch analysis. Health Qual. Life Outcomes 11, 133 (2013). https://doi.org/10.1186/1477-7525-11-133
Attali, Y.: Automatic item generation unleashed: an evaluation of a large-scale deployment of item models. In: Penstein Rosé, C., et al. (eds.) AIED 2018. LNCS (LNAI), vol. 10947, pp. 17–29. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93843-1_2
Gierl, M.J., Haladyna, T.M.: Automatic Item Generation: Theory and Practice. Routledge, New York (2012)
Gierl, M.J., Lai, H.: Instructional topics in educational measurement (ITEMS) module: using automated processes to generate test items. Educ. Meas. Issues Pract. 32, 36–50 (2013)
Glas, C.A.W., Van der Linden, W.J.: Computerized adaptive testing with item cloning. Appl. Psychol. Meas. 27, 247–261 (2003). https://doi.org/10.1177/0146621603027004001
Gierl, M.J., Lai, H.: Evaluating the quality of medical multiple-choice items created with automated processes. Med. Educ. 47, 726–733 (2013). https://doi.org/10.1111/medu.12202
Draaijer, S.: Supporting teachers in higher education in designing test items (2016). http://dare.ubvu.vu.nl/handle/1871/54397
Hartog, R., Draaijer, S., Rietveld, L.C.: Practical aspects of task allocation in design and development of digital closed questions in higher education. Pract. Assess. Res. Eval. 13, 2–15 (2008)
ETS: How ETS creates test questions. http://www.ets.org/s/understanding_testing/flash/how_ets_creates_test_questions.html
Osterlind, S.J.: Constructing Test Items: Multiple-Choice, Constructed-Response, Performance, and Other Formats. Kluwer Academic Publishers, Norwell (1998)
Cizek, G.J.: More unintended consequences of high-stakes testing. Educ. Meas. Issues Pract. 20, 19–27 (2001). https://doi.org/10.1111/j.1745-3992.2001.tb00072.x
Haladyna, T.M., Downing, S.M., Rodriguez, M.C.: A review of multiple-choice item-writing guidelines for classroom assessment. Appl. Meas. Educ. 15, 309–333 (2002). https://doi.org/10.1207/S15324818AME1503_5
Gerritsen-van Leeuwenkamp, K.: Het relatieve belang van vijftig kwaliteitskenmerken van toetsing voor studententevredenheid in het hoger beroepsonderwijs [The relative importance of fifty quality indicators for measurement of student satisfaction in higher education] (2012). http://hdl.handle.net/1820/4295
Kano, N., Seraku, N., Takahashi, F., Tsuji, S.: Attractive quality and must-be quality. J. Jpn. Soc. Qual. Control 14, 39–48 (1984)
Bloom, B.S.: Taxonomy of Educational Objectives, the Classification of Educational Goals – Handbook I: Cognitive Domain. McKay, New York (1956)
Haladyna, T.M.: Writing Test Items to Evaluate Higher Order Thinking. Allyn & Bacon, Needham Heights (1997)
Haladyna, T.M.: Developing and Validating Multiple-Choice Test Items. Lawrence Erlbaum Associates, London (2004)
Ebel, R.L.: Essentials of Educational Measurement. Prentice-Hall, Englewood Cliffs (1979)
De Gruijter, D.N.M.: Toetsing en toetsanalyse [Testing and test analysis]. ICLON, Sectie Onderwijsontwikkeling Universiteit Leiden, Leiden (2008)
Olsen, J.B., Bunderson, B.: How to write good test questions [PowerPoint presentation] (2004)
Spearman, C.: Correlation calculated from faulty data. Br. J. Psychol. 3, 271–295 (1910). https://doi.org/10.1111/j.2044-8295.1910.tb00206.x
Brown, W.: Some experimental results in the correlation of mental abilities. Br. J. Psychol. 3, 296–322 (1910)
Draaijer, S.: Rule of thumb: 40 questions in a 4-choice multiple-choice test. Why? – Draaijer on Assessment and Testing. https://draaijeronassessmentandtesting.wordpress.com/2014/10/23/rule-of-thumb-40-questions-in-a-4-choice-multiple-choice-test-why/
Gibson, W.M., Weiner, J.A.: Generating random parallel test forms using CTT in a computer-based environment. J. Educ. Meas. 35, 297–310 (1998). https://doi.org/10.1111/j.1745-3984.1998.tb00540.x
Douglas, K.M.: A general method for estimating the classification reliability of complex decisions based on configural combinations of multiple assessment scores (2007)
Eggen, T., Sanders, P.: Psychometrie in de praktijk [Psychometrics in Practice]. Cito Instituut voor Toetsontwikkeling, Arnhem (1993)
Rush, B.R., Rankin, D.C., White, B.J.: The impact of item-writing flaws and item complexity on examination item difficulty and discrimination value. BMC Med. Educ. 16, 250 (2016)
Fitzgerald, C.: Risk management: calculating the bottom line of developing a certification or licensure exam (2005). https://www2.caveon.com/2005/02/08/risk-management-calculating-the-bottom-line-of-developing-a-certification-or-licensure-exam/
Parshall, C.G., Spray, J.A., Kalohn, J.C., Davey, T.: Practical Considerations in Computer-Based Testing. Springer, New York (2002). https://doi.org/10.1007/978-1-4613-0083-0
Downing, S.M.: Construct-irrelevant variance and flawed test questions: do multiple-choice item-writing principles make any difference? Acad. Med. 77, S103–S104 (2002)
Mayenga, C.: Mapping item writing tasks on the item writing ability scale. In: XXXVIIth Annual Conference on Canadian Society of Safety Engineering, Carleton University, Ottawa, Canada (2009)
Rodriguez, M.C.: Three options are optimal for multiple-choice items: a meta-analysis of 80 years of research. Educ. Meas. Issues Pract. 24, 3–13 (2005). https://doi.org/10.1111/j.1745-3992.2005.00006.x
Case, S.M., Holtzman, K., Ripkey, D.R.: Developing an item pool for CBT: a practical comparison of three models of item writing. Acad. Med. 76, S111–S113 (2001)
Draaijer, S., Van Gastel, L., Peeters, V., Frinking, P., Reumer, C.: Flexibilisering van Toetsing. [Flexibility in Testing and Assessment]. Digitale Universiteit, Utrecht (2004)
Downing, S.M.: Threats to the validity of locally developed multiple-choice tests in medical education: construct-irrelevant variance and construct underrepresentation. Adv. Health Sci. Educ. Theory Pract. 7, 235–241 (2002). https://doi.org/10.1023/A:1021112514626
Tarrant, M., Knierim, A., Hayes, S.K., Ware, J.: The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments. Nurse Educ. Pract. 6, 354–363 (2006). https://doi.org/10.1016/j.nepr.2006.07.002
Tarrant, M., Ware, J.: Impact of item-writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments. Med. Educ. 42, 198–206 (2008)
Downing, S.M.: The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education. Adv. Health Sci. Educ. 10, 133–143 (2005). https://doi.org/10.1007/s10459-004-4019-5
Wadi, M.M., Abdul Rahim, A.F., Yusoff, M.S.B., Baharuddin, K.A.: The effect of MCQ vetting on students’ examination performance. Educ. Med. J. 6 (2014). https://doi.org/10.5959/eimj.v6i2.216
Hassan, S., Simbak, N., Yussof, H.: Structured vetting procedure of examination questions in medical education in Faculty of Medicine at Universiti Sultan Zainal Abidin. Malays. J. Public Health Med. 16, 29–37 (2016)
Nabil Demaidi, M.: Why is the threshold of Point Biserial correlation (item discrimination) in item analysis 0.2? https://www.researchgate.net/post/Why_is_the_threshold_of_Point_biserial_correlation_item_discrimination_in_item_analysis_02
Crocker, L., Algina, J.: Introduction to Classical and Modern Test Theory. Holt, Rinehart and Winston, Orlando (1986)
Leahy, J.M., Smith, A.: Economics of item development: key cost factors impacting program profitability. Asia ATP (2014)
Wainer, H., Thissen, D.: Combining multiple-choice and constructed-response test scores: toward a Marxist theory of test construction. Appl. Meas. Educ. 6, 103 (1993)
Karpicke, J.D., Roediger, H.L.: The critical importance of retrieval for learning. Science 319, 966–968 (2008). https://doi.org/10.1126/science.1152408
Roediger, H.L.I., Agarwal, P.K., McDaniel, M.A., McDermott, K.B.: Test-enhanced learning in the classroom: long-term improvements from quizzing. J. Exp. Psychol. Appl. 17, 382–395 (2011). https://doi.org/10.1037/a0026252
Slusser, S.R., Erickson, R.J.: Group quizzes: an extension of the collaborative learning process. Teach. Sociol. 34, 249–262 (2006). https://doi.org/10.1177/0092055X0603400304
Davey, T., Nering, M.: Controlling item exposure and maintaining item security. In: Mills, C.N., Potenza, M.T., Fremer, J.J., Ward, W.C. (eds.) Computer-Based Testing, Building the Foundation for Future Assessments. Lawrence Erlbaum Associates, Mahwah (2002)
Hattie, J., Timperley, H.: The power of feedback. Rev. Educ. Res. 77, 81–112 (2007). https://doi.org/10.3102/003465430298487
Butler, M., Pyzdrowski, L., Goodykoontz, A., Walker, V.: The effects of feedback on online quizzes. Int. J. Technol. Math. Educ. 15, 131–136 (2008)

Author information

Authors and Affiliations

Faculty of Behavioural and Movement Sciences, Department of Research and Theory in Education, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV, Amsterdam, The Netherlands
Silvester Draaijer

Corresponding author

Correspondence to Silvester Draaijer.

Editor information

Editors and Affiliations

Faculty of Behavioural and Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Silvester Draaijer
Welten Instituut, Open University of the Netherlands, Heerlen, The Netherlands
Desirée Joosten-ten Brinke
Luxembourg Institute of Science and Technology, Esch-sur-Alzette, Luxembourg
Eric Ras

About this paper

Cite this paper

Draaijer, S. (2019). A Cost–Benefit Analysis for Developing Item Banks in Higher Education. In: Draaijer, S., Joosten-ten Brinke, D., Ras, E. (eds) Technology Enhanced Assessment. TEA 2018. Communications in Computer and Information Science, vol 1014. Springer, Cham. https://doi.org/10.1007/978-3-030-25264-9_11

DOI: https://doi.org/10.1007/978-3-030-25264-9_11
Published: 13 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25263-2
Online ISBN: 978-3-030-25264-9
eBook Packages: Computer Science, Computer Science (R0)
