An Option-Based Partial Credit Item Response Model

  • Yuanchao (Emily) Bo
  • Charles Lewis
  • David V. Budescu
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 89)


Abstract

Multiple-choice (MC) tests have been criticized for allowing guessing and for failing to credit partial knowledge, and alternative scoring methods and response formats (Ben-Simon et al., Appl Psychol Meas 21:65–88, 1997) have been proposed to address these problems. Modern test theory addresses these issues with binary item response models that include guessing parameters (e.g., the 3PL) or with polytomous IRT models. We propose an option-based partial credit IRT model and a new scoring rule based on a weighted Hamming distance between the option key vector and the option response vector. The test taker (TT)'s estimated ability draws on information from both correct options and distractors. These modifications reduce the benefit of guessing and give credit for partial knowledge. The new model can be tailored to different response formats, and some popular IRT models, such as the 2PL and Bock's nominal model, are special cases of it. Markov chain Monte Carlo (MCMC) methods were used to estimate the model parameters and yielded satisfactory estimates. Simulation studies show that the weighted Hamming distance scores have the highest correlation with TTs' true abilities and that their distribution is less skewed than those of the other scores considered.
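To make the scoring idea concrete, the following is a minimal sketch of a weighted Hamming distance score between an option key vector and an option response vector. The per-option weights and the mapping from distance to a partial-credit score are hypothetical choices made here for illustration; they are not the model's calibrated parameters.

```python
def weighted_hamming(key, response, weights):
    """Sum of weights over option positions where the response differs from the key."""
    if not (len(key) == len(response) == len(weights)):
        raise ValueError("key, response, and weights must have equal length")
    return sum(w for k, r, w in zip(key, response, weights) if k != r)


def item_score(key, response, weights):
    """Map the weighted distance to a partial-credit score in [0, 1]:
    1 for a perfect match, 0 for the maximally distant response.
    (This normalization is an illustrative assumption.)"""
    max_dist = sum(weights)
    return 1.0 - weighted_hamming(key, response, weights) / max_dist


# A 4-option item; the key vector marks the single correct option.
key = [1, 0, 0, 0]
weights = [1.0, 0.5, 0.5, 0.5]  # hypothetical per-option weights

print(item_score(key, [1, 0, 0, 0], weights))  # exact match -> 1.0
print(item_score(key, [0, 1, 0, 0], weights))  # wrong single choice: lower score
print(item_score(key, [1, 1, 0, 0], weights))  # correct option plus one extra mark
```

Because each marked or unmarked option contributes separately to the distance, a TT who eliminates most distractors but marks one extra option still receives partial credit, which is the sense in which the rule rewards partial knowledge.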


Keywords

Item response theory · MCMC · Partial credit · Partial knowledge · Hamming distance · Multiple choice · Scoring rule


References

  1. Andersen EB (1977) Sufficient statistics and latent trait models. Psychometrika 42:69–81
  2. Andrich D (1988) Rasch models for measurement. Sage Publications, Beverly Hills
  3. Bechger TM, Maris G, Verstralen HHFM, Verhelst ND (2005) The Nedelsky model for multiple choice items. In: van der Ark LA, Croon MA, Sijtsma K (eds) New developments in categorical data analysis for the social and behavioral sciences. Erlbaum, Mahwah, pp 187–206
  4. Ben-Simon A, Budescu DV, Nevo B (1997) A comparative study of measures of partial knowledge in multiple-choice tests. Appl Psychol Meas 21:65–88
  5. Bereby-Meyer Y, Meyer J, Budescu DV (2003) Decision making under internal uncertainty: the case of multiple-choice tests with different scoring rules. Acta Psychol 112:207–220
  6. Birnbaum A (1968) Some latent trait models and their use in inferring an examinee’s ability. In: Lord FM, Novick MR (eds) Statistical theories of mental test scores. Addison-Wesley, Reading
  7. Bock RD (1972) Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika 37:29–51
  8. Budescu DV, Bar-Hillel M (1993) To guess or not to guess: a decision theoretic view of formula scoring. J Educ Meas 30:277–291
  9. Budescu DV, Bo Y (in press) Analyzing test-taking behavior: decision theory meets psychometric theory. Psychometrika
  10. Coombs CH, Milholland JE, Womer FB (1956) The assessment of partial knowledge. Educ Psychol Meas 16:13–37
  11. R Development Core Team (2013) R: a language and environment for statistical computing [computer software]. R Foundation for Statistical Computing, Vienna
  12. Dressel PL, Schmidt J (1953) Some modifications of the multiple choice item. Educ Psychol Meas 13:574–595
  13. Echternacht GJ (1976) Reliability and validity of option weighting schemes. Educ Psychol Meas 36:301–309
  14. Frary RB (1989) Partial-credit scoring methods for multiple-choice tests. Appl Meas Educ 2:79–96
  15. Gibbons JD, Olkin I, Sobel M (1977) Selecting and ordering populations: a new statistical methodology. Wiley, New York
  16. Gulliksen H (1950) Theory of mental tests. Wiley, New York
  17. Haladyna TM (1988) Empirically based polytomous scoring of multiple choice test items: a review. Paper presented at the annual meeting of the American Educational Research Association, New Orleans
  18. Hambleton RK, Roberts DM, Traub RE (1970) A comparison of the reliability and validity of two methods for assessing partial knowledge on a multiple-choice test. J Educ Meas 7:75–82
  19. Hamming RW (1950) Error detecting and error correcting codes. Bell Syst Tech J 29:147–160
  20. Hansen R (1971) The influence of variables other than knowledge on probabilistic tests. J Educ Meas 8:9–14
  21. Holzinger KJ (1924) On scoring multiple response tests. J Educ Psychol 15:445–447
  22. Hutchinson TP (1982) Some theories of performance in multiple-choice tests, and their implications for variants of the task. Br J Math Stat Psychol 35:71–89
  23. Jacobs SS (1971) Correlates of unwarranted confidence in responses to objective test items. J Educ Meas 8:15–19
  24. Jaradat D, Tollefson N (1988) The impact of alternative scoring procedures for multiple-choice items on test reliability, validity and grading. Educ Psychol Meas 48:627–635
  25. Kahneman D, Tversky A (1979) Prospect theory: an analysis of decision under risk. Econometrica 47:263–291
  26. Lunn DJ, Thomas A, Best N, Spiegelhalter D (2000) WinBUGS – a Bayesian modeling framework: concepts, structure, and extensibility. Stat Comput 10:325–337
  27. Masters GN (1982) A Rasch model for partial credit scoring. Psychometrika 47:149–174
  28. Michael JC (1968) The reliability of a multiple choice examination under various test-taking instructions. J Educ Meas 5:307–314
  29. Muraki E (1992) A generalized partial credit model: application of an EM algorithm. Appl Psychol Meas 16:159–176
  30. Pugh RC, Brunza JJ (1975) Effects of a confidence weighted scoring system on measures of test reliability and validity. Educ Psychol Meas 35:73–78
  31. Rippey RM (1970) A comparison of five different scoring functions for confidence tests. J Educ Meas 7:165–170
  32. Ruch GM, Stoddard GD (1925) Comparative reliabilities of objective examinations. J Educ Psychol 16:89–103
  33. Samejima F (1969) Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, No. 17
  34. Samejima F (1972) A general model for free-response data. Psychometrika Monograph, No. 18
  35. Samejima F (1979) A new family of models for the multiple choice item (Research Report No. 79-4). University of Tennessee, Department of Psychology, Knoxville
  36. San Martin E, del Pino G, de Boeck P (2006) IRT models for ability-based guessing. Appl Psychol Meas 30:183–203
  37. Smith RM (1987) Assessing partial knowledge in vocabulary. J Educ Meas 24:217–231
  38. Stanley JC, Wang MD (1970) Weighting test items and test item options, an overview of the analytical and empirical literature. Educ Psychol Meas 30:21–35
  39. Swineford F (1938) Measurement of a personality trait. J Educ Psychol 29:295–300
  40. Swineford F (1941) Analysis of a personality trait. J Educ Psychol 32:348–444
  41. Sykes RC, Hou L (2003) Weighting constructed-response items in IRT-based exams. Appl Meas Educ 16:257–275
  42. Thissen D, Steinberg L (1984) A response model for multiple choice items. Psychometrika 49:501–519
  43. Thurstone LL (1919) A method for scoring tests. Psychol Bull 16:235–240
  44. Tversky A, Kahneman D (1992) Advances in prospect theory: cumulative representation of uncertainty. J Risk Uncertainty 5:297–323
  45. Wang MW, Stanley JC (1970) Differential weighting: a review of methods and empirical studies. Rev Educ Res 40:663–705
  46. Yaniv I, Schul Y (1997) Elimination and inclusion procedures in judgment. J Behav Decis Mak 10:211–220
  47. Yaniv I, Schul Y (2000) Acceptance and elimination procedure in choice: noncomplementarity and the role of implied status quo. Organ Behav Hum Decis Process 82:293–313

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Yuanchao (Emily) Bo (1)
  • Charles Lewis (1)
  • David V. Budescu (1)

  1. Department of Psychology, Fordham University, Bronx, USA