Abstract
This chapter presents the formulas for the agreement indices that fit the research situations distinguished in Chap. 3. These indices all belong to one family. Formulas are not given for every imaginable research situation; only those regularly encountered in existing empirical research are presented. The chapter also lists all known indices and indicates why each does or does not fulfil the requirements stated in Chap. 3.
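To make the notion of a family concrete: in the literature on nominal-scale agreement, the indices in question are chance-corrected indices. As a sketch (assuming the chapter uses the standard form; its own notation may differ), every member compares the observed proportion of agreement \(p_o\) with the proportion \(p_e\) expected under some chance model:

\[
A = \frac{p_o - p_e}{1 - p_e},
\]

so that \(A = 1\) indicates perfect agreement and \(A = 0\) indicates agreement no better than chance. Members of the family, such as Scott's \(\pi\) and Cohen's \(\kappa\), differ only in how \(p_e\) is defined.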
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Popping, R. (2019). Indices. In: Introduction to Interrater Agreement for Nominal Data. Springer, Cham. https://doi.org/10.1007/978-3-030-11671-2_4
DOI: https://doi.org/10.1007/978-3-030-11671-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11670-5
Online ISBN: 978-3-030-11671-2
eBook Packages: Mathematics and Statistics (R0)