
Abstract

This chapter presents the formulas for the agreement indices corresponding to the research situations distinguished in Chap. 3. The indices belong to a single family. Formulas are not given for every imaginable research situation; only those that occur regularly in existing empirical research are presented. The chapter also lists all known indices and indicates why each does or does not fulfil the requirements set out in Chap. 3.
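The family of indices referred to is, in this literature, typically that of chance-corrected agreement measures, which share the form (p_o − p_e)/(1 − p_e) and differ only in how the chance-expected agreement p_e is modelled. As a minimal illustration only, not the chapter's own derivations, the Python sketch below computes this generic form and instantiates it with Cohen's kappa, where p_e is taken from the product of the two raters' marginal proportions; the example table is hypothetical.

```python
import numpy as np

def chance_corrected_agreement(table: np.ndarray, expected: np.ndarray) -> float:
    """Generic chance-corrected index: (p_o - p_e) / (1 - p_e).

    `table` is a k x k cross-classification of two raters' judgments;
    `expected` holds the cell proportions implied by the chance model.
    """
    p = table / table.sum()    # observed cell proportions
    p_o = np.trace(p)          # observed agreement: mass on the diagonal
    p_e = np.trace(expected)   # agreement expected under the chance model
    return (p_o - p_e) / (1.0 - p_e)

def cohens_kappa(table: np.ndarray) -> float:
    """Cohen's kappa: chance model is the product of the raters' marginals."""
    p = table / table.sum()
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))
    return chance_corrected_agreement(table, expected)

if __name__ == "__main__":
    # Hypothetical data: two raters, three categories.
    # Rows index rater 1's judgments, columns rater 2's.
    t = np.array([[20, 5, 0],
                  [3, 15, 2],
                  [1, 4, 10]])
    print(round(cohens_kappa(t), 2))  # 0.62
```

Other members of the family, such as Scott's pi or Bennett et al.'s S, keep the same outer formula and swap in a different chance model for `expected`; the chapter's catalogue of indices can be read against this common template.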



Author information

Correspondence to Roel Popping.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Popping, R. (2019). Indices. In: Introduction to Interrater Agreement for Nominal Data. Springer, Cham. https://doi.org/10.1007/978-3-030-11671-2_4
