
Abstract

This chapter presents the formulas for the agreement indices corresponding to the research situations distinguished in Chap. 3. The indices belong to a single family. Formulas are not given for every imaginable research situation; only those that occur regularly in existing empirical research are presented. The chapter also lists all known indices and indicates why each does or does not fulfil the requirements set out in Chap. 3.
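The family of indices referred to is, in this literature, typically that of chance-corrected agreement measures, which share the form (p_o − p_e)/(1 − p_e) and differ only in how the chance-expected agreement p_e is modelled. As a minimal illustration only, not the chapter's own derivations, the Python sketch below computes this generic form and instantiates it with Cohen's kappa, where p_e is taken from the product of the two raters' marginal proportions; the example table is hypothetical.

```python
import numpy as np

def chance_corrected_agreement(table: np.ndarray, expected: np.ndarray) -> float:
    """Generic chance-corrected index: (p_o - p_e) / (1 - p_e).

    `table` is a k x k cross-classification of two raters' judgments;
    `expected` holds the cell proportions implied by the chance model.
    """
    p = table / table.sum()    # observed cell proportions
    p_o = np.trace(p)          # observed agreement: mass on the diagonal
    p_e = np.trace(expected)   # agreement expected under the chance model
    return (p_o - p_e) / (1.0 - p_e)

def cohens_kappa(table: np.ndarray) -> float:
    """Cohen's kappa: chance model is the product of the raters' marginals."""
    p = table / table.sum()
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))
    return chance_corrected_agreement(table, expected)

if __name__ == "__main__":
    # Hypothetical data: two raters, three categories.
    # Rows index rater 1's judgments, columns rater 2's.
    t = np.array([[20, 5, 0],
                  [3, 15, 2],
                  [1, 4, 10]])
    print(round(cohens_kappa(t), 2))  # 0.62
```

Other members of the family, such as Scott's pi or Bennett et al.'s S, keep the same outer formula and swap in a different chance model for `expected`; the chapter's catalogue of indices can be read against this common template.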



Author information

Correspondence to Roel Popping.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Popping, R. (2019). Indices. In: Introduction to Interrater Agreement for Nominal Data. Springer, Cham. https://doi.org/10.1007/978-3-030-11671-2_4
