Abstract
Generalizations of chance-corrected statistics for measuring inter-expert agreement on class-label assignments to data instances have traditionally relied on a marginalization argument over a variable group of experts. The same argument has also produced agreement measures for evaluating the class predictions of an isolated classifier against the (multiple) labels assigned by the group of experts. We show that these measures are not necessarily suitable for the more typical scenario of a fixed group of experts. We then propose novel, more meaningful, and less variable generalizations that quantify both the inter-expert agreement over the fixed group and a classifier's agreement with it in a multi-expert, multi-class setting, taking into account expert-specific biases and correlations.
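For background, the sketch below illustrates one standard chance-corrected agreement statistic of the kind the abstract refers to: Fleiss' kappa for a group of experts, whose chance term pools all experts' labels and thus ignores expert-specific marginal distributions. It is not the statistic proposed in the paper; the function name and toy data are illustrative only.

```python
# Minimal sketch (illustrative, not the paper's proposed measure): Fleiss' kappa,
# a chance-corrected agreement statistic for N items each labelled by n experts.
# counts[i][j] = number of experts assigning item i to class j.

def fleiss_kappa(counts):
    N = len(counts)          # number of items
    n = sum(counts[0])       # experts per item (assumed equal across items)
    k = len(counts[0])       # number of classes

    # Observed agreement: average proportion of agreeing expert pairs per item.
    p_obs = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts) / N

    # Expected chance agreement from the pooled class proportions.
    p_class = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_exp = sum(p * p for p in p_class)

    return (p_obs - p_exp) / (1.0 - p_exp)

# Toy data: 4 items, 3 experts, 2 classes.
print(fleiss_kappa([[3, 0], [2, 1], [0, 3], [1, 2]]))   # ~0.333
```

Note that the chance term p_exp above is computed from marginals pooled over all experts; the generalizations proposed in the paper instead account for expert-specific biases and correlations within the fixed group.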
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Shah, M. (2011). Generalized Agreement Statistics over Fixed Group of Experts. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol. 6913. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23808-6_13
DOI: https://doi.org/10.1007/978-3-642-23808-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23807-9
Online ISBN: 978-3-642-23808-6