Abstract
Assessment of agreement between two or more methods of measurement is of considerable importance in many areas. In medicine in particular, new methods or devices that are cheaper, easier to use, or less invasive are routinely developed. Agreement between a new method and a traditional reference or gold standard must be evaluated before the new method is put into practice. Various methodologies have been proposed for this purpose in recent years. We review this literature, focusing on the assessment of agreement between two methods and on selecting the best method when several methods are compared with a reference. A real data set is analyzed to illustrate the various approaches.
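The following minimal Python sketch makes the two-method setting concrete. It computes three summaries that appear in the cited literature: Lin's concordance correlation coefficient (Lin, 1989), the Bland-Altman limits of agreement (Bland and Altman, 1986), and the sample mean squared deviation (MSD) that underlies Lin's (2000) total deviation index. The function names, the default multiplier 1.96, and the toy numbers are illustrative assumptions. They are not the chapter's data. The `closest_to_reference` helper is a hypothetical, naive ranking by sample MSD and is not the selection procedure developed in the chapter.

```python
from statistics import fmean, stdev


def ccc(x, y):
    """Lin's concordance correlation coefficient, moment (divisor n) estimates."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Penalizes both poor correlation and any location/scale shift between methods.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def limits_of_agreement(x, y, z=1.96):
    """Bland-Altman limits d_bar +/- z * s_d for paired differences d = y - x."""
    d = [b - a for a, b in zip(x, y)]
    dbar, sd = fmean(d), stdev(d)  # stdev uses the n - 1 divisor
    return dbar - z * sd, dbar + z * sd


def msd(x, y):
    """Sample mean squared deviation, an estimate of E[(Y - X)^2]."""
    return fmean([(b - a) ** 2 for a, b in zip(x, y)])


def closest_to_reference(reference, methods):
    """Hypothetical naive rule: pick the method with the smallest sample MSD."""
    return min(methods, key=lambda name: msd(reference, methods[name]))


# Toy, made-up paired readings on the same five subjects (illustration only).
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
methods = {
    "A": [10.5, 12.5, 14.5, 16.5, 18.5],  # constant bias of +0.5
    "B": [9.0, 13.0, 13.0, 17.0, 17.0],   # unbiased-looking but noisier
}
```

With these toy values, method A has MSD 0.25 and method B has MSD 1.0, so `closest_to_reference(reference, methods)` returns `"A"`. In practice, the papers cited below replace these point estimates with formal inference, such as tests, confidence bounds, or selection procedures with probability guarantees.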
References
Altman, D. G. and Bland, J. M. (1983). Measurement in medicine: The analysis of method comparison studies, The Statistician, 32, 307–317.
Anderson, S. and Hauck, W. W. (1990). Consideration of individual bioequivalence, Journal of Pharmacokinetics and Biopharmaceutics, 18, 259–274.
Atkinson, G. and Nevill, A. (1997). Comment on the use of concordance correlation to assess the agreement between two variables, Biometrics, 53, 775–777.
Banerjee, M., Capozzoli, M., McSweeney, L., and Sinha, D. (1999). Beyond Kappa: A review of interrater agreement measures, The Canadian Journal of Statistics, 27, 3–23.
Barnhart, H. X., Haber, M., and Song, J. L. (2002). Overall concordance correlation coefficient for evaluating agreement among multiple observers, Biometrics, 58, 1020–1027.
Barnhart, H. X. and Williamson, J. M. (2001). Modeling concordance correlation via GEE to evaluate reproducibility, Biometrics, 57, 931–940.
Bartko, J. J. (1994). Measures of agreement: A single procedure, Statistics in Medicine, 13, 737–745.
Berger, R. L. and Hsu, J. C. (1996). Bioequivalence trials, intersection-union tests and equivalence confidence sets, Statistical Science, 11, 283–319.
Bland, J. M. and Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement, Lancet, i, 307–310.
Bland, J. M. and Altman, D. G. (1990). A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement, Computers in Biology and Medicine, 20, 337–340.
Bland, J. M. and Altman, D. G. (1995a). Comparing two methods of clinical measurement: A personal history, International Journal of Epidemiology, 24, S7–S14.
Bland, J. M. and Altman, D. G. (1995b). Comparing methods of measurement: Why plotting difference against standard method is misleading, Lancet, 346, 1085–1087.
Bland, J. M. and Altman, D. G. (1999). Measuring agreement in method comparison studies, Statistical Methods in Medical Research, 8, 135–160.
Cameron, J. M. (1982). Calibration, In Encyclopedia of Statistical Sciences, 1, pp. 346–351, John Wiley & Sons, New York.
Casella, G. and Berger, R. L. (2002). Statistical Inference, 2nd edition, Duxbury Press, Pacific Grove, CA.
Chinchilli, V. M., Martel, J. K., Kumanyika, S., and Lloyd, T. (1996). A weighted concordance correlation coefficient for repeated measurement designs, Biometrics, 52, 341–353.
Choudhary, P. K. (2002). Assessment of Agreement and Selection of the Best Instrument in Method Comparison Studies, Ph.D. Dissertation, The Ohio State University, Columbus, OH.
Choudhary, P. K. and Nagaraja, H. N. (2004a). Tests for assessment of agreement using probability criteria, Submitted for publication.
Choudhary, P. K. and Nagaraja, H. N. (2004b). Assessment of agreement using intersection-union principle, Biometrical Journal (to appear).
Choudhary, P. K. and Nagaraja, H. N. (2004c). A two-stage procedure for selection and assessment of agreement of the best with a gold standard, Sequential Analysis (to appear).
Choudhary, P. K. and Nagaraja, H. N. (2004d). Selecting the instrument closest to a gold standard, Journal of Statistical Planning and Inference (to appear).
David, H. A. and Nagaraja, H. N. (2003). Order Statistics, 3rd edition, John Wiley & Sons, New York.
Dunn, G. (1989). Design and Analysis of Reliability Studies: The Statistical Evaluation of Measurement Errors, Oxford University Press, New York.
Dunn, G. (1992). Design and analysis of reliability studies, Statistical Methods in Medical Research, 1, 123–157.
Dunn, G. and Roberts, C. (1999). Modelling method comparison data, Statistical Methods in Medical Research, 8, 161–179.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions, John Wiley & Sons, New York.
Fleiss, J. L. (1986). The Design and Analysis of Clinical Experiments, John Wiley & Sons, New York.
Fuller, W. A. (1987). Measurement Error Models, John Wiley & Sons, New York.
Grubbs, F. E. (1982). Grubbs’ estimators, In Encyclopedia of Statistical Sciences, 2, pp. 542–549, John Wiley & Sons, New York.
Gupta, S. S. and Panchapakesan, S. (1979). Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations, John Wiley & Sons, New York. Republished by SIAM, Philadelphia, 2002.
Guttman, I. (1988). Statistical tolerance regions, In Encyclopedia of Statistical Sciences, 9, pp. 272–287, John Wiley & Sons, New York.
Hamilton, D. C. and Lesperance, M. L. (1995). A comparison of methods for univariate and multivariate acceptance sampling by variables, Technometrics, 37, 329–339.
Harris, I. R., Burch, B. D. and St. Laurent, R. T. (2001). A blended estimator for measure of agreement with a gold standard, Journal of Agricultural, Biological, and Environmental Statistics, 6, 326–339.
Hawkins, D. M. (2002). Diagnostics for conformity of paired quantitative measurements, Statistics in Medicine, 21, 1913–1935.
Hsu, J. C. (1996). Multiple Comparisons: Theory and Methods, Chapman & Hall/CRC, Boca Raton, FL.
Hutson, A. D., Wilson, D. C., and Geiser, E. A. (1998). Measuring relative agreement: Echocardiographer versus computer, Journal of Agricultural, Biological, and Environmental Statistics, 3, 163–174.
Kelly, G. E. (1985). Use of structural equations model in assessing the reliability of a new measurement technique, Applied Statistics, 34, 258–263.
King, T. S. and Chinchilli, V. M. (2001). A generalized concordance correlation coefficient for continuous and categorical data, Statistics in Medicine, 20, 2131–2147.
Kraemer, H. C., Periyakoil, V. S., and Noda, A. (2002). Kappa coefficients in medical research, Statistics in Medicine, 21, 2109–2129.
Krummenauer, F. (1999). Intraindividual scale comparison in clinical diagnostic methods: A review of elementary methods, Biometrical Journal, 41, 917–929.
Lee, J., Koh, D., and Ong, C. N. (1989). Statistical evaluation of agreement between two methods for measuring a quantitative variable, Computers in Biology and Medicine, 19, 61–70.
Lewis, P. A., Jones, P. W., Polak, J. W., and Tillotson, H. T. (1991). The problem of conversion in method comparison studies, Applied Statistics, 40, 105–112.
Liao, J. and Lewis, J. (2000). An agreement curve, Presented at the Joint Statistical Meetings, Indianapolis, IN.
Lin, L. I. (1989). A concordance correlation coefficient to evaluate reproducibility, Biometrics, 45, 255–268. Corrections: 2000, 56, 324–325.
Lin, L. I. (1992). Assay validation using the concordance correlation coefficient, Biometrics, 48, 599–604.
Lin, L. I. (2000). Total deviation index for measuring individual agreement with applications in laboratory performance and bioequivalence, Statistics in Medicine, 19, 255–270.
Lin, L. I. (2003). Measuring agreement. In Encyclopedia of Biopharmaceutical Statistics, 2nd edition, pp. 561–567, Marcel Dekker, New York.
Lin, L. I. and Chinchilli, V. (1997). Rejoinder to the letter to the editor from Atkinson and Nevill, Biometrics, 53, 777–778.
Lin, L. I., Hedayat, A. S., Sinha, B. and Yang, M. (2002). Statistical methods in assessing agreement: Models, issues, and tools, Journal of the American Statistical Association, 97, 257–270.
Lin, L. I. and Torbeck, L. D. (1998). Coefficient of accuracy and concordance correlation coefficient: New statistics for method comparison, PDA Journal of Pharmaceutical Science and Technology, 52, 55–59.
Lin, S. C., Whipple, D. M., and Ho, C. S. (1998). Evaluation of statistical equivalence using limits of agreement and associated sample size calculation, Communications in Statistics—Theory and Methods, 27, 1419–1432.
Linnet, K. (1993). Evaluation of regression procedures for method comparison studies, Clinical Chemistry, 39, 424–432.
Liu, J.-P. and Chow, S.-C. (1997). A two one-sided tests procedure for assessment of individual bioequivalence, Journal of Biopharmaceutical Statistics, 7, 49–61.
McGraw, K. O. and Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients, Psychological Methods, 1, 30–46.
Mukhopadhyay, N. and Chou, W.-S. (1984). On selecting the best component of a multivariate normal population, Sequential Analysis, 3, 1–22.
Müller, R. and Büttner, P. (1994). A critical discussion of intraclass correlation coefficients, Statistics in Medicine, 13, 2465–2476.
Nickerson, C. A. (1997). Comment on “A concordance correlation coefficient to evaluate reproducibility”, Biometrics, 53, 1503–1507.
Nix, A. B. J. and Dunstan, F. D. J. (1991). Maximum likelihood techniques applied to method comparison studies, Statistics in Medicine, 10, 981–988.
Quan, H. and Shih, W. J. (1996). Assessing reproducibility by the within-subject coefficient of variation with random effects models, Biometrics, 52, 1195–1203. Correspondence, Biometrics, 56, 301–302.
Robieson, W. Z. (1999). On the weighted kappa and concordance correlation coefficient, Ph.D. Dissertation, University of Illinois at Chicago, IL.
Shoukri, M. M. (1999). Measurement of Agreement, In Encyclopedia of Biostatistics, 1, pp. 103–117. John Wiley & Sons, New York.
Shoukri, M. M. (2004). Measures of Interobserver Agreement, Chapman & Hall/CRC, Boca Raton, FL.
St. Laurent, R. T. (1998). Evaluating agreement with a gold standard in method comparison studies, Biometrics, 54, 537–545.
Vonesh, E. F., Chinchilli, V. M., and Pu, K. W. (1996). Goodness-of-fit in generalized nonlinear mixed-effects models, Biometrics, 52, 572–587.
Wang, W. and Hwang, J. T. G. (2001). A nearly unbiased test for individual bioequivalence problems using probability criteria, Journal of Statistical Planning and Inference, 99, 41–58.
Copyright information
© 2005 Birkhäuser Boston
About this chapter
Cite this chapter
Choudhary, P.K., Nagaraja, H.N. (2005). Measuring Agreement in Method Comparison Studies — A Review. In: Balakrishnan, N., Nagaraja, H.N., Kannan, N. (eds) Advances in Ranking and Selection, Multiple Comparisons, and Reliability. Statistics for Industry and Technology. Birkhäuser Boston. https://doi.org/10.1007/0-8176-4422-9_13
DOI: https://doi.org/10.1007/0-8176-4422-9_13
Publisher Name: Birkhäuser Boston
Print ISBN: 978-0-8176-3232-8
Online ISBN: 978-0-8176-4422-2
eBook Packages: Mathematics and Statistics (R0)