Part of the book series: Statistics for Industry and Technology (SIT)

Abstract

Assessment of agreement between two or more methods of measurement is of considerable importance in many areas. In medicine in particular, new methods or devices that are cheaper, easier to use, or less invasive are routinely developed. Agreement between a new method and a traditional reference or gold standard must be evaluated before the new method is put into practice. Various methodologies have been proposed for this purpose in recent years. We review the literature, focussing on the assessment of agreement between two methods and on the selection of the best method when several are compared with a reference. A real data set is analyzed to illustrate the various approaches.

Copyright information

© 2005 Birkhäuser Boston

About this chapter

Cite this chapter

Choudhary, P.K., Nagaraja, H.N. (2005). Measuring Agreement in Method Comparison Studies — A Review. In: Balakrishnan, N., Nagaraja, H.N., Kannan, N. (eds) Advances in Ranking and Selection, Multiple Comparisons, and Reliability. Statistics for Industry and Technology. Birkhäuser Boston. https://doi.org/10.1007/0-8176-4422-9_13
