Modeling and Analysis of Method Comparison Data with Skewness and Heavy Tails

  • Conference paper
Ordered Data Analysis, Modeling and Health Research Methods

Abstract

The analysis of method comparison data is mainly concerned with evaluating agreement between methods of measuring a continuous variable. The methodology commonly assumes normally distributed data, which are usually modeled using a standard linear mixed model that assumes normality for both random effects and errors. In practice, however, the data often exhibit skewness and have tails heavier than those of a normal distribution, possibly due to outlying observations. When such data are analyzed using the standard mixed model, the non-normality may become apparent from model diagnostics. This article develops a methodology for agreement evaluation by modeling data using a recent robust mixed model that assumes a skew-t distribution for random effects and an independent t-distribution for errors. As the standard model is a special case of the robust model, the new methodology offers a unified framework for analyzing data with skewness and heavy tails as well as normally distributed data. The methodology is presented for both unreplicated and replicated data. A real example is used for illustration.

References

  1. Arellano-Valle, R.B., H. Bolfarine, and V.H. Lachos. 2005. Skew-normal linear mixed models. Journal of Data Science 3: 415–438.

  2. Azzalini, A. 1985. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12: 171–178.

  3. Azzalini, A., and A. Capitanio. 2003. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. Journal of the Royal Statistical Society, Series B 65: 367–389.

  4. Barnhart, H.X., and J.M. Williamson. 2001. Modeling concordance correlation via GEE to evaluate reproducibility. Biometrics 57: 931–940.

  5. Barnhart, H.X., M.J. Haber, and L.I. Lin. 2007. An overview on assessing agreement with continuous measurement. Journal of Biopharmaceutical Statistics 17: 529–569.

  6. Barnhart, H.X., M.J. Haber, and J. Song. 2002. Overall concordance correlation coefficient for evaluating agreement among multiple observers. Biometrics 58: 1020–1027.

  7. Barnhart, H.X., J. Song, and M.J. Haber. 2005. Assessing intra, inter and total agreement with replicated readings. Statistics in Medicine 24: 1371–1384.

  8. Bland, J.M., and D.G. Altman. 1999. Measuring agreement in method comparison studies. Statistical Methods in Medical Research 8: 135–160.

  9. Carrasco, J.L., and L. Jover. 2003. Estimating the generalized concordance correlation coefficient through variance components. Biometrics 59: 849–858.

  10. Carrasco, J.L., L. Jover, T.S. King, and V.M. Chinchilli. 2007. Comparison of concordance correlation coefficient estimating approaches with skewed data. Journal of Biopharmaceutical Statistics 17: 673–684.

  11. Carrasco, J.L., T.S. King, and V.M. Chinchilli. 2009. The concordance correlation coefficient for repeated measures estimated by variance components. Journal of Biopharmaceutical Statistics 19: 90–105.

  12. Carstensen, B. 2010. Comparing Clinical Measurement Methods: A Practical Guide. New York: Wiley.

  13. Carstensen, B., J. Simpson, and L.C. Gurrin. 2008. Statistical models for assessing agreement in method comparison studies with replicate measurements. The International Journal of Biostatistics 4. doi:10.2202/1557-4679.1107.

  14. Choudhary, P.K. 2008. A tolerance interval approach for assessment of agreement in method comparison studies with repeated measurements. Journal of Statistical Planning and Inference 138: 1102–1115.

  15. Choudhary, P.K. 2010. A unified approach for nonparametric evaluation of agreement in method comparison studies. The International Journal of Biostatistics 6. doi:10.2202/1557-4679.1235.

  16. Choudhary, P.K., and H.N. Nagaraja. 2007. Tests for assessment of agreement using probability criteria. Journal of Statistical Planning and Inference 137: 279–290.

  17. Choudhary, P.K., and K. Yin. 2010. Bayesian and frequentist methodologies for analyzing method comparison studies with multiple methods. Statistics in Biopharmaceutical Research 2: 122–132.

  18. Choudhary, P.K., D. Sengupta, and P. Cassey. 2014. A general skew-t mixed model that allows different degrees of freedom for random effects and error distributions. Journal of Statistical Planning and Inference 147: 235–247.

  19. Genton, M.G. 2004. Skew-Elliptical distributions and their applications: A journey beyond normality. Boca Raton: Chapman & Hall/CRC Press.

  20. Gilbert, P., and R. Varadhan. 2012. numDeriv: Accurate Numerical Derivatives. R package version 2012.9-1.

  21. Ho, H.J., and T.I. Lin. 2010. Robust linear mixed models using the skew t distribution with application to schizophrenia data. Biometrical Journal 52: 449–469.

  22. Hothorn, T., F. Bretz, and P. Westfall. 2008. Simultaneous inference in general parametric models. Biometrical Journal 50: 346–363.

  23. Igic, B., M.E. Hauber, J.A. Galbraith, T. Grim, D.C. Dearborn, P.L.R. Brennan, C. Moskat, P.K. Choudhary, and P. Cassey. 2010. Comparison of micrometer- and scanning electron microscope-based measurements of avian eggshell thickness. Journal of Field Ornithology 81: 402–410.

  24. Jarek, S. 2012. mvnormtest: Normality test for multivariate variables. R package version 0.1-9.

  25. King, T.S., and V.M. Chinchilli. 2001. A generalized concordance correlation coefficient for continuous and categorical data. Statistics in Medicine 20: 2131–2147.

  26. King, T.S., and V.M. Chinchilli. 2001. Robust estimators of the concordance correlation coefficient. Journal of Biopharmaceutical Statistics 11: 83–105.

  27. King, T.S., V.M. Chinchilli, and J.L. Carrasco. 2007. A repeated measures concordance correlation coefficient. Statistics in Medicine 26: 3095–3113.

  28. King, T.S., V.M. Chinchilli, K.-L. Wang, and J.L. Carrasco. 2007. A class of repeated measures concordance correlation coefficients. Journal of Biopharmaceutical Statistics 17: 653–672.

  29. Lachos, V.H., P. Ghosh, and R.B. Arellano-Valle. 2010. Likelihood based inference for skew-normal independent linear mixed models. Statistica Sinica 20: 303–322.

  30. Lehmann, E.L. 1998. Elements of Large-Sample Theory. New York: Springer.

  31. Lin, L.I. 1989. A concordance correlation coefficient to evaluate reproducibility. Biometrics 45: 255–268. Corrections: 2000, 56: 324–325.

  32. Lin, L.I. 2000. Total deviation index for measuring individual agreement with applications in laboratory performance and bioequivalence. Statistics in Medicine 19: 255–270.

  33. Lin, L.I., A.S. Hedayat, and W. Wu. 2007. A unified approach for assessing agreement for continuous and categorical data. Journal of Biopharmaceutical Statistics 17: 629–652.

  34. Lin, L.I., A.S. Hedayat, and W. Wu. 2011. Statistical Tools for Measuring Agreement. New York: Springer.

  35. McLachlan, G.J., and T. Krishnan. 2007. The EM algorithm and extensions, 2nd ed. New York: Wiley.

  36. Meng, X.-L., and D.B. Rubin. 1993. Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80: 267–278.

  37. Pinheiro, J.C., and D.M. Bates. 2000. Mixed-Effects Models in S and S-PLUS. New York: Springer.

  38. Pinheiro, J.C., C. Liu, and Y.N. Wu. 2001. Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate \(t\) distribution. Journal of Computational and Graphical Statistics 10: 249–276.

  39. R Core Team. 2014. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

  40. Roy, A. 2009. An application of linear mixed effects model to assess the agreement between two methods with replicated observations. Journal of Biopharmaceutical Statistics 19: 150–173.

  41. Smyth, G., Y. Hu, P. Dunn, B. Phipson, and Y. Chen. 2014. statmod: Statistical modeling. R package version 1.4.20.

  42. Verbeke, G., and E. Lesaffre. 1996. A linear mixed-effects model with heterogeneity in the random-effects population. Journal of the American Statistical Association 91: 217–221.

  43. Wang, C.M., and H.K. Iyer. 2008. Fiducial approach for assessing agreement between two instruments. Metrologia 45: 415–421.

  44. Zhang, D., and M. Davidian. 2001. Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics 57: 795–802.

Acknowledgments

The authors thank Golo Maurer, Rebecca Boulton and Leanne Reaney for assistance in collecting the crab claw data. They also thank a reviewer for comments that greatly improved this article.

Author information

Correspondence to Pankaj K. Choudhary.

Appendix

A.1 Definitions

Let \(\mathbf {Y}\) be a \(J \times 1\) random vector with \(\varvec{\mu }\) as a \(J \times 1\) location parameter vector and \(\varvec{\varSigma }\) as a \(J \times J\) scale matrix. Define

$$\begin{aligned} \mathbf {Y}^* = \varvec{\varSigma }^{-1/2}(\mathbf {Y}-\varvec{\mu }). \end{aligned}$$

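Also let \(\phi _J( \cdot | \varvec{\mu }, \varvec{\varSigma })\) be the density function of a \(\mathcal {N}_{J}(\varvec{\mu }, \varvec{\varSigma })\) distribution, \(\Phi (\cdot )\) be the distribution function of a standard normal distribution, and \(\tau (\cdot | \nu )\) be the distribution function of a univariate t-distribution with \(\nu \) degrees of freedom. For a point \(\mathbf {y} \in \mathbb {R}^J\), write \(\mathbf {y}^* = \varvec{\varSigma }^{-1/2}(\mathbf {y}-\varvec{\mu })\). The J-dimensional skew-normal, t and skew-t distributions are defined as follows.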

Definition 1

\(\mathbf {Y} \sim \mathcal {SN}_J (\varvec{\mu }, \varvec{\varSigma }, \varvec{\lambda })\) if its density function is

$$\begin{aligned} f(\mathbf {y} | \varvec{\mu }, \varvec{\varSigma }, \varvec{\lambda })= 2 \phi _J (\mathbf {y}|\varvec{\mu },\varvec{\varSigma }) \Phi (\varvec{\lambda }^{\prime } \mathbf {y}^*), ~~ \mathbf {y} \in \mathbb {R}^J. \end{aligned}$$
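To make Definition 1 concrete, here is a minimal Python sketch that evaluates the \(\mathcal {SN}_J\) density at a single point. It assumes NumPy and SciPy are available; `sym_inv_sqrt` and `dsn` are hypothetical helper names, not functions from the chapter or from the R packages it cites. The symmetric inverse square root \(\varvec{\varSigma }^{-1/2}\) is taken from an eigendecomposition.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def sym_inv_sqrt(sigma):
    """Symmetric inverse square root Sigma^{-1/2}, via an eigendecomposition."""
    w, q = np.linalg.eigh(sigma)
    return (q / np.sqrt(w)) @ q.T


def dsn(y, mu, sigma, lam):
    """Density of SN_J(mu, Sigma, lambda) (Definition 1) at a single point y."""
    y, mu, lam = (np.asarray(a, dtype=float) for a in (y, mu, lam))
    sigma = np.asarray(sigma, dtype=float)
    y_star = sym_inv_sqrt(sigma) @ (y - mu)          # y* = Sigma^{-1/2} (y - mu)
    phi = multivariate_normal.pdf(y, mean=mu, cov=sigma)
    return 2.0 * phi * norm.cdf(lam @ y_star)        # 2 phi_J(y) Phi(lambda' y*)
```

Two quick sanity checks follow from the definition: \(\varvec{\lambda } = \mathbf {0}\) recovers \(\phi _J(\mathbf {y}|\varvec{\mu },\varvec{\varSigma })\), and \(f(\mathbf {y}|\varvec{\mu },\varvec{\varSigma },\varvec{\lambda }) + f(\mathbf {y}|\varvec{\mu },\varvec{\varSigma },-\varvec{\lambda }) = 2\phi _J(\mathbf {y}|\varvec{\mu },\varvec{\varSigma })\) because \(\Phi (x) + \Phi (-x) = 1\).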

Definition 2

\(\mathbf {Y} \sim t_J (\varvec{\mu }, \varvec{\varSigma },\nu )\) if its density function is

$$\begin{aligned} f(\mathbf {y} | \varvec{\mu }, \varvec{\varSigma }, \nu )= (\nu \pi )^{-J/2} \, \frac{\text {gam}( (\nu +J)/2)}{\text {gam}(\nu /2)} | \varvec{\varSigma } | ^{-1/2} \left( 1 + \mathbf {y}^{*\prime } \mathbf {y}^*/\nu \right) ^{-(\nu +J)/2}, ~~ \mathbf {y} \in \mathbb {R}^J, \end{aligned}$$

with \(\text {gam} (\cdot )\) as the gamma function.
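A similar sketch for Definition 2 works on the log scale so that the gamma-function ratio does not overflow for large \(\nu \). Because \(\varvec{\varSigma }^{-1/2}\) is symmetric, \(\mathbf {y}^{*\prime }\mathbf {y}^* = (\mathbf {y}-\varvec{\mu })^\prime \varvec{\varSigma }^{-1}(\mathbf {y}-\varvec{\mu })\), so the code solves a linear system instead of forming the square root. The name `dmvt` is hypothetical; only the Python standard library and NumPy are assumed.

```python
import math

import numpy as np


def dmvt(y, mu, sigma, nu):
    """Density of t_J(mu, Sigma, nu) (Definition 2) at a single point y."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    J = y.size
    diff = y - mu
    quad = diff @ np.linalg.solve(sigma, diff)      # y*'y* = (y - mu)' Sigma^{-1} (y - mu)
    _, logdet = np.linalg.slogdet(sigma)            # log |Sigma|
    log_f = (math.lgamma((nu + J) / 2) - math.lgamma(nu / 2)
             - 0.5 * J * math.log(nu * math.pi) - 0.5 * logdet
             - 0.5 * (nu + J) * math.log1p(quad / nu))
    return math.exp(log_f)
```

For \(J = 1\) and \(\nu = 1\) (the Cauchy case) the density at the center is \(1/\pi \), and for \(J = 2\) the density at \(\varvec{\mu }\) equals \(1/(2\pi |\varvec{\varSigma }|^{1/2})\) for every \(\nu \), since \(\text {gam}((\nu +2)/2)/\text {gam}(\nu /2) = \nu /2\).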

Definition 3

\( \mathbf {Y} \sim \mathcal {ST}_J (\varvec{\mu }, \varvec{\varSigma }, \varvec{\lambda }, \nu )\) if its density function is

$$\begin{aligned} f(\mathbf {y} | \varvec{\mu }, \varvec{\varSigma }, \varvec{\lambda }, \nu )=2\, f_t (\mathbf {y} | \varvec{\mu }, \varvec{\varSigma }, \nu ) \, \tau \! \left( \varvec{\lambda }^{\prime } \mathbf {y}^* \{ (\nu +J)/(\nu + \mathbf {y}^{*\prime } \mathbf {y}^*) \}^{1/2} \mid \nu +J \right) \!\!, ~~ \mathbf {y} \in \mathbb {R}^J, \end{aligned}$$

where \(f_t (\mathbf {y} | \varvec{\mu }, \varvec{\varSigma }, \nu )\) is the density function of a \(t_J(\varvec{\mu },\varvec{\varSigma },\nu )\) distribution.
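Definition 3 can be sketched the same way. The hypothetical helper `dmvst` below assumes NumPy and SciPy, with `scipy.stats.t.cdf` playing the role of \(\tau (\cdot | \nu +J)\); it computes \(f_t\) on the log scale and then multiplies by the skewing factor.

```python
import math

import numpy as np
from scipy.stats import t as student_t


def dmvst(y, mu, sigma, lam, nu):
    """Density of ST_J(mu, Sigma, lambda, nu) (Definition 3) at a single point y."""
    y, mu, lam = (np.asarray(a, dtype=float) for a in (y, mu, lam))
    sigma = np.asarray(sigma, dtype=float)
    J = y.size
    # Symmetric Sigma^{-1/2} from an eigendecomposition, so y* = Sigma^{-1/2}(y - mu)
    w, q = np.linalg.eigh(sigma)
    y_star = (q / np.sqrt(w)) @ q.T @ (y - mu)
    quad = y_star @ y_star
    # log f_t(y | mu, Sigma, nu): the t_J density of Definition 2 (log|Sigma| = sum log w)
    log_ft = (math.lgamma((nu + J) / 2) - math.lgamma(nu / 2)
              - 0.5 * J * math.log(nu * math.pi) - 0.5 * np.sum(np.log(w))
              - 0.5 * (nu + J) * math.log1p(quad / nu))
    # Skewing factor: t distribution function with nu + J degrees of freedom
    arg = (lam @ y_star) * math.sqrt((nu + J) / (nu + quad))
    return 2.0 * math.exp(log_ft) * student_t.cdf(arg, df=nu + J)
```

With \(\varvec{\lambda } = \mathbf {0}\) the skewing factor is \(2\tau (0|\nu +J) = 1\), so the function reduces to the \(t_J\) density; at \(\mathbf {y} = \varvec{\mu }\) the same happens for any \(\varvec{\lambda }\).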

A.2 Hierarchical Representation for a GST Mixed Model

Let \(\mathbf {Y}\) be an M-vector obtained by dropping the subscript i in \(\mathbf {Y}_i\) defined by (1). From (3), the GST mixed model for \(\mathbf {Y}\) can be written as

$$\begin{aligned} \mathbf {Y} = \mathbf {X} \varvec{\beta } + \mathbf {Z} \mathbf {b} + \mathbf {e}, ~~ \mathbf {b} \sim \mathcal {ST}_q (\mathbf {0}, \varvec{\varPsi }, \varvec{\lambda }, \nu _b), ~~ \mathbf {e} \sim t_n (\mathbf {0}, \varvec{\varSigma }, \nu _e), \end{aligned}$$
(A.1)

where \(\mathbf {b}\) and \(\mathbf {e}\) are mutually independent. For a hierarchical representation of this model, define for \(v > 0\),

$$\begin{aligned} \varvec{\Pi }_v&= \mathbf {Z} \varvec{\varPsi } \mathbf {Z}^\prime + v \varvec{\varSigma }, \\ \varvec{\lambda }_v&= \frac{ \varvec{\Pi }_v ^{-1/2} \mathbf {Z} \varvec{\varPsi }^{1/2} \varvec{\lambda }}{ \big ( 1 + \varvec{\lambda }^\prime \varvec{\varPsi }^{-1/2} (\varvec{\varPsi }^{-1} + \mathbf {Z}^\prime \varvec{\varSigma }^{-1} \mathbf {Z} /v )^{-1} \varvec{\varPsi }^{-1/2} \varvec{\lambda } \big )^{1/2}}, \end{aligned}$$
(A.2)

and let \(\mathcal {G} (\alpha , \beta )\) denote a gamma distribution with shape \(\alpha > 0\) and rate \(\beta > 0\), that is, with density

$$\begin{aligned} f (y | \alpha , \beta ) = \frac{\beta ^\alpha }{\text {gam}(\alpha )} y^{\alpha - 1} \exp (- \beta y), ~~ y >0. \end{aligned}$$
(A.3)
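The quantities in (A.2) and the gamma density (A.3) translate directly into matrix code. In this sketch (NumPy and SciPy assumed; `sym_pow`, `pi_lambda` and `dgamma_rate` are hypothetical names), note that \(\beta \) in (A.3) is a rate parameter, whereas SciPy's `gamma` distribution and NumPy's `Generator.gamma` are parameterized by shape and scale, so the scale passed to them must be \(1/\beta \).

```python
import numpy as np
from scipy.stats import gamma as gamma_dist


def sym_pow(m, p):
    """Symmetric power M^p of a symmetric positive definite matrix (eigendecomposition)."""
    w, q = np.linalg.eigh(m)
    return (q * w**p) @ q.T


def pi_lambda(v, Z, Psi, Sigma, lam):
    """Return (Pi_v, lambda_v) exactly as written in (A.2)."""
    Pi_v = Z @ Psi @ Z.T + v * Sigma
    Psi_inv_half = sym_pow(Psi, -0.5)
    inner = np.linalg.inv(np.linalg.inv(Psi) + Z.T @ np.linalg.solve(Sigma, Z) / v)
    denom = np.sqrt(1.0 + lam @ Psi_inv_half @ inner @ Psi_inv_half @ lam)
    lam_v = sym_pow(Pi_v, -0.5) @ Z @ sym_pow(Psi, 0.5) @ lam / denom
    return Pi_v, lam_v


def dgamma_rate(y, alpha, beta):
    """Density (A.3) of G(alpha, beta); beta is a rate, so SciPy's scale is 1/beta."""
    return gamma_dist.pdf(y, a=alpha, scale=1.0 / beta)
```

One way to check such code: with \(\mathbf {Z} = \mathbf {I}_q\) and \(v \rightarrow 0\), \(\varvec{\Pi }_v \rightarrow \varvec{\varPsi }\) and the denominator in (A.2) tends to 1, so \(\varvec{\lambda }_v \rightarrow \varvec{\lambda }\).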

Now, from [18], the model (A.1) can be represented hierarchically as follows, with \(U\) and \(U/V\) independent:

$$\begin{aligned} \mathbf {Y} | U, V \sim \mathcal {SN}_n (\mathbf {X} \varvec{\beta }, \varvec{\Pi }_V/U, \varvec{\lambda }_V), \, U \sim \mathcal {G} (\nu _b/2, \nu _b/2), \, U/V \sim \mathcal {G} (\nu _e/2, \nu _e/2). \end{aligned}$$
(A.4)
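The representation (A.4) can be checked by simulation. The sketch below assumes NumPy, uses hypothetical function names and parameter values, and restricts attention to a single random effect (\(q = 1\)), so that \(\mathbf {Z}\) is one column \(\mathbf {z}\) and \(\varvec{\varPsi }\), \(\varvec{\lambda }\) are scalars; `mean` stands for \(\mathbf {X}\varvec{\beta }\). `sample_hierarchical` draws \(U\) and \(V\), then draws \(\mathbf {Y}\) given \(U, V\) from \(\mathcal {SN}_n(\mathbf {X}\varvec{\beta }, \varvec{\varOmega }, \varvec{\lambda }_V)\) with \(\varvec{\varOmega } = \varvec{\Pi }_V/U\), using the standard representation \(\mathbf {X}\varvec{\beta } + \varvec{\varOmega }^{1/2}\{\varvec{\delta }|G_1| + (\mathbf {I}_n - \varvec{\delta }\varvec{\delta }^\prime )^{1/2}\mathbf {G}_2\}\) with \(\varvec{\delta } = \varvec{\lambda }_V/(1+\varvec{\lambda }_V^\prime \varvec{\lambda }_V)^{1/2}\). The normal part is drawn through a Cholesky factor of its covariance \(\varvec{\varOmega } - \varvec{\varOmega }^{1/2}\varvec{\delta }\varvec{\delta }^\prime \varvec{\varOmega }^{1/2}\), which gives the same distribution. `sample_direct` instead simulates (A.1) with the usual scale-mixture constructions \(\mathbf {b} = \mathbf {S}/U^{1/2}\), \(\mathbf {S} \sim \mathcal {SN}_q(\mathbf {0}, \varvec{\varPsi }, \varvec{\lambda })\), and \(\mathbf {e} = \mathbf {N}/W^{1/2}\), \(\mathbf {N} \sim \mathcal {N}_n(\mathbf {0}, \varvec{\varSigma })\), \(W \sim \mathcal {G}(\nu _e/2, \nu _e/2)\).

```python
import numpy as np


def batched_sym_pow(m, p):
    """M^p for a stack of symmetric positive definite matrices of shape (N, n, n)."""
    w, q = np.linalg.eigh(m)
    return (q * w[:, None, :] ** p) @ np.swapaxes(q, 1, 2)


def sample_hierarchical(N, mean, z, psi, lam, Sigma, nu_b, nu_e, rng):
    """Draw Y from the hierarchy (A.4) with one random effect (q = 1, Z = z)."""
    n = z.size
    U = rng.gamma(nu_b / 2, 2 / nu_b, size=N)        # U ~ G(nu_b/2, nu_b/2); NumPy takes scale = 1/rate
    V = U / rng.gamma(nu_e / 2, 2 / nu_e, size=N)    # U/V ~ G(nu_e/2, nu_e/2), independent of U
    s = z @ np.linalg.solve(Sigma, z)                # Z' Sigma^{-1} Z, a scalar when q = 1
    Pi = psi * np.outer(z, z) + V[:, None, None] * Sigma              # Pi_V from (A.2)
    denom = np.sqrt(1 + lam**2 / (psi * (1 / psi + s / V)))           # denominator of lambda_V in (A.2)
    lam_V = (batched_sym_pow(Pi, -0.5) @ (np.sqrt(psi) * lam * z)) / denom[:, None]
    # Y | U, V ~ SN_n(mean, Omega, lam_V) with Omega = Pi_V / U
    Omega = Pi / U[:, None, None]
    delta = lam_V / np.sqrt(1 + np.sum(lam_V**2, axis=1))[:, None]
    skew = (batched_sym_pow(Omega, 0.5) @ delta[:, :, None])[:, :, 0]   # Omega^{1/2} delta
    # Omega^{1/2} (I - delta delta')^{1/2} G_2 is normal with covariance Omega - skew skew'
    L = np.linalg.cholesky(Omega - skew[:, :, None] * skew[:, None, :])
    G1 = np.abs(rng.standard_normal(N))
    G2 = rng.standard_normal((N, n, 1))
    return mean + skew * G1[:, None] + (L @ G2)[:, :, 0]


def sample_direct(N, mean, z, psi, lam, Sigma, nu_b, nu_e, rng):
    """Draw Y = X beta + z b + e directly from (A.1), again with q = 1."""
    d = lam / np.sqrt(1 + lam**2)
    sn = np.sqrt(psi) * (d * np.abs(rng.standard_normal(N))
                         + np.sqrt(1 - d**2) * rng.standard_normal(N))
    b = sn / np.sqrt(rng.gamma(nu_b / 2, 2 / nu_b, size=N))                 # b ~ ST_1(0, psi, lam, nu_b)
    e = rng.standard_normal((N, z.size)) @ np.linalg.cholesky(Sigma).T     # rows ~ N_n(0, Sigma)
    e /= np.sqrt(rng.gamma(nu_e / 2, 2 / nu_e, size=N))[:, None]            # e ~ t_n(0, Sigma, nu_e)
    return mean + b[:, None] * z + e
```

With a large number of draws, the sample means, variances and correlations produced by the two samplers should agree up to Monte Carlo error.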

A.3 Linear Combination of Skew-Normals

Proposition 1

Let \(\mathbf {Y} \sim \mathcal {SN}_q (\varvec{\beta }, \varvec{\varPsi }, \varvec{\lambda })\) and consider the quantities defined in (4). Let \(\mathbf {a} \in \mathbb {R}^q\) with at least one non-zero element. Then

$$ \mathbf {a}^\prime \mathbf {Y} \sim \mathcal {SN}_1 \big (\mathbf {a}^\prime \varvec{\beta }, \mathbf {a}^\prime \varvec{\varPsi } \mathbf {a}, \mathbf {a}^\prime \varvec{\varPsi }^{1/2} \varvec{\delta }/( \mathbf {a}^\prime \varvec{\varGamma } \mathbf {a})^{1/2} \big ). $$

Proof

The proof relies on a stochastic representation of a skew-normal variate. Let \(\mathbf {Y}^* \sim \mathcal {SN}_q (\mathbf {0}, \varvec{\varPsi }, \varvec{\lambda })\). Then, from [1],

$$\begin{aligned} \mathbf {Y}^* \overset{d}{=} \varvec{\varPsi }^{1/2} \varvec{\delta } |G_1^*| + \varvec{\varPsi }^{1/2} (\mathbf {I}_q - \varvec{\delta } \varvec{\delta }^\prime )^{1/2} \mathbf {G}_2^*, \end{aligned}$$
(A.5)

where \(G_1^* \sim \mathcal {N}_1 (0, 1)\), \(\mathbf {G}_2^* \sim \mathcal {N}_q (\mathbf {0}, \mathbf {I}_q)\) independently of \(G_1^*\), and the notation “\(\overset{d}{=}\)” means “equal in distribution.” Since \(\mathbf {Y} \overset{d}{=} \varvec{\beta } + \mathbf {Y}^*\), using (A.5) we can write

$$\begin{aligned} \mathbf {a}^\prime \mathbf {Y} \overset{d}{=} \mathbf {a}^\prime \varvec{\beta } + \mathbf {a}^\prime \varvec{\varPsi }^{1/2} \varvec{\delta } |G_1^*| + (\mathbf {a}^\prime \varvec{\varGamma } \mathbf {a})^{1/2} G^*, \end{aligned}$$

where \(G^* \sim \mathcal {N}_1 (0, 1)\) independently of \(G_1^*\). Define

$$ \lambda ^* = \mathbf {a}^\prime \varvec{\varPsi }^{1/2} \varvec{\delta }/( \mathbf {a}^\prime \varvec{\varGamma } \mathbf {a})^{1/2}, ~~ \delta ^* = \lambda ^*/(1 + \lambda ^{*2})^{1/2}. $$

From an application of (4), we have \( (\mathbf {a}^\prime \varvec{\varPsi }^{1/2} \varvec{\delta })^2 + \mathbf {a}^\prime \varvec{\varGamma } \mathbf {a} = \mathbf {a}^\prime \varvec{\varPsi } \mathbf {a}\), implying

$$\begin{aligned} \mathbf {a}^\prime \varvec{\varPsi }^{1/2} \varvec{\delta } = (\mathbf {a}^\prime \varvec{\varPsi } \mathbf {a})^{1/2} \delta ^*, ~~ (\mathbf {a}^\prime \varvec{\varGamma } \mathbf {a})^{1/2} = (\mathbf {a}^\prime \varvec{\varPsi } \mathbf {a})^{1/2} (1-\delta ^{*2})^{1/2}. \end{aligned}$$

This allows us to write

$$\begin{aligned} \mathbf {a}^\prime \mathbf {Y} \overset{d}{=} \mathbf {a}^\prime \varvec{\beta } + (\mathbf {a}^\prime \varvec{\varPsi } \mathbf {a})^{1/2} \delta ^* |G_1^*| + (\mathbf {a}^\prime \varvec{\varPsi } \mathbf {a})^{1/2} (1-\delta ^{*2})^{1/2} G^*. \end{aligned}$$

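Now compare this with the univariate case of (A.5), under which \(Y \sim \mathcal {SN}_1(\mu , \sigma ^2, \lambda )\) satisfies \(Y \overset{d}{=} \mu + \sigma \delta |G_1^*| + \sigma (1-\delta ^2)^{1/2} G^*\) with \(\delta = \lambda /(1+\lambda ^2)^{1/2}\). The result follows with \(\mu = \mathbf {a}^\prime \varvec{\beta }\), \(\sigma ^2 = \mathbf {a}^\prime \varvec{\varPsi } \mathbf {a}\) and \(\lambda = \lambda ^*\).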
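Proposition 1 and the stochastic representation can also be checked by Monte Carlo. Equation (4) lies outside this appendix, so the sketch below assumes it defines \(\varvec{\delta } = \varvec{\lambda }/(1+\varvec{\lambda }^\prime \varvec{\lambda })^{1/2}\) and \(\varvec{\varGamma } = \varvec{\varPsi } - \varvec{\varPsi }^{1/2}\varvec{\delta }\varvec{\delta }^\prime \varvec{\varPsi }^{1/2}\), which is consistent with the identity \((\mathbf {a}^\prime \varvec{\varPsi }^{1/2}\varvec{\delta })^2 + \mathbf {a}^\prime \varvec{\varGamma }\mathbf {a} = \mathbf {a}^\prime \varvec{\varPsi }\mathbf {a}\) used in the proof. The sampler uses the matrix square root \((\mathbf {I}_q - \varvec{\delta }\varvec{\delta }^\prime )^{1/2}\), so that the normal component has covariance \(\varvec{\varPsi }^{1/2}(\mathbf {I}_q - \varvec{\delta }\varvec{\delta }^\prime )\varvec{\varPsi }^{1/2} = \varvec{\varGamma }\). NumPy is assumed and all function names are hypothetical.

```python
import numpy as np


def sym_sqrt(m):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, q = np.linalg.eigh(m)
    return (q * np.sqrt(np.clip(w, 0.0, None))) @ q.T


def delta_gamma(Psi, lam):
    """delta and Gamma under our assumed reading of (4) (see the text above)."""
    delta = lam / np.sqrt(1.0 + lam @ lam)
    Psi_half = sym_sqrt(Psi)
    return delta, Psi - Psi_half @ np.outer(delta, delta) @ Psi_half


def rsn(N, beta, Psi, lam, rng):
    """Draw N vectors from SN_q(beta, Psi, lam) via the stochastic representation."""
    q = beta.size
    delta, _ = delta_gamma(Psi, lam)
    Psi_half = sym_sqrt(Psi)
    M = Psi_half @ sym_sqrt(np.eye(q) - np.outer(delta, delta))   # Psi^{1/2} (I_q - delta delta')^{1/2}
    G1 = np.abs(rng.standard_normal(N))
    G2 = rng.standard_normal((N, q))
    return beta + np.outer(G1, Psi_half @ delta) + G2 @ M.T


def proposition1_params(a, beta, Psi, lam):
    """Location, scale and skewness parameters of a'Y as stated in Proposition 1."""
    delta, Gamma = delta_gamma(Psi, lam)
    lam_star = (a @ sym_sqrt(Psi) @ delta) / np.sqrt(a @ Gamma @ a)
    return a @ beta, a @ Psi @ a, lam_star
```

For \(Y \sim \mathcal {SN}_1(\mu , \sigma ^2, \lambda )\) the mean is \(\mu + \sigma \delta (2/\pi )^{1/2}\) and the variance is \(\sigma ^2(1 - 2\delta ^2/\pi )\), with \(\delta = \lambda /(1+\lambda ^2)^{1/2}\). Comparing the sample moments of \(\mathbf {a}^\prime \mathbf {Y}\) with these formulas, evaluated at the parameters returned by `proposition1_params`, gives a rough numerical check of Proposition 1.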

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Sengupta, D., Choudhary, P.K., Cassey, P. (2015). Modeling and Analysis of Method Comparison Data with Skewness and Heavy Tails. In: Choudhary, P., Nagaraja, C., Ng, H. (eds) Ordered Data Analysis, Modeling and Health Research Methods. Springer Proceedings in Mathematics & Statistics, vol 149. Springer, Cham. https://doi.org/10.1007/978-3-319-25433-3_11
