Modeling and Analysis of Method Comparison Data with Skewness and Heavy Tails

Sengupta, Dishari; Choudhary, Pankaj K.; Cassey, Phillip

doi:10.1007/978-3-319-25433-3_11

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 149)

Abstract

The analysis of method comparison data is mainly concerned with evaluating agreement between methods of measuring a continuous variable. The methodology commonly assumes normally distributed data, which are usually modeled using a standard linear mixed model that assumes normality for both random effects and errors. In practice, however, the data often exhibit skewness and have tails heavier than those of a normal distribution, possibly due to outlying observations. When such data are analyzed using the standard mixed model, the non-normality may become apparent from model diagnostics. This article develops a methodology for agreement evaluation by modeling data using a recent robust mixed model that assumes a skew-t distribution for random effects and an independent t-distribution for errors. As the standard model is a special case of the robust model, the new methodology offers a unified framework for analyzing data with skewness and heavy tails as well as normally distributed data. The methodology is presented for both unreplicated and replicated data. A real example is used for illustration.

References

Arellano-Valle, R.B., H. Bolfarine, and V.H. Lachos. 2005. Skew-normal linear mixed models. Journal of Data Science 3: 415–438.
Azzalini, A. 1985. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12: 171–178.
Azzalini, A., and A. Capitanio. 2003. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. Journal of the Royal Statistical Society, Series B 65: 367–389.
Barnhart, H.X., and J.M. Williamson. 2001. Modeling concordance correlation via GEE to evaluate reproducibility. Biometrics 57: 931–940.
Barnhart, H.X., M.J. Haber, and L.I. Lin. 2007. An overview on assessing agreement with continuous measurement. Journal of Biopharmaceutical Statistics 17: 529–569.
Barnhart, H.X., M.J. Haber, and J. Song. 2002. Overall concordance correlation coefficient for evaluating agreement among multiple observers. Biometrics 58: 1020–1027.
Barnhart, H.X., J. Song, and M.J. Haber. 2005. Assessing intra, inter and total agreement with replicated readings. Statistics in Medicine 24: 1371–1384.
Bland, J.M., and D.G. Altman. 1999. Measuring agreement in method comparison studies. Statistical Methods in Medical Research 8: 135–160.
Carrasco, J.L., and L. Jover. 2003. Estimating the generalized concordance correlation coefficient through variance components. Biometrics 59: 849–858.
Carrasco, J.L., L. Jover, T.S. King, and V.M. Chinchilli. 2007. Comparison of concordance correlation coefficient estimating approaches with skewed data. Journal of Biopharmaceutical Statistics 17: 673–684.
Carrasco, J.L., T.S. King, and V.M. Chinchilli. 2009. The concordance correlation coefficient for repeated measures estimated by variance components. Journal of Biopharmaceutical Statistics 19: 90–105.
Carstensen, B. 2010. Comparing Clinical Measurement Methods: A Practical Guide. New York: Wiley.
Carstensen, B., J. Simpson, and L.C. Gurrin. 2008. Statistical models for assessing agreement in method comparison studies with replicate measurements. The International Journal of Biostatistics 4. doi:10.2202/1557-4679.1107
Choudhary, P.K. 2008. A tolerance interval approach for assessment of agreement in method comparison studies with repeated measurements. Journal of Statistical Planning and Inference 138: 1102–1115.
Choudhary, P.K. 2010. A unified approach for nonparametric evaluation of agreement in method comparison studies. The International Journal of Biostatistics 6. doi:10.2202/1557-4679.1235
Choudhary, P.K., and H.N. Nagaraja. 2007. Tests for assessment of agreement using probability criteria. Journal of Statistical Planning and Inference 137: 279–290.
Choudhary, P.K., and K. Yin. 2010. Bayesian and frequentist methodologies for analyzing method comparison studies with multiple methods. Statistics in Biopharmaceutical Research 2: 122–132.
Choudhary, P.K., D. Sengupta, and P. Cassey. 2014. A general skew-t mixed model that allows different degrees of freedom for random effects and error distributions. Journal of Statistical Planning and Inference 147: 235–247.
Genton, M.G. 2004. Skew-Elliptical distributions and their applications - A journey beyond normality. Boca Raton: Chapman & Hall/CRC Press.
Gilbert, P., and R. Varadhan. 2012. numDeriv: Accurate Numerical Derivatives. R package version 2012.9-1.
Ho, H.J., and T.I. Lin. 2010. Robust linear mixed models using the skew t distribution with application to schizophrenia data. Biometrical Journal 52: 449–469.
Hothorn, T., F. Bretz, and P. Westfall. 2008. Simultaneous inference in general parametric models. Biometrical Journal 50: 346–363.
Igic, B., M.E. Hauber, J.A. Galbraith, T. Grim, D.C. Dearborn, P.L.R. Brennan, C. Moskat, P.K. Choudhary, and P. Cassey. 2010. Comparison of micrometer- and scanning electron microscope-based measurements of avian eggshell thickness. Journal of Field Ornithology 81: 402–410.
Jarek, S. 2012. mvnormtest: Normality test for multivariate variables. R package version 0.1-9.
King, T.S., and V.M. Chinchilli. 2001. A generalized concordance correlation coefficient for continuous and categorical data. Statistics in Medicine 20: 2131–2147.
King, T.S., and V.M. Chinchilli. 2001. Robust estimators of the concordance correlation coefficient. Journal of Biopharmaceutical Statistics 11: 83–105.
King, T.S., V.M. Chinchilli, and J.L. Carrasco. 2007. A repeated measures concordance correlation coefficient. Statistics in Medicine 26: 3095–3113.
King, T.S., V.M. Chinchilli, K.-L. Wang, and J.L. Carrasco. 2007. A class of repeated measures concordance correlation coefficients. Journal of Biopharmaceutical Statistics 17: 653–672.
Lachos, V.H., P. Ghosh, and R.B. Arellano-Valle. 2010. Likelihood based inference for skew-normal independent linear mixed models. Statistica Sinica 20: 303–322.
Lehmann, E.L. 1998. Elements of Large-Sample Theory. New York: Springer.
Lin, L.I. 1989. A concordance correlation coefficient to evaluate reproducibility. Biometrics 45: 255–268. Corrections: 2000, 56: 324–325.
Lin, L.I. 2000. Total deviation index for measuring individual agreement with applications in laboratory performance and bioequivalence. Statistics in Medicine 19: 255–270.
Lin, L.I., A.S. Hedayat, and W. Wu. 2007. A unified approach for assessing agreement for continuous and categorical data. Journal of Biopharmaceutical Statistics 17: 629–652.
Lin, L.I., A.S. Hedayat, and W. Wu. 2011. Statistical Tools for Measuring Agreement. New York: Springer.
McLachlan, G.J., and T. Krishnan. 2007. The EM algorithm and extensions, 2nd ed. New York: Wiley.
Meng, X.-L., and D.B. Rubin. 1993. Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80: 267–278.
Pinheiro, J.C., and D.M. Bates. 2000. Mixed-Effects models in S and S-PLUS. New York: Springer.
Pinheiro, J.C., C. Liu, and Y.N. Wu. 2001. Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate $t$ distribution. Journal of Computational and Graphical Statistics 10: 249–276.
R Core Team. 2014. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Roy, A. 2009. An application of linear mixed effects model to assess the agreement between two methods with replicated observations. Journal of Biopharmaceutical Statistics 19: 150–173.
Smyth, G., Y. Hu, P. Dunn, B. Phipson, and Y. Chen. 2014. statmod: Statistical modeling. R package version 1.4.20.
Verbeke, G., and E. Lesaffre. 1996. A linear mixed-effects model with heterogeneity in the random-effects population. Journal of the American Statistical Association 91: 217–221.
Wang, C.M., and H.K. Iyer. 2008. Fiducial approach for assessing agreement between two instruments. Metrologia 45: 415–421.
Zhang, D., and M. Davidian. 2001. Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics 57: 795–802.

Acknowledgments

The authors thank Golo Maurer, Rebecca Boulton and Leanne Reaney for assistance in collection of the crab claws data. They are also thankful to a reviewer for comments that greatly improved this article.

Author information

Authors and Affiliations

RainMan Consulting Pvt Ltd, Bangalore, 560075, Karnataka, India
Dishari Sengupta
Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, 75083-0688, USA
Pankaj K. Choudhary
School of Biological Sciences, University of Adelaide, North Terrace, SA, 5005, Australia
Phillip Cassey

Corresponding author

Correspondence to Pankaj K. Choudhary.

Editor information

Editors and Affiliations

Department of Mathematical Sciences FO 35, University of Texas at Dallas, Richardson, Texas, USA
Pankaj Choudhary
Fordham University, New York, New York, USA
Chaitra H. Nagaraja
Southern Methodist University, Dallas, Texas, USA
Hon Keung Tony Ng

Appendix

A.1 Definitions

Let $\mathbf {Y}$ be a $J \times 1$ random vector with $\varvec{\mu }$ as a $J \times 1$ location parameter vector and $\varvec{\varSigma }$ as a $J \times J$ scale matrix. Define

$$\begin{aligned} \mathbf {Y}^* = \varvec{\varSigma }^{-1/2}(\mathbf {Y}-\varvec{\mu }). \end{aligned}$$

Also let $\phi _J( \cdot | \varvec{\mu }, \varvec{\varSigma })$ be the density function of a $\mathcal {N}_{J}(\varvec{\mu }, \varvec{\varSigma })$ distribution, $\Phi (\cdot )$ be the distribution function of a $\mathcal {N}_1(0, 1)$ distribution, and $\tau (\cdot \mid \nu )$ be the distribution function of a univariate t-distribution with $\nu $ degrees of freedom. The $J$-dimensional skew-normal, t and skew-t distributions are defined as follows.

Definition 1

$\mathbf {Y} \sim \mathcal {SN}_J (\varvec{\mu }, \varvec{\varSigma }, \varvec{\lambda })$ if its density function is

$$\begin{aligned} f(\mathbf {y} | \varvec{\mu }, \varvec{\varSigma }, \varvec{\lambda })= 2 \phi _J (\mathbf {y}|\varvec{\mu },\varvec{\varSigma }) \Phi (\varvec{\lambda }^{\prime } \mathbf {y}^*), ~~ \mathbf {y} \in \mathbb {R}^J. \end{aligned}$$

Definition 2

$\mathbf {Y} \sim t_J (\varvec{\mu }, \varvec{\varSigma },\nu )$ if its density function is

$$\begin{aligned} f(\mathbf {y} | \varvec{\mu }, \varvec{\varSigma }, \nu )= (\nu \pi )^{-J/2} \, \frac{\text {gam}( (\nu +J)/2)}{\text {gam}(\nu /2)} | \varvec{\varSigma } | ^{-1/2} \left( 1 + \mathbf {y}^{*\prime } \mathbf {y}^*/\nu \right) ^{-(\nu +J)/2}, ~~ \mathbf {y} \in \mathbb {R}^J, \end{aligned}$$

with $\text {gam} (\cdot )$ as the gamma function.

Definition 3

$ \mathbf {Y} \sim \mathcal {ST}_J (\varvec{\mu }, \varvec{\varSigma }, \varvec{\lambda }, \nu )$ if its density function is

$$\begin{aligned} f(\mathbf {y} | \varvec{\mu }, \varvec{\varSigma }, \varvec{\lambda }, \nu )=2\, f_t (\mathbf {y} | \varvec{\mu }, \varvec{\varSigma }, \nu ) \, \tau \! \left( \varvec{\lambda }^{\prime } \mathbf {y}^* \{ (\nu +J)/(\nu + \mathbf {y}^{*\prime } \mathbf {y}^*) \}^{1/2} \mid \nu +J \right) \!\!, ~~ \mathbf {y} \in \mathbb {R}^J, \end{aligned}$$

where $f_t (\mathbf {y} | \varvec{\mu }, \varvec{\varSigma }, \nu )$ is the density function of a $t_J(\varvec{\mu },\varvec{\varSigma },\nu )$ distribution.
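The three densities can be evaluated numerically with a few lines of code. The following is a minimal Python sketch, assuming NumPy and SciPy are available; the function names (`inv_sqrt`, `sn_pdf`, `t_pdf`, `st_pdf`) are hypothetical and are not taken from the chapter. It takes $\varvec{\varSigma }^{-1/2}$ to be the symmetric inverse square root, which is one common convention and an assumption here.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln


def inv_sqrt(Sigma):
    """Symmetric inverse square root Sigma^{-1/2} via an eigendecomposition."""
    w, V = np.linalg.eigh(Sigma)
    return (V / np.sqrt(w)) @ V.T


def standardize(y, mu, Sigma):
    """y* = Sigma^{-1/2} (y - mu)."""
    return inv_sqrt(Sigma) @ (y - mu)


def sn_pdf(y, mu, Sigma, lam):
    """Skew-normal density of Definition 1."""
    ys = standardize(y, mu, Sigma)
    normal_part = stats.multivariate_normal.pdf(y, mean=mu, cov=Sigma)
    return 2.0 * normal_part * stats.norm.cdf(lam @ ys)


def t_pdf(y, mu, Sigma, nu):
    """Multivariate t density of Definition 2, computed on the log scale."""
    J = len(mu)
    ys = standardize(y, mu, Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    log_f = (-0.5 * J * np.log(nu * np.pi)
             + gammaln((nu + J) / 2.0) - gammaln(nu / 2.0)
             - 0.5 * logdet
             - 0.5 * (nu + J) * np.log1p(ys @ ys / nu))
    return np.exp(log_f)


def st_pdf(y, mu, Sigma, lam, nu):
    """Skew-t density of Definition 3."""
    J = len(mu)
    ys = standardize(y, mu, Sigma)
    arg = (lam @ ys) * np.sqrt((nu + J) / (nu + ys @ ys))
    return 2.0 * t_pdf(y, mu, Sigma, nu) * stats.t.cdf(arg, df=nu + J)
```

Setting `lam` to a zero vector makes the skewing factor equal to 1/2, so `sn_pdf` reduces to the normal density and `st_pdf` to `t_pdf`, which gives a quick sanity check on the implementation.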

A.2 Hierarchical Representation for a GST Mixed Model

Let $\mathbf {Y}$ be an M-vector obtained by dropping the subscript i in $\mathbf {Y}_i$ defined by (1). From (3), the GST mixed model for $\mathbf {Y}$ can be written as

$$\begin{aligned} \mathbf {Y} = \mathbf {X} \varvec{\beta } + \mathbf {Z} \mathbf {b} + \mathbf {e}, ~~ \mathbf {b} \sim \mathcal {ST}_q (\mathbf {0}, \varvec{\varPsi }, \varvec{\lambda }, \nu _b), ~~ \mathbf {e} \sim t_M (\mathbf {0}, \varvec{\varSigma }, \nu _e), \end{aligned}$$

(A.1)

where $\mathbf {b}$ and $\mathbf {e}$ are mutually independent. For a hierarchical representation of this model, define for $v > 0$,

$$\begin{aligned} \varvec{\Pi }_v&= \mathbf {Z} \varvec{\varPsi } \mathbf {Z}^\prime + v \varvec{\varSigma }, \\ \varvec{\lambda }_v&= \frac{ \varvec{\Pi }_v ^{-1/2} \mathbf {Z} \varvec{\varPsi }^{1/2} \varvec{\lambda }}{ \big ( 1 + \varvec{\lambda }^\prime \varvec{\varPsi }^{-1/2} (\varvec{\varPsi }^{-1} + \mathbf {Z}^\prime \varvec{\varSigma }^{-1} \mathbf {Z} /v )^{-1} \varvec{\varPsi }^{-1/2} \varvec{\lambda } \big )^{1/2}}, \end{aligned}$$

(A.2)

and let $\mathcal {G} (\alpha , \beta )$ denote a gamma distribution with parameters $\alpha , \beta > 0$, and density

$$\begin{aligned} f (y | \alpha , \beta ) = \frac{\beta ^\alpha }{\text {gam}(\alpha )} y^{\alpha - 1} \exp (- \beta y), ~~ y >0. \end{aligned}$$

(A.3)

Now from [18], the model (A.1) can be represented as

$$\begin{aligned} \mathbf {Y} | U, V \sim \mathcal {SN}_M (\mathbf {X} \varvec{\beta }, \varvec{\Pi }_V/U, \varvec{\lambda }_V), \, U \sim \mathcal {G} (\nu _b/2, \nu _b/2), \, U/V \sim \mathcal {G} (\nu _e/2, \nu _e/2). \end{aligned}$$

(A.4)
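The hierarchy in (A.4) suggests a direct way to simulate one response vector: draw the gamma variables, form $\varvec{\Pi }_V$ and $\varvec{\lambda }_V$ from (A.2), and then draw a skew-normal vector. The Python sketch below illustrates this under several stated assumptions that are not spelled out in this appendix: $U$ and $U/V$ are drawn independently; $\varvec{\delta } = \varvec{\lambda }/(1 + \varvec{\lambda }^\prime \varvec{\lambda })^{1/2}$; the skew-normal draw uses the usual stochastic representation with $(\mathbf {I} - \varvec{\delta } \varvec{\delta }^\prime )^{1/2}$; and matrix square roots are symmetric. All function names are hypothetical.

```python
import numpy as np


def sym_power(A, p):
    """A^p for a symmetric positive definite A, via an eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T


def pi_lambda(v, Z, Psi, Sigma, lam):
    """Pi_v and lambda_v of (A.2) for a scalar v > 0."""
    Pi = Z @ Psi @ Z.T + v * Sigma
    Psi_neg_half = sym_power(Psi, -0.5)
    inner = np.linalg.inv(np.linalg.inv(Psi) + Z.T @ np.linalg.solve(Sigma, Z) / v)
    denom = np.sqrt(1.0 + lam @ Psi_neg_half @ inner @ Psi_neg_half @ lam)
    lam_v = sym_power(Pi, -0.5) @ Z @ sym_power(Psi, 0.5) @ lam / denom
    return Pi, lam_v


def draw_sn(rng, mu, Omega, lam):
    """One draw from SN(mu, Omega, lam); the form of delta is an assumption."""
    q = len(mu)
    delta = lam / np.sqrt(1.0 + lam @ lam)
    root = sym_power(np.eye(q) - np.outer(delta, delta), 0.5)
    g1 = abs(rng.standard_normal())   # half-normal component
    g2 = rng.standard_normal(q)       # independent normal component
    return mu + sym_power(Omega, 0.5) @ (delta * g1 + root @ g2)


def draw_gst(rng, X, beta, Z, Psi, Sigma, lam, nu_b, nu_e):
    """One response vector from the hierarchy (A.4)."""
    # G(alpha, beta) in (A.3) has rate beta; NumPy uses scale = 1 / rate.
    U = rng.gamma(shape=nu_b / 2.0, scale=2.0 / nu_b)
    W = rng.gamma(shape=nu_e / 2.0, scale=2.0 / nu_e)  # W plays the role of U / V
    V = U / W
    Pi, lam_v = pi_lambda(V, Z, Psi, Sigma, lam)
    return draw_sn(rng, X @ beta, Pi / U, lam_v)
```

A call such as `draw_gst(np.random.default_rng(1), X, beta, Z, Psi, Sigma, lam, nu_b=5.0, nu_e=8.0)` returns a vector with one entry per row of `X`. A deterministic check on `pi_lambda` is that $\varvec{\Pi }_v^{1/2} \varvec{\delta }_v = \mathbf {Z} \varvec{\varPsi }^{1/2} \varvec{\delta }$, where $\varvec{\delta }_v$ is obtained from $\varvec{\lambda }_v$ in the same way as $\varvec{\delta }$ from $\varvec{\lambda }$; this follows from the Woodbury identity applied to the denominator of (A.2).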

A.3 Linear Combination of Skew-Normals

Proposition 1

Let $\mathbf {Y} \sim \mathcal {SN}_q (\varvec{\beta }, \varvec{\varPsi }, \varvec{\lambda })$ and consider the quantities defined in (4). Let $\mathbf {a} \in \mathbb {R}^q$ with at least one non-zero element. Then

$$ \mathbf {a}^\prime \mathbf {Y} \sim \mathcal {SN}_1 \big (\mathbf {a}^\prime \varvec{\beta }, \mathbf {a}^\prime \varvec{\varPsi } \mathbf {a}, \mathbf {a}^\prime \varvec{\varPsi }^{1/2} \varvec{\delta }/( \mathbf {a}^\prime \varvec{\varGamma } \mathbf {a})^{1/2} \big ). $$

Proof

The proof relies on a stochastic representation of a skew-normal variate. Let $\mathbf {Y}^* \sim \mathcal {SN}_q (\mathbf {0}, \varvec{\varPsi }, \varvec{\lambda })$. Then, from [1],

$$\begin{aligned} \mathbf {Y}^* \overset{d}{=} \varvec{\varPsi }^{1/2} \varvec{\delta } |G_1^*| + \varvec{\varPsi }^{1/2} (\mathbf {I}_q - \varvec{\delta } \varvec{\delta }^\prime )^{1/2} \mathbf {G}_2^*, \end{aligned}$$

(A.5)

where $G_1^* \sim \mathcal {N}_1 (0, 1)$, $\mathbf {G}_2^* \sim \mathcal {N}_q (\mathbf {0}, \mathbf {I}_q)$ independently of $G_1^*$, and the notation “$\overset{d}{=}$” means “equal in distribution.” Using (A.5), we can write

$$\begin{aligned} \mathbf {a}^\prime \mathbf {Y} \overset{d}{=} \mathbf {a}^\prime \varvec{\beta } + \mathbf {a}^\prime \varvec{\varPsi }^{1/2} \varvec{\delta } |G_1^*| + (\mathbf {a}^\prime \varvec{\varGamma } \mathbf {a})^{1/2} G^*, \end{aligned}$$

where $G^* \sim \mathcal {N}_1 (0, 1)$ independently of $G_1^*$. Define

$$ \lambda ^* = \mathbf {a}^\prime \varvec{\varPsi }^{1/2} \varvec{\delta }/( \mathbf {a}^\prime \varvec{\varGamma } \mathbf {a})^{1/2}, ~~ \delta ^* = \lambda ^*/(1 + \lambda ^{*2})^{1/2}. $$

From an application of (4), we have $ (\mathbf {a}^\prime \varvec{\varPsi }^{1/2} \varvec{\delta })^2 + \mathbf {a}^\prime \varvec{\varGamma } \mathbf {a} = \mathbf {a}^\prime \varvec{\varPsi } \mathbf {a}$, implying

$$\begin{aligned} \mathbf {a}^\prime \varvec{\varPsi }^{1/2} \varvec{\delta } = (\mathbf {a}^\prime \varvec{\varPsi } \mathbf {a})^{1/2} \delta ^*, ~~ (\mathbf {a}^\prime \varvec{\varGamma } \mathbf {a})^{1/2} = (\mathbf {a}^\prime \varvec{\varPsi } \mathbf {a})^{1/2} (1-\delta ^{*2})^{1/2}. \end{aligned}$$

This allows us to write

$$\begin{aligned} \mathbf {a}^\prime \mathbf {Y} \overset{d}{=} \mathbf {a}^\prime \varvec{\beta } + (\mathbf {a}^\prime \varvec{\varPsi } \mathbf {a})^{1/2} \delta ^* |G_1^*| + (\mathbf {a}^\prime \varvec{\varPsi } \mathbf {a})^{1/2} (1-\delta ^{*2})^{1/2} G^*. \end{aligned}$$

Now the result follows from the representation (A.5) for the univariate case.
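Proposition 1 can be checked numerically. The Python sketch below computes the three parameters of the distribution of $\mathbf {a}^\prime \mathbf {Y}$. Equation (4) is not reproduced in this appendix, so the code assumes $\varvec{\delta } = \varvec{\lambda }/(1 + \varvec{\lambda }^\prime \varvec{\lambda })^{1/2}$ and $\varvec{\varGamma } = \varvec{\varPsi } - \varvec{\varPsi }^{1/2} \varvec{\delta } \varvec{\delta }^\prime \varvec{\varPsi }^{1/2}$; the latter is the form implied by the identity used in the proof. The function names are hypothetical.

```python
import numpy as np


def sym_sqrt(A):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.T


def linear_combo_sn(a, beta, Psi, lam):
    """Location, squared scale and skewness parameter of a'Y under Proposition 1."""
    delta = lam / np.sqrt(1.0 + lam @ lam)                       # assumed form of delta in (4)
    Psi_half = sym_sqrt(Psi)
    Gamma = Psi - Psi_half @ np.outer(delta, delta) @ Psi_half   # assumed form of Gamma in (4)
    lam_star = (a @ Psi_half @ delta) / np.sqrt(a @ Gamma @ a)
    return a @ beta, a @ Psi @ a, lam_star


# Example: the difference of two components of a bivariate skew-normal vector.
a = np.array([1.0, -1.0])
beta = np.array([0.5, 0.2])
Psi = np.array([[2.0, 0.5], [0.5, 1.0]])
lam = np.array([1.5, -0.7])
loc, scale_sq, lam_star = linear_combo_sn(a, beta, Psi, lam)
```

With $\delta ^* = \lambda ^*/(1 + \lambda ^{*2})^{1/2}$ computed from the returned `lam_star`, the product `np.sqrt(scale_sq) * delta_star` should agree with `a @ sym_sqrt(Psi) @ delta`, which is the first equality in the display that follows the definition of $\lambda ^*$ in the proof.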

Cite this paper

Sengupta, D., Choudhary, P.K., Cassey, P. (2015). Modeling and Analysis of Method Comparison Data with Skewness and Heavy Tails. In: Choudhary, P., Nagaraja, C., Ng, H. (eds) Ordered Data Analysis, Modeling and Health Research Methods. Springer Proceedings in Mathematics & Statistics, vol 149. Springer, Cham. https://doi.org/10.1007/978-3-319-25433-3_11

DOI: https://doi.org/10.1007/978-3-319-25433-3_11
Published: 15 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25431-9
Online ISBN: 978-3-319-25433-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
