Abstract
Appropriate statistical methods for handling missing data are increasingly applied in clinical trials. When data are not missing at random, the maximum likelihood estimator (MLE) based on the direct likelihood, also called the observed likelihood, can be seriously biased. One remedy is to add auxiliary variables, such as surrogate endpoints, to the model in order to reduce the bias. We study theoretically the impact of an auxiliary variable on the MLE and evaluate the resulting bias reduction or inflation under several typical correlation structures.
References
Albert, P. S., Follmann, D. A. (2009). Shared-parameter models. In G. Fitzmaurice, M. Davidian, G. Verbeke, G. Molenberghs (Eds.), Longitudinal data analysis (pp. 433–452). Boca Raton, FL: Chapman & Hall/CRC Press.
Anderson, T. W. (1957). Maximum likelihood estimates for a multivariate normal distribution when some observations are missing. Journal of the American Statistical Association, 52(278), 200–203.
Dawid, A. P. (1979). Conditional independence in statistical theory. Journal of the Royal Statistical Society Series B (Methodological), 41(1), 1–31.
Finkelstein, D., Schoenfeld, D. (1994). Analysing survival in the presence of an auxiliary variable. Statistics in Medicine, 13, 1747–1754.
Fleming, T. R., DeMets, D. L. (1996). Surrogate end points in clinical trials: Are we being misled? Annals of Internal Medicine, 125, 605–613.
Fleming, T. R., Prentice, R. L., Pepe, M. S., Glidden, D. (1994). Surrogate and auxiliary endpoints in clinical trials, with potential applications in cancer and AIDS research. Statistics in Medicine, 13, 955–968.
Follmann, D., Wu, M. (1995). An approximate generalized linear model with random effects for informative missing data. Biometrics, 51, 151–168.
Ibrahim, J. G., Lipsitz, S. R., Horton, N. (2001). Using auxiliary data for parameter estimation with non-ignorably missing outcomes. Applied Statistics, 50(3), 361–373.
International Conference on Harmonisation E9 Expert Working Group. (1999). Statistical principles for clinical trials: ICH Harmonised Tripartite Guideline. Statistics in Medicine, 18, 1905–1942.
Kano, Y. (2015). Developments in multivariate missing data analysis. A paper presented at International Meeting of the Psychometric Society (IMPS2015). Peking, China.
Lauritzen, S. L. (1996). Graphical models. Oxford: Clarendon Press.
Li, Y., Taylor, J. M. G., Little, R. J. A. (2011). A shrinkage approach for estimating a treatment effect using intermediate biomarker data in clinical trials. Biometrics, 67, 1434–1441.
Little, R. J. A., Rubin, D. B. (2002). Statistical analysis with missing data. New York: Wiley.
Mallinckrodt, C. H., Clark, W. S., David, S. R. (2001). Accounting for dropout bias using mixed-effects models. Journal of Biopharmaceutical Statistics, 11(1&2), 9–21.
National Research Council. (2010). The prevention and treatment of missing data in clinical trials (Panel on Handling Missing Data in Clinical Trials, Committee on National Statistics, Division of Behavioral and Social Sciences and Education). Washington, DC: The National Academies Press.
O’Neill, R. T., Temple, R. (2012). The prevention and treatment of missing data in clinical trials: An FDA perspective on the importance of dealing with it. Clinical Pharmacology and Therapeutics, 91(3), 550–554.
Pharmacological Therapy for Macular Degeneration Study Group. (1997). Interferon alpha-2a is ineffective for patients with choroidal neovascularization secondary to age-related macular degeneration. Archives of Ophthalmology, 115(7), 865–872.
Prentice, R. L. (1989). Surrogate endpoints in clinical trials: Definition and operational criteria. Statistics in Medicine, 8, 431–440.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592.
Takai, K., Kano, Y. (2013). Asymptotic inference with incomplete data. Communications in Statistics—Theory and Methods, 42(17), 3174–3190.
Appendices
Appendix A: MLE based on direct-likelihood with \(Y_a\)
We derive the MLE based on the direct likelihood with \(Y_a\) following Anderson (1957), where the MLE is obtained more easily by exploiting properties of the normal distribution and a reparametrization.
Assuming that \((X,Y,Y_a)\) follows the normal distribution in (5), the conditional distribution of \(Y\) given \((X,Y_a)\) is normal with mean \(\beta _0+\beta _{x} X + \beta _{y_a} Y_a\) and variance \(\sigma ^2_e\), where:
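As a sketch of these definitions from standard multivariate normal theory: the symbols \(\mu _y\), \(\sigma _{xy}\), \(\sigma _{yy_a}\), and \(\sigma _{yy}\) for the remaining moments in (5) are assumed here by analogy with the notation used elsewhere in this appendix, and the layout may differ from the original display.

```latex
\begin{aligned}
\begin{pmatrix} \beta_x \\ \beta_{y_a} \end{pmatrix}
  &= \begin{pmatrix} \sigma_{xx} & \sigma_{xy_a} \\ \sigma_{xy_a} & \sigma_{y_ay_a} \end{pmatrix}^{-1}
     \begin{pmatrix} \sigma_{xy} \\ \sigma_{yy_a} \end{pmatrix}, \\
\beta_0 &= \mu_y - \beta_x \mu_x - \beta_{y_a} \mu_{y_a}, \\
\sigma^2_e &= \sigma_{yy} - \beta_x \sigma_{xy} - \beta_{y_a} \sigma_{yy_a}.
\end{aligned}
```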
Hence, the direct likelihood \(DL_+\) can be rewritten under this reparametrization as follows:
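A hedged sketch of the factorization, in the spirit of Anderson (1957): it assumes that \(X\) and \(Y_a\) are observed for all \(n\) cases, that the cases are ordered so that \(Y\) is observed for the first \(m\) of them, and that \(f\) denotes the relevant normal density. The indexing is an assumption for illustration, not taken from the original display.

```latex
DL_+ \;=\; \prod_{i=1}^{m} f\bigl(y_i \mid x_i, y_{a,i};\ \beta_0, \beta_x, \beta_{y_a}, \sigma^2_e\bigr)
\;\times\; \prod_{i=1}^{n} f\bigl(x_i, y_{a,i};\ \mu_x, \mu_{y_a}, \sigma_{xx}, \sigma_{xy_a}, \sigma_{y_ay_a}\bigr)
```

Because the two factors share no parameters, each can be maximized separately.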
The MLEs of \(\mu _x, \mu _{y_a}, \sigma _{xx}, \sigma _{xy_a}, \sigma _{y_ay_a}\) are obtained from the second factor of \(DL_+\) as follows:
The MLEs of \(\theta _{a1}\), \(\theta _{a2}\), and \(\theta _{a3}\) then follow directly from these results.
The MLEs of the remaining parameters \(\beta _0, \beta _x,\beta _{y_a}, \sigma ^2_{e}\) are obtained from the first factor of \(DL_+\) using standard results for linear regression as follows:
where
It follows from (31) and the relationship in parameters between (5) and (6) that
Hence, we obtain the MLEs of these parameters as follows, which gives the results in Proposition 2:
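The two-step estimation described in this appendix can be sketched numerically. The code below is a minimal illustration rather than the authors' implementation. It assumes a monotone pattern in which \(X\) and \(Y_a\) are always observed and only \(Y\) can be missing. The function names (`factored_mle`, `simulate`) and the data-generating values are hypothetical. The simulated missingness depends only on \(Y_a\), chosen purely so that the behaviour is easy to check.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """ML (divide-by-n) covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def factored_mle(x, ya, y):
    """Factored-likelihood estimates of (mu_y, sigma_yy).

    x, ya : fully observed sequences of length n.
    y     : same length, with None marking a missing outcome.
    """
    # Second factor: moments of (X, Y_a) from all n cases.
    mu_x, mu_ya = mean(x), mean(ya)
    s_xx, s_xa, s_aa = cov(x, x), cov(x, ya), cov(ya, ya)

    # First factor: regression of Y on (X, Y_a) from complete cases only.
    cc = [i for i, v in enumerate(y) if v is not None]
    xc = [x[i] for i in cc]
    ac = [ya[i] for i in cc]
    yc = [y[i] for i in cc]
    c_xx, c_xa, c_aa = cov(xc, xc), cov(xc, ac), cov(ac, ac)
    c_xy, c_ay = cov(xc, yc), cov(ac, yc)
    det = c_xx * c_aa - c_xa ** 2
    b_x = (c_aa * c_xy - c_xa * c_ay) / det
    b_a = (c_xx * c_ay - c_xa * c_xy) / det
    b_0 = mean(yc) - b_x * mean(xc) - b_a * mean(ac)
    s_e = cov(yc, yc) - b_x * c_xy - b_a * c_ay  # residual variance, RSS / m

    # Map the regression parameters back to the moments of Y.
    mu_y = b_0 + b_x * mu_x + b_a * mu_ya
    s_yy = s_e + b_x ** 2 * s_xx + 2 * b_x * b_a * s_xa + b_a ** 2 * s_aa
    return mu_y, s_yy


def simulate(n, seed=0):
    """Hypothetical data in which Y is missing whenever Y_a >= 0.5."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    ya = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    y_full = [1 + 0.3 * xi + 0.8 * ai + rng.gauss(0, 1)
              for xi, ai in zip(x, ya)]
    y_obs = [yi if ai < 0.5 else None for yi, ai in zip(y_full, ya)]
    return x, ya, y_full, y_obs
```

With no missing outcomes the estimates reduce to the ordinary sample mean and divide-by-n variance of \(Y\). With the simulated dropout, `factored_mle(x, ya, y_obs)` can be compared against the complete-case mean of `y_obs` to see how the auxiliary variable corrects it.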
Appendix B: Limit of statistics using complete cases only
Here we derive the limits of \(\bar{X}_{(m)}\) and \(S_{xx_{(m)}}\) as \(n\) tends to infinity. The limits of the other statistics can be derived in the same way.
Assume that \((X,Y,Y_a,Z)\) is jointly normally distributed, extending (5), where the mean and variance of \(Z\) are \(\mu _z\) and \(\sigma _{zz}\), respectively, and the covariances between \(Z\) and \((X,Y,Y_a)\) are \((\sigma _{zx}, \sigma _{zy}, \sigma _{zy_a})\).
Using the response indicator \(R_Y\), \(\bar{X}_{(m)}\) is expressed in the form:
By the weak law of large numbers, we obtain
By using the condition \(X \mathop {\perp \!\!\!\!\perp }R_Y |Z\), we obtain (33) shown as follows:
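The three steps for the mean can be sketched as one chain. This is a hedged reconstruction from the stated assumptions (joint normality and \(X \mathop {\perp \!\!\!\!\perp }R_Y |Z\)), not a copy of the original displays. The per-case notation \(R_{Y,i}\), equal to 1 when \(Y_i\) is observed and 0 otherwise, is assumed here.

```latex
\begin{aligned}
\bar{X}_{(m)}
  &= \frac{n^{-1}\sum_{i=1}^{n} R_{Y,i} X_i}{n^{-1}\sum_{i=1}^{n} R_{Y,i}}
   \;\xrightarrow{\,p\,}\; \frac{E[R_Y X]}{E[R_Y]} = E[X \mid R_Y = 1] \\
  &= E\bigl[\, E[X \mid Z] \,\bigm|\, R_Y = 1 \bigr]
   = \mu_x + \frac{\sigma_{zx}}{\sigma_{zz}} \bigl( E[Z \mid R_Y = 1] - \mu_z \bigr).
\end{aligned}
```

The second line uses the conditional independence to replace \(E[X \mid Z, R_Y = 1]\) by \(E[X \mid Z]\), which is linear in \(Z\) under normality.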
For \(S_{xx_{(m)}}\), the response indicator \(R_Y\) gives the following expression:
By applying the weak law of large numbers,
By noting that \(X \mathop {\perp \!\!\!\!\perp }R_Y|Z\), we can evaluate the third term as follows:
The first term is written as follows:
The second term is written by using (35) as follows:
Hence, we finally obtain:
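One way to arrive at a limit of this kind is the law of total variance, sketched below under the same assumptions (joint normality and \(X \mathop {\perp \!\!\!\!\perp }R_Y|Z\)). This route is an illustration and need not match the three-term decomposition used in the original derivation.

```latex
\begin{aligned}
S_{xx_{(m)}} \;\xrightarrow{\,p\,}\; \operatorname{Var}(X \mid R_Y = 1)
  &= E\bigl[ \operatorname{Var}(X \mid Z) \bigm| R_Y = 1 \bigr]
   + \operatorname{Var}\bigl( E[X \mid Z] \bigm| R_Y = 1 \bigr) \\
  &= \sigma_{xx} - \frac{\sigma_{zx}^2}{\sigma_{zz}}
   + \frac{\sigma_{zx}^2}{\sigma_{zz}^2} \operatorname{Var}(Z \mid R_Y = 1).
\end{aligned}
```

The complete-case variance therefore equals \(\sigma _{xx}\) only when \(\sigma _{zx} = 0\) or when selection leaves the variance of \(Z\) unchanged.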
Similar derivations have been used in Kano (2015).
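The limits discussed in this appendix can be checked by simulation. The sketch below is illustrative only and rests on assumptions not in the original: \(Z\) is standard normal, \(Y\) is observed exactly when \(Z > 0\) (so that \(X \mathop {\perp \!\!\!\!\perp }R_Y|Z\) holds by construction), and the parameter values and the function name `complete_case_moments` are hypothetical. For this selection rule the conditional moments of \(Z\) are those of a half-normal distribution.

```python
import math
import random


def complete_case_moments(n=100_000, seed=1):
    """Return (xbar_cc, mean_limit, sxx_cc, var_limit) for simulated data."""
    rng = random.Random(seed)
    # Hypothetical parameters; Z is standard normal (mu_z = 0, sigma_zz = 1).
    mu_x, s_xx, s_zx = 2.0, 1.0, 0.6
    b = s_zx                               # slope sigma_zx / sigma_zz
    res_sd = math.sqrt(s_xx - s_zx ** 2)   # sd of X given Z

    x_cc = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = mu_x + b * z + rng.gauss(0.0, res_sd)
        if z > 0.0:                        # response indicator R_Y = 1
            x_cc.append(x)

    m = len(x_cc)
    xbar = sum(x_cc) / m
    sxx = sum((v - xbar) ** 2 for v in x_cc) / m

    # Half-normal moments of Z given Z > 0.
    ez = math.sqrt(2 / math.pi)
    vz = 1 - 2 / math.pi
    mean_limit = mu_x + b * ez                  # mu_x + b (E[Z|R=1] - mu_z)
    var_limit = s_xx + b ** 2 * (vz - 1.0)      # sigma_xx + b^2 (Var(Z|R=1) - sigma_zz)
    return xbar, mean_limit, sxx, var_limit
```

Calling `complete_case_moments()` returns the complete-case mean and variance of \(X\) next to their predicted limits. In this setup the complete-case mean overshoots \(\mu _x\) and the complete-case variance undershoots \(\sigma _{xx}\).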
About this article
Cite this article
Takagi, Y., Kano, Y. Bias reduction using surrogate endpoints as auxiliary variables. Ann Inst Stat Math 71, 837–852 (2019). https://doi.org/10.1007/s10463-018-0667-8
DOI: https://doi.org/10.1007/s10463-018-0667-8