Regression Calibration for Cox Regression Under Heteroscedastic Measurement Error — Determining Risk Factors of Cardiovascular Diseases from Error-prone Nutritional Replication Data

  • Thomas Augustin
  • Angela Döring
  • David Rummel

A widespread problem in applying regression analysis is data deficiency. In most surveys a non-negligible proportion of the data is missing, and sophisticated methods are needed to avoid severely biased estimation. Reviews of this important topic are provided, in particular, by Rao et al. (2008, Chapter 8), Little and Rubin (2002), Toutenburg et al. (2002) and Toutenburg, Fieger and Heumann (2000). Recent developments include, for instance, Toutenburg and Srivastava (1999) and Toutenburg and Srivastava (2004), who discuss corrected estimation of population characteristics from partially incomplete survey data. Toutenburg and Shalabh (2001), Heumann (2004), Shalabh and Toutenburg (2005), Toutenburg and Shalabh (2005), Toutenburg et al. (2006), Toutenburg et al. (2005) and Toutenburg, Srivastava and Shalabh (2006) provide neat methods for handling missing data in linear and nonlinear regression models, while, among others, Strobl, Boulesteix and Augustin (2007) and Svejdar et al. (2007) are concerned with classification under missing data.

The paper is organized as follows: The next section describes our modeling of the replication data. Section 3 adapts the idea of regression calibration to replication data and to quadratic predictors. The application to the MONICA data is reported in Section 4, while Section 5 concludes by sketching some topics for further research.


Keywords: Protein Intake; Replication Data; Measurement Error Model; Accelerate Failure Time Model; Regression Calibration
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Ahmad A (2006) Statistical analysis of heaping and rounding effects. Dr Hut, Munich
  2. Andersen PK, Liestol K (2003) Attenuation caused by infrequently updated covariates in survival analysis. Biostatistics 4: 633-649
  3. Augustin T (2002) Survival analysis under measurement error. Habilitation (post-doctoral) thesis, Faculty of Mathematics, Informatics and Statistics, University of Munich, Munich
  4. Augustin T (2004) An exact corrected log-likelihood function for Cox’s proportional hazards model under measurement error and some extensions. Scandinavian Journal of Statistics 31: 43-50
  5. Augustin T, Schwarz R (2002) Cox’s proportional hazards model under covariate measurement error - A review and comparison of methods. In: Van Huffel S, Lemmerling P (eds) Total least squares and errors-in-variables modeling: analysis, algorithms and applications. Kluwer, Dordrecht: 175-184
  6. Augustin T, Wolff J (2004) A bias analysis of Weibull models under heaped data. Statistical Papers 45: 211-229
  7. Bender R, Augustin T, Blettner M (2005) Simulating survival times for Cox regression models. Statistics in Medicine 24: 1713-1723
  8. de Bruijne MHJ, le Cessie S, Kluin-Neemans HC, van Houwelingen HC (2001) On the use of Cox regression in the presence of an irregularly observed time-dependent covariate. Statistics in Medicine 20: 3817-3829
  9. Buzas JS (1998) Unbiased scores in proportional hazards regression with covariate measurement error. Journal of Statistical Planning and Inference 67: 247-257
  10. Carroll RJ, Freedman LS, Kipnis V, Li L (1998) A new class of measurement error models, with applications to dietary data. Canadian Journal of Statistics 26: 467-477
  11. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement error in nonlinear models: a modern perspective, 2nd edition. Chapman and Hall/CRC, Boca Raton, FL, USA
  12. Cheng SC, Wang NY (2001) Linear transformation models for failure time data with covariate measurement error. Journal of the American Statistical Association 96: 706-716
  13. Cheng C-L, Schneeweiß H (1998) The polynomial regression with errors in the variables. Journal of the Royal Statistical Society Series B 60: 189-199
  14. Cheng C-L, Schneeweiss H, Thamerus M (2000) A small sample estimator for a polynomial regression with errors in the variables. Journal of the Royal Statistical Society B 62: 699-709
  15. Cheng C-L, Van Ness JW (1999) Statistical regression with measurement error. Arnold, London
  16. Clayton DG (1991) Models for the analysis of cohort and case-control studies with inaccurately measured exposures. In: Dwyer JH, Feinleib M, Lipsert P et al. (eds) Statistical Models for Longitudinal Studies of Health. Oxford University Press, New York: 301-331
  17. Cox DR (1972) Regression models and life tables (with discussion). Journal of the Royal Statistical Society Series B 34: 187-220
  18. Döring A, Kußmaul B (1997) Ernährungsdeterminanten des Herzinfarktrisikos. Report GSF-Fe-7629. GSF - National Research Center for Environment and Health, Neuherberg
  19. Dupuy JF (2005) The proportional hazards model with covariate measurement error. Journal of Statistical Planning and Inference 135: 260-275
  20. Giménez P, Bolfarine H, Colosimo EA (1999) Estimation in Weibull regression model with measurement error. Communications in Statistics - Theory and Methods 28: 495-510
  21. Giménez P, Bolfarine H, Colosimo EA (2006) Asymptotic relative efficiency of score tests in Weibull models with measurement errors. Statistical Papers 47: 461-470
  22. He W, Yi G, Xiong J (2007) Accelerated failure time models with covariates subject to measurement error. Statistics in Medicine (to appear)
  23. Heid IM, Küchenhoff H, Rosario AS, Kreienbrock L, Wichmann HE (2006) Impact of measurement error in exposures in German radon studies. Journal of Toxicology and Environmental Health A 69: 701-721
  24. Heitmann BL, Lissner L (1995) Dietary underreporting by obese individuals - is it specific or non-specific? British Medical Journal 311: 986-989
  25. Heumann C (2004) Monte Carlo methods for missing data in generalized linear and generalized linear mixed models. Habilitation thesis, Faculty of Mathematics, Informatics and Statistics, University of Munich, Munich
  26. Horowitz J, Manski CF (2000) Nonparametric analysis of randomized experiments with missing covariate and outcome data. Journal of the American Statistical Association 95: 77-84
  27. Hu C, Lin DY (2002) Cox regression with covariate measurement error. Scandinavian Journal of Statistics 29: 637-655
  28. Hu CC, Lin DY (2004) Semiparametric failure time regression with replicates of mismeasured covariates. Journal of the American Statistical Association 99: 105-118
  29. Hu P, Tsiatis A, Davidian M (1998) Estimating the parameters in the Cox model when covariate variables are measured with error. Biometrics 54: 1407-1419
  30. Huang HS, Huwang L (2001) On the polynomial structural relationship. The Canadian Journal of Statistics 29: 493-511
  31. Huang Y, Wang CY (2000) Cox regression with accurate covariates unascertainable: a nonparametric-correction approach. Journal of the American Statistical Association 95: 1209-1219
  32. Huang YJ, Wang CY (2006) Errors-in-covariates effect on estimating functions: Additivity in limit and nonparametric correction. Statistica Sinica 16: 861-881
  33. Hughes MD (1993) Regression dilution in the proportional hazards model. Biometrics 49: 1056-1066
  34. Johansson I, Hallmans G, Wikman A, Biessy C, Riboli E, Kaaks R (2002) Validation and calibration of food-frequency questionnaire measurement in the Northern Sweden health and disease cohort. Public Health Nutrition 5: 487-496
  35. Kalbfleisch JD, Prentice RL (2002) The Statistical analysis of failure time data. 2nd edition. Wiley, New York
  36. Kong FH (1999) Adjusting regression attenuation in the Cox proportional hazards model. Journal of Statistical Planning and Inference 79: 31-44
  37. Kong FH, Gu M (1999) Consistent estimation in Cox proportional hazards model with covariate measurement errors. Statistica Sinica 9: 953-969
  38. Küchenhoff H, Lederer W, Lesaffre E (2007) Asymptotic variance estimation for the misclassification SIMEX. Computational Statistics & Data Analysis 51: 6197-6211
  39. Küchenhoff H, Mwalili SM, Lesaffre E (2006) A general method for dealing with misclassification in regression: The misclassification SIMEX. Biometrics 62: 85-96
  40. Kuha JT, Temple J (2003) Covariate measurement error in quadratic regression. International Statistical Review 71: 131-150
  41. Kukush A, Markovsky I, Van Huffel S (2004) Consistent estimation in an implicit quadratic measurement error model. Computational Statistics & Data Analysis 47: 123-147
  42. Kukush A, Schneeweiss H, Wolf R (2004) Three estimators for the Poisson regression model with measurement errors. Statistical Papers 45: 351-368
  43. Kukush A, Schneeweiss H (2005) Relative efficiency of three estimators in a polynomial regression with measurement errors. Journal of Statistical Planning and Inference 127: 179-203
  44. Kukush A, Schneeweiss H (2005) Comparing different estimators in a nonlinear measurement error model. I and II. Mathematical Methods of Statistics 14: 53-79 and 203-223
  45. Kulathinal SB, Kuulasmaa K, Gasbarra D (2002) Estimation of an errors-in-variables regression model when the variances of the measurement errors vary between the observations. Statistics in Medicine 21: 1089-1101
  46. Kuznetsov VP (1991) Interval statistical models (in Russian). Radio and Communication, Moscow
  47. Li Y, Lin XH (2003) Functional inference in frailty measurement error models for clustered survival data using the SIMEX approach. Journal of the American Statistical Association 98: 191-203
  48. Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edition. Wiley, New York
  49. Liu K, Stone RA, Mazumdar S, Houck PR, Reynolds CF (2004) Covariate measurement error in the Cox model: A simulation study. Communications in Statistics - Simulation and Computation 33: 1077-1093
  50. Manski CF (2003) Partial identification of probability distributions. Springer, New York
  51. Manski CF, Tamer E (2002) Inference on regressions with interval data on a regressor or outcome. Econometrica 70: 519-546
  52. Martin-Magniette M-L, Taupin M-L (2006) Semi-parametric estimation of the hazard function in a model with covariate measurement error. Arxiv preprint math.ST/0606192
  53. Nakamura T (1990) Corrected score functions for errors-in-variables models: Methodology and application to generalized linear models. Biometrika 77: 127-137
  54. Nakamura T (1992) Proportional hazards model with covariates subject to measurement error. Biometrics 48: 829-838
  55. Pepe MS, Self SG, Prentice RL (1989) Further results in covariate measurement errors in cohort studies with time to response data. Statistics in Medicine 8: 1167-1178
  56. Prentice RL (1982) Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika 69: 331-342
  57. Prentice RL (1996) Measurement error and results from analytic epidemiology: dietary fat and breast cancer. Journal of the National Cancer Institute 88: 1738-1747
  58. Rao CR, Toutenburg H, Shalabh, Heumann C (2008) Linear Models: Least Squares and Generalizations (3rd edition). Springer, Berlin Heidelberg New York
  59. Ronning G (2005) Randomized response and the binary probit model. Economics Letters 86: 221-228
  60. Rummel D (2006) Correction for covariate measurement error in nonparametric regression. Dr Hut, Munich
  61. Rummel D, Augustin T, Küchenhoff H (2007) Correction for covariate measurement error in nonparametric longitudinal regression. SFB Discussion Paper 513. Department of Statistics, University of Munich
  62. Schmid M (2006) Estimation of a linear regression model with microaggregated data. Dr Hut, Munich
  63. Schmid M, Schneeweiß H, Küchenhoff H (2007) Estimation of a linear regression under microaggregation with the response variable as a sorting variable. Statistica Neerlandica (to appear)
  64. Schneeweiß H, Augustin T (2006) Some recent advances in measurement error models and methods. Allgemeines Statistisches Archiv - Journal of the German Statistical Association 90: 183-197 (also printed in: Hübler O, Frohn J (eds, 2006) Modern econometric analysis - survey on recent developments, 183-198)
  65. Schneeweiss H, Cheng C-L (2006) Bias of the structural quasi-score estimator of a measurement error model under misspecification of the regressor distribution. Journal of Multivariate Analysis 97: 455-473
  66. Shalabh (2001) Consistent estimation through weighted harmonic mean of inconsistent estimators in replicated measurement error models. Econometric Reviews 20: 507-510
  67. Shalabh (2001) Least squares estimators in measurement error models under the balanced loss function. TEST 10: 301-308
  68. Shalabh (2003) Consistent estimation of coefficients in measurement error models with replicated observations. Journal of Multivariate Analysis 86: 227-241
  69. Shalabh, Toutenburg H (2005) Estimation of linear regression model with missing data: the role of stochastic linear constraints. Communications in Statistics - Theory and Methods 34: 375-387
  70. Shklyar S, Schneeweiss H (2005) A comparison of asymptotic covariance matrices of three consistent estimators in the Poisson regression model with measurement errors. Journal of Multivariate Analysis 94: 250-270
  71. Shklyar S, Schneeweiss H, Kukush A (2007) Quasi score is more efficient than corrected score in a polynomial measurement error model. Metrika 65: 275-295
  72. Song X, Huang YJ (2006) A corrected pseudo-score approach for additive hazards model with longitudinal covariates measured with error. Lifetime Data Analysis 12: 97-110
  73. Stefanski LA (1989) Unbiased estimation of a nonlinear function of a normal mean with application to measurement error models. Communications in Statistics - Theory and Methods 18: 4335-4358
  74. Stefanski LA (2000) Measurement error models. Journal of the American Statistical Association 95: 1353-1358
  75. Strobl C, Boulesteix A-L, Augustin T (2007) Unbiased split selection for classification trees based on the Gini index. Computational Statistics & Data Analysis 52: 483-501
  76. Svejdar V, Augustin T, Strobl C (2007) Variablenselektion in Klassifikationsbäumen unter spezieller Berücksichtigung von fehlenden Werten. Technical report (∼carolin/research.html)
  77. Toutenburg H (2002) Statistical Analysis of Designed Experiments (2nd edition). Springer, New York
  78. Toutenburg H, Fieger A, Heumann C (2000) Regression modelling with fixed effects: missing values and related problems. In: Rao CR, Székely GJ (eds) Statistics for the 21st century. Marcel Dekker, New York
  79. Toutenburg H, Heumann C, Nittner T, Scheid S (2002) Parametric and nonparametric regression with missing X’s - a review. Journal of the Iranian Statistical Society 1: 79-110
  80. Toutenburg H, Shalabh, Heumann C (2006) Use of prior information in the form of interval constraints for the improved estimation of linear regression models with some missing responses. Journal of Statistical Planning and Inference 136: 2430-2445
  81. Toutenburg H, Shalabh (2005) Estimation of regression coefficients subject to exact linear restrictions when some observations are missing and quadratic error balanced loss function is used. TEST 14: 385-396
  82. Toutenburg H, Shalabh (2001) Use of minimum risk approach in the estimation of regression models with missing observations. Metrika 54: 247-259
  83. Toutenburg H, Srivastava VK (1999) Estimation of ratio of population means in survey sampling when some observations are missing. Metrika 48: 177-187
  84. Toutenburg H, Srivastava VK (2004) Efficient estimation of population mean using incomplete survey data on study and auxiliary characteristics. Statistica (Bologna) 63: 223-236
  85. Toutenburg H, Srivastava VK, Shalabh, Heumann C (2005) Estimation of parameters in multiple linear regression with missing covariates using a modified first order regression procedure. Annals of Economics and Finance 6: 289-301
  86. Toutenburg H, Srivastava VK, Shalabh (2006) Estimation of linear regression models with missing observations on both the explanatory and study variables. Quality Technology & Quantitative Management 3: 179-189
  87. Utkin LV, Augustin T (2007) Decision making under imperfect measurement using the imprecise Dirichlet model. International Journal of Approximate Reasoning 44: 332-338
  88. Van Huffel S, Lemmerling P (eds, 2002) Total least squares and errors-in-variables modeling: analysis, algorithms and applications. Kluwer, Dordrecht
  89. Walley P (1991) Statistical reasoning with imprecise probabilities. Chapman and Hall, London
  90. Wang CY, Hsu L, Feng ZD, Prentice RL (1997) Regression calibration in failure time regression. Biometrics 53: 131-145
  91. Wang CY, Xie SX, Prentice RL (2001) Recalibration based on an approximate relative risk estimator in Cox regression with missing covariates. Statistica Sinica 1: 1081-1104
  92. Wang Q (2000) Estimation of linear error-in-covariables models with validation data under random censorship. Journal of Multivariate Analysis 74: 245-266
  93. Wansbeek T, Meijer E (2000) Measurement error and latent variables in econometrics. Elsevier, Amsterdam
  94. Weichselberger K (2001) Elementare Grundbegriffe einer allgemeineren Wahrscheinlichkeitsrechnung, Volume I: Intervallwahrscheinlichkeit als umfassendes Konzept. Physica, Heidelberg
  95. Willett W (1998) Nutritional Epidemiology (2nd edition). Oxford University Press, New York
  96. Winkler G, Döring A, Keil U (1991) Selected nutrient intakes of middle-aged men in Southern Germany: Results from the WHO MONICA Augsburg Dietary Survey of 1984/1985. Annals of Nutrition and Metabolism 35: 284-291
  97. Wolff J, Augustin T (2003) Heaping and its consequences for duration analysis - a simulation study. Allgemeines Statistisches Archiv - Journal of the German Statistical Association 87: 1-28
  98. Xie SX, Wang CY, Prentice RL (2001) A risk set calibration method for failure time regression by using a covariate reliability sample. Journal of the Royal Statistical Society Series B 63: 855-870
  99. Yi GY, Lawless JF (2007) A corrected likelihood method for the proportional hazards model with covariates subject to measurement error. Journal of Statistical Planning and Inference 137(6): 1816-1828
  100. Zaffalon M (2002) Exact credal treatment of missing data. Journal of Statistical Planning and Inference 105: 105-122
  101. Zaffalon M (2005) Conservative rules for predictive inference with incomplete data. In: Cozman FG, Nau R, Seidenfeld T (eds) ISIPTA ’05, Proceedings of the fourth international symposium on imprecise probabilities and their applications, Pittsburgh, PA, USA. SIPTA, Manno (CH): 406-415
  102. Zaffalon M, de Cooman G (2004) Updating beliefs with incomplete observations. Artificial Intelligence 159: 75-125

Copyright information

© Physica-Verlag Heidelberg 2008

Authors and Affiliations

  • Thomas Augustin (1)
  • Angela Döring (2)
  • David Rummel (3)

  1. Department of Statistics, University of Munich, Munich, Germany
  2. GSF-National Research Center for Environment and Health, Neuherberg, Germany
  3. emnos GmbH, Munich, Germany
