Modeling strategies to improve parameter estimates in prognostic factors analyses with patient-reported outcomes in oncology
The inclusion of patient-reported outcome (PRO) questionnaires in prognostic factor analyses in oncology has increased substantially in recent years. We performed a simulation study to compare the performance of four modeling strategies in estimating the prognostic impact of multiple collinear scales from PRO questionnaires.
We generated multiple scenarios describing survival data with different sample sizes, event rates and degrees of multicollinearity among five PRO scales. We used the Cox proportional hazards (PH) model to estimate the hazard ratios (HR) using automatic selection procedures based on either the likelihood-ratio test (Cox-PV) or the Akaike Information Criterion (Cox-AIC). We also used Cox PH models that included all variables and were either penalized using ridge regression (Cox-R) or estimated without penalization (Cox-Full). For each scenario, we simulated 1000 independent datasets and compared the methods on their results averaged across datasets.
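As a rough illustration of this kind of design, the sketch below generates one dataset with five equicorrelated scales and evaluates a ridge-penalized Cox objective. It is a minimal sketch under stated assumptions, not the authors' simulation code: the function names, the exponential baseline hazard, the exponential censoring, the shared-factor construction of the correlation, and all numeric values other than n = 100 and ρ = 0.8 are hypothetical.

```python
import math
import random


def simulate_dataset(n, rho, betas, base_rate=0.1, censor_rate=0.05, seed=0):
    """Return n rows of (time, event, covariates); all defaults are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # A shared factor z0 gives every pair of scales correlation rho.
        z0 = rng.gauss(0.0, 1.0)
        x = [math.sqrt(rho) * z0 + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
             for _ in betas]
        # Inverse-transform sampling with an exponential baseline hazard:
        # T = -log(U) / (base_rate * exp(x'beta)).
        lin_pred = sum(b * xi for b, xi in zip(betas, x))
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
        t_event = -math.log(u) / (base_rate * math.exp(lin_pred))
        # Independent exponential censoring (an assumption of this sketch).
        t_cens = rng.expovariate(censor_rate)
        rows.append((min(t_event, t_cens), int(t_event <= t_cens), x))
    return rows


def ridge_neg_log_partial_lik(rows, beta, lam):
    """Negative Cox log partial likelihood plus (lam / 2) * sum(beta_j^2).

    Assumes no tied event times, which holds for continuous simulated times.
    """
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)  # latest time first
    risk_sum = 0.0
    loglik = 0.0
    for _, event, x in ordered:
        eta = sum(b * xi for b, xi in zip(beta, x))
        risk_sum += math.exp(eta)  # running sum over subjects still at risk
        if event:
            loglik += eta - math.log(risk_sum)
    return -loglik + 0.5 * lam * sum(b * b for b in beta)


# Hypothetical scenario: two prognostic scales and three null scales.
true_betas = [0.3, 0.3, 0.0, 0.0, 0.0]
rows = simulate_dataset(n=100, rho=0.8, betas=true_betas, seed=1)
unpenalized = ridge_neg_log_partial_lik(rows, true_betas, lam=0.0)
penalized = ridge_neg_log_partial_lik(rows, true_betas, lam=1.0)
print(sum(event for _, event, _ in rows), "events;", unpenalized, penalized)
```

Minimizing the penalized objective over `beta` (for example with a Newton or gradient routine) would correspond to a Cox-R style fit, and setting `lam=0` to a Cox-Full style fit. Repeating the generation step with different seeds gives the independent datasets over which estimates are averaged.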
The Cox-R performed similarly to or better than the other methods, particularly in scenarios with medium–high multicollinearity (ρ = 0.4 to ρ = 0.8) and small sample sizes (n = 100). Overall, the Cox-PV and Cox-AIC performed worse; for example, in some scenarios they failed to select one or more prognostic collinear PRO scales. Compared with the Cox-Full, the Cox-R provided HR estimates with similar bias patterns but smaller root-mean-squared errors, particularly in higher multicollinearity scenarios.
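The combination of similar bias and smaller root-mean-squared error can be read through the standard decomposition of mean squared error. The notation below is an assumption made for illustration, not the authors' stated definitions: $\hat\beta_s$ denotes the estimate of a coefficient $\beta$ (for instance a log-HR) from simulated dataset $s$, with $S = 1000$ datasets per scenario.

```latex
\begin{align}
  \bar{\hat\beta} &= \frac{1}{S}\sum_{s=1}^{S}\hat\beta_s, \qquad
  \mathrm{Bias} = \bar{\hat\beta} - \beta, \\
  \mathrm{RMSE} &= \sqrt{\frac{1}{S}\sum_{s=1}^{S}\bigl(\hat\beta_s - \beta\bigr)^2}, \\
  \mathrm{RMSE}^2 &= \mathrm{Bias}^2
    + \frac{1}{S}\sum_{s=1}^{S}\bigl(\hat\beta_s - \bar{\hat\beta}\bigr)^2 .
\end{align}
```

Under this decomposition, two methods with comparable bias can differ in RMSE only through the spread of their estimates across datasets, so a smaller RMSE at similar bias indicates less variable estimates.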
Our findings suggest that the Cox-R is the best approach when performing prognostic factor analyses with multiple and collinear PRO scales, particularly in situations of high multicollinearity, small sample sizes and low event rates.
Keywords: Health-related quality of life; Multicollinearity; Patient-reported outcomes; Prognostic factor analysis; Ridge regression
FC, FE: conception and design; FC, ND, FE: statistical analyses; all authors: interpretation of results; all authors: manuscript writing.
Compliance with ethical standards
Conflict of interest
No potential conflict of interest for this paper was reported by the authors.