Top ten errors of statistical analysis in observational studies for cancer research
Observational studies using registry data make it possible to compile quality information and can surpass clinical trials in some contexts. However, data heterogeneity, analytical complexity, and the diversity of aspects to be taken into account when interpreting results make mistakes easy and call for mastery of statistical methodology. Questionable research practices, including poor analytical data management, are responsible for the low reproducibility of some results; yet there is a paucity of information in the literature regarding the specific statistical pitfalls of cancer studies. This article seeks to expose ten common problematic situations in the analysis of cancer registries (convenience, dichotomization, stratification, regression to the mean, impact of sample size, competing risks, immortal time and survivor bias, management of missing values, and data dredging) and to propose how to avoid or solve them.
Keywords: Cancer research, Error, Observational studies, Pitfalls, Registry, Statistical analysis
Priscilla Chase Duran is acknowledged for editing the manuscript.
Compliance with ethical standards
Conflict of interest
None to declare. This is an academic study. No financial support has been received from external sources.
The study has been performed in accordance with the ethical standards of the Declaration of Helsinki and its later amendments.