Top ten errors of statistical analysis in observational studies for cancer research

  • Review Article
  • Clinical and Translational Oncology

Abstract

Observational studies using registry data make it possible to compile quality information and can surpass clinical trials in some contexts. However, data heterogeneity, analytical complexity, and the diversity of aspects to be taken into account when interpreting results make it easy to make mistakes and call for mastery of statistical methodology. Questionable research practices, including poor analytical data management, are responsible for the low reproducibility of some results; yet there is a paucity of information in the literature regarding statistical pitfalls specific to cancer studies. This article exposes ten common problematic situations in the analysis of cancer registries (convenience, dichotomization, stratification, regression to the mean, impact of sample size, competing risks, immortal time and survivor bias, management of missing values, and data dredging) and proposes how to avoid or solve them.
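As a rough illustration of why data dredging, the last pitfall listed above, inflates false findings, the sketch below computes the chance of at least one spurious "significant" result when many hypotheses are tested on the same registry. It assumes independent tests and that every null hypothesis is true; the function names and the 0.05 threshold are illustrative choices, not taken from the article.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among n_tests
    independent tests when every null hypothesis is true (assumption)."""
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    # Each test avoids a false positive with probability (1 - alpha);
    # all n_tests avoid it together with probability (1 - alpha) ** n_tests.
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold that keeps the family-wise
    error rate at or below alpha (Bonferroni correction)."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        fwer = familywise_error_rate(k)
        per_test = bonferroni_threshold(k)
        print(f"{k:>3} tests: P(at least one false positive) = {fwer:.3f}, "
              f"Bonferroni per-test threshold = {per_test:.5f}")
```

Under these assumptions, twenty unplanned subgroup comparisons already carry a better-than-even chance of producing at least one nominally significant result by chance alone. Bonferroni is only one of several possible corrections and is conservative when tests are correlated.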


Acknowledgements

Priscilla Chase Duran is acknowledged for editing the manuscript.

Author information

Corresponding author

Correspondence to A. Carmona-Bayonas.

Ethics declarations

Conflict of interest

None to declare. This is an academic study. No financial support has been received from external sources.

Ethical statement

The study has been performed in accordance with the ethical standards of the Declaration of Helsinki and its later amendments.

Informed consent

Not required.

About this article

Cite this article

Carmona-Bayonas, A., Jimenez-Fonseca, P., Fernández-Somoano, A. et al. Top ten errors of statistical analysis in observational studies for cancer research. Clin Transl Oncol 20, 954–965 (2018). https://doi.org/10.1007/s12094-017-1817-9

  • DOI: https://doi.org/10.1007/s12094-017-1817-9
