On Fishing for Significance and Statistician’s Degree of Freedom in the Era of Big Molecular Data

Berechenbarkeit der Welt?

Abstract

There are usually many conceivable approaches to the statistical analysis of a data set that both make sense from a substantive point of view and are defensible from a theoretical perspective. The data analyst therefore has to make many choices, a problem sometimes referred to as “researcher’s degree of freedom”. This leaves much room for (conscious or subconscious) fishing for significance: the researcher (data analyst) sometimes applies several analysis approaches successively and reports only the results that seem in some sense more satisfactory, for example in terms of statistical significance. This may lead to apparently interesting but false research findings that fail to be validated in independent studies. In this essay we describe and illustrate these problems and discuss possible strategies to (partially) address them, such as validation, increased development of guidance documents, and publication of negative research findings, analysis plans, data, and code.
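The inflation of false positives caused by fishing for significance can be illustrated with a small simulation. The sketch below is not taken from the chapter; it is a minimal example under stated assumptions: two groups are drawn from the same normal distribution (so every "significant" result is a false positive), the standard deviation is assumed known so that a z-test is exact, and the hypothetical analyst tries three analysis variants (all observations, first half only, second half only) and reports the smallest p-value. The function names `z_test_p` and `simulate` and all parameter values are illustrative choices.

```python
import math
import random


def z_test_p(x, y, sd=1.0):
    """Two-sided p-value of a two-sample z-test with known standard deviation."""
    se = sd * math.sqrt(1 / len(x) + 1 / len(y))
    z = (sum(x) / len(x) - sum(y) / len(y)) / se
    # For a standard normal Z: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_sim=2000, n=40, alpha=0.05, seed=1):
    """Return (false positive rate of one pre-specified analysis,
    false positive rate when the best of three analyses is reported)."""
    rng = random.Random(seed)
    half = n // 2
    single = 0
    fished = 0
    for _ in range(n_sim):
        # Null hypothesis is true: both groups share the same distribution.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        p_values = [
            z_test_p(x, y),                # pre-specified analysis: all data
            z_test_p(x[:half], y[:half]),  # variant: first "subgroup" only
            z_test_p(x[half:], y[half:]),  # variant: second "subgroup" only
        ]
        single += p_values[0] < alpha
        fished += min(p_values) < alpha    # report only the "best" result
    return single / n_sim, fished / n_sim


single_rate, fished_rate = simulate()
print(f"pre-specified analysis: {single_rate:.3f}")
print(f"best of three analyses: {fished_rate:.3f}")
```

Each of the three tests individually keeps its nominal 5% level, yet selecting the smallest p-value pushes the overall false positive rate above that level. Real analysis variants (transformations, outlier rules, model choices) are typically more strongly correlated than these subgroup tests, so the size of the inflation in practice depends on the variants tried.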

Anne-Laure Boulesteix is associate professor of biostatistics with focus on computational molecular medicine at the University of Munich. She obtained her PhD in statistics in 2005 from the same university. Her research focuses on the statistical analysis of biomedical data with a special emphasis on prediction modeling and issues related to scientific practice from a statistical point of view.

Roman Hornung is a PhD student at the University of Munich, where he obtained his master’s degree in statistics in 2011. He is working on statistical methodology for the analysis of high-dimensional molecular data with focus on prediction modeling and is currently involved in leukemia research based on molecular data.

Willi Sauerbrei is professor of medical biometry at the Medical Center – University of Freiburg. His areas of expertise include multivariable regression modeling and issues related to good statistical practice such as reporting. He is the chair of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative which aims at providing guidance on the design and analysis of observational studies.

Corresponding author

Correspondence to Anne-Laure Boulesteix.

Copyright information

© 2017 Springer Fachmedien Wiesbaden GmbH

About this chapter

Cite this chapter

Boulesteix, AL., Hornung, R., Sauerbrei, W. (2017). On Fishing for Significance and Statistician’s Degree of Freedom in the Era of Big Molecular Data. In: Pietsch, W., Wernecke, J., Ott, M. (eds) Berechenbarkeit der Welt?. Springer VS, Wiesbaden. https://doi.org/10.1007/978-3-658-12153-2_7

  • DOI: https://doi.org/10.1007/978-3-658-12153-2_7

  • Publisher Name: Springer VS, Wiesbaden

  • Print ISBN: 978-3-658-12152-5

  • Online ISBN: 978-3-658-12153-2

  • eBook Packages: Social Science and Law (German Language)
