Abstract
There are usually many conceivable approaches to the statistical analysis of a data set that both make sense from a substantive point of view and are defensible from a theoretical perspective. The data analyst has to make many choices, a problem sometimes referred to as “researcher’s degree of freedom”. This leaves much room for (conscious or subconscious) fishing for significance: the researcher (data analyst) sometimes applies several analysis approaches successively and reports only the results that seem in some sense more satisfactory, for example in terms of statistical significance. This may lead to apparently interesting but false research findings that fail to be validated in independent studies. In this essay we describe and illustrate these problems and discuss possible strategies to (partially) address them, such as validation, the development of more guidance documents, and the publication of negative research findings, analysis plans, data and code.
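The mechanism described above can be made concrete with a small simulation. The Python sketch below generates data with no true group difference and compares two practices. The first runs one pre-specified test. The second tries several analysis variants and keeps the smallest p-value. It then re-runs each selected variant on an independent simulated data set, which mimics the validation step the abstract recommends. Several elements are illustrative assumptions, not the authors' method or the analyses examined in the chapter: the four variants (raw data, "outliers" beyond 2 SD removed, ranks, and a "first half" subgroup), the normal-approximation test, the sample sizes, the seed, and the helper names `two_sample_p`, `analyses` and `simulate`, which are hypothetical.

```python
import math
import random
import statistics


def two_sample_p(x, y):
    """Two-sided p-value of a Welch-type z-test (normal approximation)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    z = (statistics.fmean(x) - statistics.fmean(y)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def ranks(values):
    """Ranks 1..n of continuous (tie-free) values, in their original order."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result


def analyses(x, y):
    """Hypothetical analysis choices that could all be defended for the same data.

    Assumes len(x) == len(y). Returns one p-value per analysis variant.
    """
    n = len(x)
    pooled = x + y
    m, s = statistics.fmean(pooled), statistics.stdev(pooled)
    # Variant 2: drop "outliers" further than 2 SD from the pooled mean.
    tx = [v for v in x if abs(v - m) <= 2 * s]
    ty = [v for v in y if abs(v - m) <= 2 * s]
    # Variant 3: analyze pooled ranks instead of raw values.
    r = ranks(pooled)
    half = n // 2
    return {
        "raw": two_sample_p(x, y),
        "outliers_removed": two_sample_p(tx, ty),
        "ranks": two_sample_p(r[:n], r[n:]),
        # Variant 4: restrict to a "subgroup" (here simply the first half).
        "first_half": two_sample_p(x[:half], y[:half]),
    }


def simulate(n_sims=2000, n=30, alpha=0.05, seed=1):
    """Compare a pre-specified test with 'best of several' under the null."""
    rng = random.Random(seed)

    def draw():
        # Null scenario: both groups come from the same N(0, 1) distribution.
        return [rng.gauss(0, 1) for _ in range(n)], [rng.gauss(0, 1) for _ in range(n)]

    honest = fished = confirmed = 0
    for _ in range(n_sims):
        p = analyses(*draw())
        honest += p["raw"] < alpha  # report the pre-specified analysis only
        best = min(p, key=p.get)  # report whichever variant looks best
        if p[best] < alpha:
            fished += 1
            # Validation: rerun the selected variant on independent new data.
            confirmed += analyses(*draw())[best] < alpha
    return honest / n_sims, fished / n_sims, confirmed / max(fished, 1)


if __name__ == "__main__":
    honest, fished, confirmed = simulate()
    print(f"false-positive rate, pre-specified analysis: {honest:.3f}")
    print(f"false-positive rate, best of four analyses:  {fished:.3f}")
    print(f"share of 'findings' confirmed in new data:   {confirmed:.3f}")
```

Every simulated data set comes from the null, so every "significant" result is a false positive. One would therefore expect the pre-specified test to stay close to the nominal 5% level. The best-of-four rate should exceed it, because it can only add rejections. The share of such "findings" confirmed in fresh data should fall back to roughly 5%, which is the pattern of results that fail to be validated independently.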
Anne-Laure Boulesteix is associate professor of biostatistics with a focus on computational molecular medicine at the University of Munich. She obtained her PhD in statistics in 2005 from the same university. Her research focuses on the statistical analysis of biomedical data with a special emphasis on prediction modeling and issues related to scientific practice from a statistical point of view.
Roman Hornung is a PhD student at the University of Munich, where he obtained his master’s degree in statistics in 2011. He is working on statistical methodology for the analysis of high-dimensional molecular data with a focus on prediction modeling and is currently involved in leukemia research based on molecular data.
Willi Sauerbrei is professor of medical biometry at the Medical Center – University of Freiburg. His areas of expertise include multivariable regression modeling and issues related to good statistical practice such as reporting. He is the chair of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative, which aims to provide guidance on the design and analysis of observational studies.
Copyright information
© 2017 Springer Fachmedien Wiesbaden GmbH
About this chapter
Cite this chapter
Boulesteix, AL., Hornung, R., Sauerbrei, W. (2017). On Fishing for Significance and Statistician’s Degree of Freedom in the Era of Big Molecular Data. In: Pietsch, W., Wernecke, J., Ott, M. (eds) Berechenbarkeit der Welt?. Springer VS, Wiesbaden. https://doi.org/10.1007/978-3-658-12153-2_7
DOI: https://doi.org/10.1007/978-3-658-12153-2_7
Publisher Name: Springer VS, Wiesbaden
Print ISBN: 978-3-658-12152-5
Online ISBN: 978-3-658-12153-2
eBook Packages: Social Science and Law (German Language)