“Reproducible” Research in Mathematical Sciences Requires Changes in our Peer Review Culture and Modernization of our Current Publication Approach
The nature of scientific research in mathematical and computational biology allows editors and reviewers to evaluate the findings of a scientific paper. Replication of a research study should be the minimum standard for judging its scientific claims and considering it for publication. This requires changes in current peer review practice and the strict adoption of replication policies similar to those adopted in experimental fields such as organic synthesis. In the future, the culture of replication can be easily adopted by publishing papers through dynamic computational notebooks that combine formatted text, equations, computer algebra and computer code.
Keywords: Reproducibility, Repeatability, Replicability, Editorial policies, Academic publishing
1 Scientists are Awakening to the “Reproducibility” Crisis
Repeatability, replicability and reproducibility are held as the gold standards for scientific research. As a consequence, ever since the appearance of the first scientific journals—Journal des sçavans and Philosophical Transactions of the Royal Society—in 1665, the legitimacy of any research paper has depended on the ability of other scientists to repeat, replicate or reproduce its published results (Kronick 1976; Shapin and Schaffer 1985).
Over the last 15 years, there have been increasing concerns that many research findings published in influential peer-reviewed scientific journals cannot be replicated in the biomedical sciences (Hirschhorn et al. 2002; Ioannidis 2005; Prinz et al. 2011; Begley and Ellis 2012), psychology (Bakker and Wicherts 2011; John et al. 2012; Pashler and Wagenmakers 2012; Van Bavel et al. 2016), sociology and economics (Bollen et al. 2015; Ioannidis et al. 2017) and computer science (Donoho et al. 2009). There are even expository articles arguing that the failure to “reproduce” also occurs in the mathematical sciences (Redish et al. 2018). In a recent survey by the journal Nature (Baker 2016), a majority of the responding scientists agreed that there is a reproducibility crisis. Although the survey results may not be representative of the scientific community, individual scholars have confirmed their skepticism by trying—and often failing—to repeat, replicate or reproduce results.
The “reproducibility” crisis has caused several academics, politicians and members of the public to call into question the reliability, value and trustworthiness of the scientific enterprise (Alberts et al. 2013; Macleod 2014; Van Bavel 2016; Wood and Randall 2018). The good news is that multiple studies show no evidence that the failure to “reproduce” is the result of scientific misconduct (see Fanelli 2018 and references therein). Less than 2% of scientists fabricate or falsify data (Shapiro and Charrow 1989; Steneck 2006; Fanelli 2009). Meta-research studies suggest that the “reproducibility” crisis results from heterogeneous factors distributed across different areas of science, including bias in hypothesis testing (Fanelli 2010; Fanelli et al. 2017), poor experimental design, statistical problems (John et al. 2012; Button et al. 2013) and a lack of standards for reporting and sharing experimental protocols, results and data (Wicherts et al. 2006; Nature Editorial 2014; Stark 2018).
2 Recommendations and Policies Adopted in Response to the “Reproducibility” Crisis
As a response to the “reproducibility” crisis, the scientific community is encouraging journals to maintain high standards to increase the likelihood of reproducibility, and journals are responding accordingly (McNutt 2014). In the biomedical sciences, new guidelines have been adopted requiring detailed experimental protocols and establishing best practices for image-based data and statistical analysis (National Institutes of Health 2017). It has also been recommended to modify the reward system in science by rewarding scientists who align with best practices for reproducible research (Ioannidis 2014). In the USA, federal funding agencies now require training in rigor and reproducibility for scientists holding research grants. The National Institutes of Health Office of Extramural Research revised grant application instructions and review criteria to enhance the reproducibility of research findings by adopting rigor and reproducibility guidance (Collins and Tabak 2014; National Institutes of Health 2018). Similar approaches are being adopted in the psychological, social and economic sciences (Munafò 2017).
Most of the above reproducibility policies are geared toward the experimental biomedical sciences. However, mathematical and computational biology has not been ignored by the funding agencies and the scientific community. In 2003, the US National Institutes of Health and National Science Foundation formed the Interagency Modeling and Analysis Group. This group provides an open forum of communication between funding agencies and scientists to discuss the development of new modeling and analysis methods in the biomedical sciences.1 One of its working groups, the Model and Data Sharing Working Group, has been developing modeling standards, software and repositories of models and experimental data to facilitate model reproducibility and model sharing.2 Members of the Model and Data Sharing Working Group started to work on machine-readable description languages, such as the Systems Biology Markup Language (SBML) (Hucka et al. 2003) and CellML (Lloyd et al. 2004), to encode mathematical models and facilitate their sharing and exchange. Simultaneously, model repositories, such as the CellML model repository (Lloyd et al. 2008), BioModels Database (Li et al. 2004; Chelliah et al. 2015) and JWS Online (Olivier et al. 2004), were established to store and distribute models encoded in SBML and CellML. More recently, funding agencies have been supporting the development of software tools for building reproducible and reusable dynamic models, such as Virtual Cell (Loew and Schaff 2001), CompuCell3D (Swat et al. 2012) and Tellurium (Medley et al. 2018).
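To give a concrete flavor of what these machine-readable formats look like, below is a minimal sketch of an SBML (Level 3 Version 1) encoding of a hypothetical first-order conversion S → P with mass-action rate k1·S; the model, species and parameter identifiers and values are illustrative, not drawn from any repository.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy_decay_model">
    <listOfCompartments>
      <compartment id="cell" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="S" compartment="cell" initialConcentration="10"
               hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
      <species id="P" compartment="cell" initialConcentration="0"
               hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
    </listOfSpecies>
    <listOfParameters>
      <parameter id="k1" value="0.1" constant="true"/>
    </listOfParameters>
    <listOfReactions>
      <reaction id="J1" reversible="false" fast="false">
        <listOfReactants>
          <speciesReference species="S" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="P" stoichiometry="1" constant="true"/>
        </listOfProducts>
        <kineticLaw>
          <math xmlns="http://www.w3.org/1998/Math/MathML">
            <apply><times/><ci>k1</ci><ci>S</ci></apply>
          </math>
        </kineticLaw>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
```

Because every species, parameter and rate law is explicit in the file, a simulator that reads SBML can re-run the model without any retyping of equations or parameter values, which is precisely where manual transcription introduces errors.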
In addition, scientific policy forums are recommending good practices for publishing scientific computing research (Stodden et al. 2016). These include making the raw data analyzed computationally available in the paper (Peng 2011; Stodden et al. 2013), adopting good laboratory notebook practices (Schnell 2015; Wilson et al. 2017) and making computer code and scripts openly available (Yale RoundTable Participants 2010; Stodden et al. 2012).
3 Have the “Reproducibility” Policies been Effective in Mathematical and Computational Biology?
The recommendations and policies available in the literature have been only partially effective. Standardizing mathematical models and computer code, and sharing code, have not been straightforward because practices vary widely across disciplines and subdisciplines (Renear et al. 2010). Despite this, scientific communication has improved in the mathematical and computational sciences, because it is becoming more common to share models and computer code with other researchers to make findings “reproducible” after publication (Stodden et al. 2013). Unfortunately, model and code sharing has had limited success in making scientific findings reproducible. An empirical study of articles published in Science shows that only 44% of the authors responded to requests to share data and computer code after publication. Of those, it was possible to “reproduce” the computational results of only 26% of the papers (Stodden et al. 2018).
Simultaneously, there are also challenges with the curation of mathematical and computational models in databases. Model repositories like BioModels have nearly twice as many non-curated models as curated models (fully “reproducible” models).3 Although the non-curated models could be “reproducible,” the open science movement and the rapid growth of the model repositories are exceeding the capacity of database curators to verify the reproducibility of models. Numerous models in databases are still not reproducible because they are not translatable into the SBML or CellML formats. Some models are not “reproducible” because errors were introduced during the publication process, such as authors mistyping parameter values or equations in the publication, or publishers introducing typesetting errors during article production (Lloyd et al. 2008). In other instances, models are not “reproducible” because they were not properly annotated following protocols such as the minimum information requested in the annotation of biochemical models (Le Novère et al. 2005).
Given the limited success of the policies recommended by funding agencies, scientific groups and some publishers, is there a best-practice solution to the “reproducibility” crisis in the mathematical and computational sciences? Here I argue that most current discussions of the “reproducibility” crisis overlook the essential role that the editorial and peer review process has played in confirming the “reproducibility” of experimental findings in scientific fields like organic synthesis for nearly a century. They also overlook that the current form of the scientific paper, dating from the 1600s, is outdated (Sommers 2018) and needs to change to keep up with the technologies now available in the mathematical and computational sciences. However, before I proceed, it is important to settle what is meant by the conceptual framework of “reproducibility” in the research enterprise.
4 How can we Define the R-Words in Mathematical and Computational Biology?
“Reproducibility” remains an unsettled concept in most of the papers discussing it. The terms repeatability, replicability and reproducibility are used interchangeably in everyday language (Goodman et al. 2016). As a result, these R-terms are unfortunately used in nonstandard conceptual frameworks across the sciences. The definitions of the R-words—repeatability, replicability and reproducibility—vary among computer science subgroups (Association for Computing Machinery 2016; Rougier et al. 2017). They also differ among the social, behavioral and economic sciences (Bollen et al. 2015), computational neuroscience (Crook 2013; Plesser 2018) and the biomedical sciences (Goodman et al. 2016).
Repeatability (Same team, same methodological setup):
For experimental laboratory research, repeatability occurs when the measurement can be obtained with stated precision by the same team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same location on multiple trials. For mathematical and computational biology research, this means that a researcher can reliably repeat their own mathematical analysis and numerical computation.
Replicability (Different team, same methodological setup):
For experimental laboratory research, replicability occurs when the measurement can be obtained with stated precision by a different team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same or a different location on multiple trials. For mathematical and computational biology research, this means that an independent group can obtain the same mathematical and computational results if they follow the methodology described in a published research paper, or reuse the same scripts and computer code.
Reproducibility (Different team, different methodological setup):
For experimental laboratory research, reproducibility occurs when the measurement can be obtained with stated precision by a different team, a different measuring system, in a different location on multiple trials. For mathematical and computational biology research, this means that an independent group can obtain the same mathematical and computational results using a methodology which they develop independently.
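In computational work, the first two R-words have a very concrete reading: a stochastic simulation is repeatable only if the random number generator's seed is recorded alongside the code, and replicable only if an independent group can re-run it from the reported description. The sketch below illustrates the repeatability half with a toy stochastic decay model (pure Python; the model, parameter values and seed are hypothetical, chosen only for illustration).

```python
import random

def stochastic_decay(n0, k, t_end, seed):
    """Toy stochastic decay S -> 0: draw exponential waiting times
    with rate k * n until time t_end, counting decay events."""
    rng = random.Random(seed)  # the recorded seed makes the run repeatable
    t, n = 0.0, n0
    while n > 0:
        t += rng.expovariate(k * n)  # waiting time to the next decay
        if t >= t_end:
            break
        n -= 1
    return n0 - n  # number of decay events observed

# Repeatability: the same team, same code and same recorded seed
# recover exactly the same result on every run.
run1 = stochastic_decay(100, 0.1, 5.0, seed=2018)
run2 = stochastic_decay(100, 0.1, 5.0, seed=2018)
assert run1 == run2
```

Replicability then asks for more: whether an independent group, given only the published description (initial count, rate constant, stopping time and seed), can rewrite this function from scratch and obtain the same counts.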
Based on these definitions, failure to reproduce is a broader concept, which requires integrating potentially conflicting observations, results and ideas into a coherent theory (Redish et al. 2018). However, repeatability and replicability are clearly the two minimum gold standards of science. Repeatability is a necessary, but not sufficient, condition for determining whether a novel research finding is ready to be submitted for publication, while replicability is the minimum standard by which any published scientific paper should be judged. If we establish a culture of replication, the scientific community will clear out erroneous results and develop a disciplined approach toward scientific discovery (Peng 2011).
5 How to Create a Culture of Replication in the Mathematical and Computational Biology Community?
The culture of replication comes from multiple directions, but it must start in the journals through peer review practice. Journals are starting to implement replication policies, though their number remains small. An empirical analysis of journals publishing mathematical and computational models shows that, as of 2012, 22% of the journals had a computer code sharing policy and 38% had a data sharing policy (Stodden et al. 2013).
The challenge is that these data and code sharing policies have had limited impact in creating a replicability culture. For example, Biostatistics implemented a “reproducible” research policy (Peng 2009), under which papers in the journal are kite-marked “R” if both data and code are made available in the Supplementary Materials and the Biostatistics Associate Editor for Reproducibility can use them to replicate the results in the paper. Unfortunately, the “reproducibility” check is voluntary, and only 4% of the papers received an “R” within the first 2 years of the policy’s implementation (Peng 2011). Since 2014, Nature journals have required authors to provide a “Code Availability” statement indicating how code or algorithms can be accessed. However, peer reviewing code can be cumbersome. Nature Methods, Nature Biotechnology and Nature Machine Intelligence are now running a trial in partnership with Code Ocean to enable authors to share computer code and facilitate its peer review while manuscripts are under consideration for publication. Unfortunately, the trial is optional (Pastrana and Swaminathan 2018). A similar voluntary replication review process is available at ACM Transactions on Mathematical Software (Heroux 2015). The FEBS Journal, IET Systems Biology, Microbiology and Metabolomics are collaborating with JWS Online to replicate models to be published in the journals, but the model replication is limited to ordinary differential equation kinetic models of biochemical pathways (Snoep 2005).
An empirical survey of 1576 scientists suggests that 70% of scientists have tried and failed to replicate another scientist’s findings (Baker 2016). I believe that most replication problems involve bona fide results: scientific findings genuinely obtained by the original authors of a paper. However, replication fails for primarily two reasons. First, mathematical and computational biologists are not keeping a proper record of how they obtained their results. Second, they are failing to report the methodological information necessary for other researchers to replicate the results. The first problem can be addressed with a laboratory notebook policy in mathematical and computational biology research groups (Schnell 2015). The second requires mandatory replicability analysis of papers submitted for publication in mathematical and computational biology journals.
The latter suggestion seems radical, but it is not unprecedented in science. Since its inception in 1921, Organic Syntheses4 has published detailed, practical procedures for the synthesis of organic compounds. A unique feature of this journal is that every reaction and experimental procedure published is replicated in the laboratory of a member of the Board of Editors of Organic Syntheses. The Board is composed of the most distinguished members of the synthetic organic chemistry community, which ensures that the replication experiments are supervised by leading figures in the field. Submissions to Organic Syntheses need to provide a comprehensive description of the experimental protocols to allow replication. If any aspect of the procedures is unclear or ambiguous, authors are asked to provide additional details before the replication begins. The authors are also consulted for advice and assistance if problems are encountered during the replication experiments. Any article accepted for publication has been replicated at least twice in the checking editor’s research group. When the replication results differ slightly from the submitted ones, the checking editor’s results are reported in the final article (Danheiser 2011). Despite all these replicability measures, 7.5% of the articles submitted to Organic Syntheses between 2010 and 2016 were rejected due to the inability of the editors’ laboratories to replicate the results shown in the papers (Bergman and Danheiser 2016).
Mathematical and computational biology papers can be the subject of identical replicability experiments since data, computer algebra calculations, computer code and software can be made available to the editors upon submission. Distinguished members of the editorial board should be responsible for maintaining high standards by leading replication efforts, while evaluating the adequacy of mathematical and computational methods to support the conclusions of papers. Simultaneously, expert reviewers should evaluate the significance, quality, relevance and clarity of the approach and findings of manuscripts under consideration.
The culture of replicability requires members of the editorial board to become responsible for the gold standard of science. The ultimate responsibility of editors and reviewers is not serving as gatekeepers of novelty and significance, but carefully examining and evaluating the validity of research findings and whether those findings support the conclusions presented in original research papers.
6 The Modernization of the Research Paper can Create a Culture of Replicability
Academic publishers also need to modernize the research paper itself to catalyze the replicability culture. Technological advances are making science more sophisticated and complex to communicate. Nowadays research papers are longer, contain more data and relegate critical methods and research findings to supplementary materials, repositories and databases. Consequently, research papers are more difficult to review for publication. The publication process is partly to blame, as the method for communicating scientific findings has not changed since the 1600s (Sommers 2018).
In the experimental methodological sciences, the publication model is evolving through peer-reviewed scientific video journals.5 Chemists are also starting to routinely record experiments with video cameras to make research more accessible and “reproducible” (Björnmalm et al. 2016).
Can we redesign the journal article in the mathematical sciences? Yes, we can. We already have the right tools on our computers, but they are currently used as pedagogical tools to teach mathematical and computational modeling in the classroom rather than to describe procedures and algorithms and share results in research papers. Maplesoft™ Maple worksheets, Wolfram Mathematica notebooks and MathWorks MATLAB® live scripts are interactive documents that combine scripts from those programs with formatted text, equations and images to visually explore and analyze problems. These tools can be used to write, execute and test code in an interactive environment that describes and explains scientific findings dynamically. A scientific paper submitted in the form of a “computational notebook” would be easy to replicate during the peer review process, because a referee can execute the model while reading the paper. This is something we cannot achieve with a PDF file, which provides only a static picture of research findings by “modeling a piece of paper.” Unfortunately, the proprietary nature of Maple, Mathematica and MATLAB will make them hard to adopt broadly as publication tools.
It is time to adopt open-source web-based applications that allow us to create and share our research with live computer code, equations, text and visualizations. In the same way that TeX and LaTeX revolutionized typesetting and publishing for mathematical and computational scientists, the Jupyter Project,6 through its Jupyter Notebook and JupyterLab, can be the modern solution for typesetting and publishing scientific papers in the mathematical and computational sciences. The Jupyter Project supports the live execution of over 100 programming languages, including the computer algebra systems and software regularly used in mathematical and computational biology. Currently, Jupyter Notebooks are used as electronic notebooks, educational modeling laboratories and supplemental material to scientific articles. It is time to extend these uses to academic publishing. Jupyter Notebooks allow us to typeset our research papers, openly share our computer code and present our research dynamically (Sommers 2018).
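Part of what makes this practical is that a Jupyter Notebook is not a proprietary binary but a plain JSON document. The sketch below (standard library only; the file name, model and parameter values are hypothetical) builds a minimal two-cell “paper” in the nbformat 4 layout: a markdown cell for the narrative and equations, and a code cell holding the live model.

```python
import json

# Minimal nbformat-4 notebook: narrative and live code in one JSON file.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        {   # the "paper" text, with LaTeX math rendered in place
            "cell_type": "markdown",
            "metadata": {},
            "source": ["## Model\n", "We simulate first-order decay $dS/dt = -k_1 S$."],
        },
        {   # the live model a referee can re-execute while reading
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": [
                "k1, S, dt = 0.1, 10.0, 0.1\n",
                "for _ in range(50):\n",
                "    S -= k1 * S * dt  # explicit Euler step\n",
                "S",
            ],
        },
    ],
}

with open("paper.ipynb", "w") as f:
    json.dump(notebook, f, indent=1)
```

Because the format is open JSON, the same file can be re-executed end to end with standard tooling (for example, `jupyter nbconvert --execute`), placed under version control, and diffed during peer review.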
This proposal is highly ambitious and may seem utopian at this point. It requires changing the nature of science communication, which has relied on publishing research on paper (or in an electronic paper-like form) for over 400 years. Mathematical and computational scientists and academic publishers will need to be retrained to communicate research findings using web-based applications. The investment in technology should be minimal, as it is already available and accessible to many. Modernizing the publication technology will make mathematical and computational research more open, transparent and replicable by bringing the scientific paper to life through “computational notebooks.”
I am very grateful for the helpful insights provided by Rick Danheiser (Editor-in-Chief of Organic Syntheses and Massachusetts Institute of Technology, USA), Edmund Crampin (University of Melbourne, Australia) and Wylie Stroberg (University of Michigan). This work was partially supported through the educational programs funded by NIGMS (T32 GM008322) and NIDDK (R25 DK088752).
- Alberts B, Stodden V, Young S, Choudhury S (2013) Testimony on scientific integrity and transparency. Available online at https://science.house.gov/legislation/hearings/subcommittee-research-scientific-integrity-transparency. Accessed 10 Aug 2018
- Association for Computing Machinery (2016) Artifact review and badging. Available online at: https://www.acm.org/publications/policies/artifact-review-badging. Accessed 10 Aug 2018
- Bollen K, Cacioppo JT, Kaplan R, Krosnick J, Olds JL (2015) Social, behavioral, and economic sciences perspectives on robust and reliable science. National Science Foundation, Arlington, Virginia. Available online at: https://www.nsf.gov/sbe/SBE_Spring_2015_AC_Meeting_Presentations/Bollen_Report_on_Replicability_SubcommitteeMay_2015.pdf. Accessed 10 Aug 2018
- Chelliah V, Juty N, Ajmera I, Ali R, Dumousseau M, Glont M, Hucka M, Jalowicki G, Keating S, Knight-Schrijver V, Lloret-Villas A, Natarajan KN, Pettit JB, Rodriguez N, Schubert M, Wimalaratne SM, Zhao Y, Hermjakob H, Le Novère N, Laibe C (2015) BioModels: 10-year anniversary. Nucleic Acids Res 43:D542–D548
- Nature Editorial (2014) Journals unite for reproducibility. Nature 515:7
- Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V, Goryanin II, Hedley WJ, Hodgman TC, Hofmeyr JH, Hunter PJ, Juty NS, Kasberger JL, Kremling A, Kummer U, Le Novère N, Loew LM, Lucio D, Mendes P, Minch E, Mjolsness ED, Nakayama Y, Nelson MR, Nielsen PF, Sakurada T, Schaff JC, Shapiro BE, Shimizu TS, Spence HD, Stelling J, Takahashi K, Tomita M, Wagner J, Wang J (2003) The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19:524–531
- International Organization for Standardization (1994) Applications of statistical methods. Technical Committee ISO/TC 69, Subcommittee SC 6, Measurement methods and results. Available online at: https://www.iso.org/obp/ui/#iso:std:iso:5725:-1:ed-1:v1:en. Accessed 10 Aug 2018
- International Union of Pure and Applied Chemistry (1997) IUPAC compendium of chemical terminology, 2nd edn. McNaught AD, Wilkinson A (eds). Blackwell Scientific Publications, Oxford. Available online at: https://goldbook.iupac.org/. Accessed 10 Aug 2018
- Joint Committee for Guides in Metrology (2006) International Vocabulary of Metrology: Basic and General Concepts and Associated Terms, 3rd ed. Joint Committee for Guides in Metrology/Working Group 2. Available online at: https://www.nist.gov/sites/default/files/documents/pml/div688/grp40/International-Vocabulary-of-Metrology.pdf. Accessed 10 Aug 2018
- Kronick DA (1976) History of scientific and technical periodicals, 2nd edn. Scarecrow Press, Metuchen
- Le Novère N, Finney A, Hucka M, Bhalla US, Campagne F, Collado-Vides J, Crampin EJ, Halstead M, Klipp E, Mendes P, Nielsen P, Sauro H, Shapiro B, Snoep JL, Spence HD, Wanner BL (2005) Minimum information requested in the annotation of biochemical models (MIRIAM). Nat Biotechnol 23:1509–1515
- McNutt M (2014) Journals unite for reproducibility. Science 346:679
- National Institutes of Health (2017) Principles and guidelines for reporting preclinical research. Available online at: https://www.nih.gov/research-training/rigor-reproducibility/principles-guidelines-reporting-preclinical-research. Accessed 10 Aug 2018
- National Institutes of Health (2018) Rigor and reproducibility. Available online at: https://grants.nih.gov/reproducibility/index.htm. Accessed 10 Aug 2018
- Pastrana E, Swaminathan S (2018) Nature research journals trial new tools to enhance code peer review and publication. Of Schemes and Memes: a community blog from nature.com, 01 Aug 2018. Available online at: http://blogs.nature.com/ofschemesandmemes/2018/08/01/nature-research-journals-trial-new-tools-to-enhance-code-peer-review-and-publication. Accessed 10 Aug 2018
- Rougier NP, Hinsen K, Alexandre F, Arildsen T, Barba LA, Benureau FCY et al (2017) Sustainable computational science: the ReScience initiative. Available online at: https://arxiv.org/abs/1707.04393. Accessed 10 Aug 2018
- Shapin S, Schaffer S (1985) Leviathan and the air-pump: Hobbes, Boyle, and the experimental life. Princeton University Press, Princeton
- Sommers J (2018) The scientific paper is obsolete. The Atlantic, April 5, 2018. Available online at: https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/. Accessed 10 Aug 2018
- Van Bavel (2016) Why do so many studies fail to replicate? New York Times, May 27, 2016, page SR10. Available online at: https://www.nytimes.com/2016/05/29/opinion/sunday/why-do-so-many-studies-fail-to-replicate.html. Accessed 10 Aug 2018
- Wood P, Randall D (2018) How bad is the government’s science? Wall Street Journal, April 16, 2018. Available online at https://www.wsj.com/articles/how-bad-is-the-governments-science-1523915765. Accessed 10 Aug 2018
- Yale RoundTable Participants (2010) Reproducible research: addressing the need for data and code sharing in computational science. Comput Sci Eng 12:5