
1 Introduction

Scholarly publishers have a duty to maintain the integrity of the published scholarly record. Science is often described as self-correcting, and when errors are identified in the published record, it is the responsibility of publishers to correct them. This is carried out by publishing corrections or expressions of concern or, sometimes, by retracting published articles. Errors in published research can be honest, such as typographical errors in data tables or broken links to source material, but errors also result from research misconduct, including fraudulent or unethical research, and plagiarism. Only a small fraction – less than 0.1% (Grieneisen and Zhang 2012) – of published research is retracted, and papers are more likely to be retracted due to misconduct than honest error (Fang et al. 2012).

However, the numbers of reported corrections and retractions do not capture a more pressing issue: a large proportion of published – and assumed accurate – research results are not reproducible, even though reproducibility and replicability are tenets of science. Pharmaceutical companies have reported that fewer than 25% of the results reported in peer-reviewed publications could be reproduced in their labs (Prinz et al. 2011). A survey of 1,500 researchers found that more than half of respondents could not reproduce their own results and more than 70% could not reproduce the results of others (Baker 2016). An economic analysis in 2015 estimated that irreproducible preclinical research costs US $28 billion per year (Freedman et al. 2015).

There are numerous causes of irreproducibility and suboptimal data quality (Table 1). Some of these causes relate to how research is conducted and supervised, and others relate to how well or completely research is reported. Data quality and reproducibility cannot be assessed without complete, transparent reporting of research and the availability of research outputs which can be reused. Scholarly publishers have a responsibility to promote reproducible research (Hrynaszkiewicz et al. 2014) but are more able to influence the reporting of research than the conduct of research. Transparency is a precursor to reproducibility and can be supported by journals and publishers (Nature 2018).

Table 1 Causes of poor reproducibility and poor data quality in preclinical research

Publishers can implement greater transparency in the reporting, and reuse potential, of research in several ways:

  1. Understanding researchers’ problems and motivations

  2. Raising awareness and changing behaviours

  3. Improving the quality, transparency and objectivity of the peer-review process

  4. Better scholarly communication infrastructure and innovation

  5. Enhancing incentives

  6. Making research publishing more open and accessible

This chapter describes practical approaches being taken by publishers in these six areas to achieve greater transparency and discusses their progress, effectiveness and implications for researchers.

2 Understanding Researchers’ Problems and Motivations

Publishers wishing to increase transparency and reproducibility need to understand the problems (or “challenges”) researchers have in practising reproducible and transparent research. Sharing of research data is essential for reproducible research, and between 2016 and 2018, several large surveys of researchers were conducted by publishing and publishing technology companies, providing insights into researchers’ reported data sharing practices and behaviours, as well as insight into what motivates researchers to share, or not share, research data.

A survey by the publisher Springer Nature, conducted in 2017 and published in 2018, explored the “practical challenges” researchers face in data sharing; with 7,719 responses, it was one of the largest of its kind. Seventy-six percent of respondents reported that the discoverability of their research data is important to them, and 63% had previously shared data associated with a peer-reviewed article. However, researchers also reported common problems in sharing their data, including difficulties in “organising data in a presentable and useful way” (46% of respondents), being unsure about licencing and copyright of data (37%) and not knowing which data repository to use (33%). A lack of time (26%) and being unable to cover the costs of data sharing (19%) were also commonly cited (Stuart et al. 2018) (Fig. 1).

Fig. 1 Organising data in a presentable and useful way was the most common problem for researchers in data sharing, in a large survey (n = 7,719). Figure adapted from Stuart et al. (2018)

Disciplinary differences were also identified in the survey. Biological science researchers reported the highest levels of data sharing (75%), and medical science researchers reported that copyright and licencing (data ownership) issues were their biggest challenge. Medical science researchers were also most likely to report concerns about data sensitivity and misuse, and concerns about protecting research participants, consistent with other surveys (Rathi et al. 2012) of clinical researchers.

Surveys from the publishers Wiley (Wiley Open Science Researcher Survey 2016) and Elsevier (Berghmans et al. 2017) and the publishing technology company Digital Science (Science et al. 2017) have found similar results regarding the proportion of researchers who report they share data and the ways in which researchers share data (Table 2). The most commonly reported ways of sharing data tend to be suboptimal: email is the most common method for private data sharing (Allin 2018), although it is not secure enough for this purpose, and journal supplementary materials are the most common method for public data sharing, although data repositories are preferred (Michener 2015).

Table 2 Seventy percent of researchers report that they share data but only 26% use data repositories when the results of five large surveys are combined

2.1 Understanding Motivations to Share Data

Sharing research data has been associated with an increase in the number of citations that researchers’ papers receive (Piwowar et al. 2007; Piwowar and Vision 2013; Colavizza et al. 2019) and an increase in the number of papers that research projects produce (Pienta and Alter 2010). Some researchers report that increased academic credit (Science et al. 2017) and increased visibility of their research (Wiley Open Science Researcher Survey 2016; Schmidt et al. 2016) motivate them to share research data. Publishers and other service providers to researchers can help to both solve problems and increase motivations, in particular those relating to academic credit, impact and visibility (see Sect. 6).

3 Raising Awareness and Changing Behaviours

Scholarly publishers and journals can help to raise awareness of issues through their wide or community-focused readership – with editorials, opinion pieces and conference and news coverage. Behavioural change can be created by changing journal and publisher policies, as researchers are motivated to comply with them when submitting papers (Schmidt et al. 2016).

3.1 Journal Policies

Journal policies and guides to authors include large amounts of information covering topics ranging from manuscript formatting to research ethics and conflicts of interest. Many journals and publishers have, since 2015, endorsed – and are beginning to implement – the Transparency and Openness Promotion (TOP) guidelines. The TOP guidelines are a comprehensive but aspirational set of journal policies comprising eight modular standards – covering, among other things, transparency in data, code and protocols – each with three levels of increasing stringency (Nosek et al. 2014). A summary table of the requirements is available in the public domain from the Center for Open Science (Table 3). Full compliance with the TOP guidelines is typically a long-term goal for journals and publishers, and implementation of the requirements tends to happen in progressive steps, with most progress being made initially in policies for sharing of research data.

Table 3 TOP guidelines summary table

3.1.1 Standardising and Harmonising Journal Research Data Policies

While availability of research data alone does not enable reproducible research, unavailability of data (Ioannidis et al. 2009) and suboptimal data curation (Hardwicke et al. 2018) have been shown to lead to failures to reproduce results. Historically, relatively few journals have had research data policies, and, where policies have existed, they have lacked standards and consistency, which can be confusing for researchers (authors) and research support staff (Naughton and Kernohan 2016; Barbui 2016). In 2016 Springer Nature, which publishes more than 2,500 journals, began introducing standard, harmonised research data policies to its journals (Hrynaszkiewicz et al. 2017a). Similar initiatives were introduced in 2017 by other large journal publishers – Elsevier, Wiley, and Taylor and Francis – greatly increasing the prevalence of journal data sharing policies. These large publishers have offered journals a controlled number (usually four or five) of data policy types, including a basic policy with fewer requirements compared to the more stringent policies (Table 4).

Table 4 Summary of Springer Nature journal data policy types and examples of journals with those policy types

Providing several options for journal data policy is necessary because, across multiple research disciplines, some research communities and their journals are more able to introduce strong data sharing requirements than others. In parallel with these individual publishers’ data policy initiatives, a global collaboration of publishers and other stakeholders in research has created a master research data policy framework that supports all journal and publisher requirements (Hrynaszkiewicz et al. 2017b, 2019).

There have also been research data policy initiatives from communities of journals and journal editors. In 2010 journals in ecology and evolutionary biology joined in supporting a Joint Data Archiving Policy (JDAP) (Whitlock et al. 2010), the Public Library of Science (PLOS) introduced a strong data sharing policy across all its journals in 2014, and in 2017 the International Committee of Medical Journal Editors (ICMJE) introduced a standardised data sharing policy (Taichman et al. 2017) for its member journals, which include BMJ, Lancet, JAMA and the New England Journal of Medicine. The ICMJE policy does not mandate data sharing; its main requirement is that reports of clinical trials include a data sharing statement.

Data sharing statements (also known as data availability statements) are a common feature of journal and publisher data policies. They state where the data supporting the results reported in a published article can be found – including, where applicable, hyperlinks to publicly archived datasets analysed or generated during the study. A typical statement reads, for example: “The datasets generated and/or analysed during the current study are available in the [NAME] repository, [PERSISTENT LINK TO DATASETS]”. Many journals and publishers provide guidance on preparing data availability statements (e.g. https://www.springernature.com/gp/authors/research-data-policy/data-availability-statements/12330880). All Public Library of Science (PLOS), Nature and BMC journals require data availability statements (Colavizza et al. 2019). Some research funding agencies – including the seven UK research councils (UK Research and Innovation 2011) – also require the provision of data availability statements in published articles.

Experimental pharmacology researchers publishing their work in 2019 and beyond, regardless of their target journal(s), should be prepared at minimum to provide a statement regarding the availability and accessibility of the research data that support the results of their papers.

3.1.1.1 Code and Materials Sharing Policies

To assess data quality and enable reproducibility, transparency and sharing of computer code and software (and supporting documentation) are also important – as is, where applicable, the sharing of research materials. Materials include samples, cell lines and antibodies. Journal and publisher policies on sharing code, software and materials are becoming more common but are generally less well evolved and less widely established compared to research data policies.

In 2015 the Nature journals introduced a policy across all their research titles that encourages all authors to share their code and provide a “code availability” statement in their papers (Nature 2015). Nature Neuroscience has taken this policy further, by piloting peer review of code associated with research articles in the journal (Nature 2017). Software-focused journals such as the Journal of Open Research Software and Source Code for Biology and Medicine tend to have the most stringent requirements for availability and usability of code.

3.2 Effectiveness of Journal Research Data Policies

Journal submission guidelines can increase transparent research practices by authors (Giofrè et al. 2017; Nuijten et al. 2017). Higher journal impact factors have been associated with stronger data sharing policies (Vasilevsky et al. 2017). Stronger data policies that mandate and verify data sharing by authors, and require data availability statements, are more effective at ensuring data are available long term (Vasilevsky et al. 2017) compared to policies that passively encourage data sharing (Vines et al. 2013). Many journal policies ask authors to make supporting data available “on reasonable request”, as a minimum requirement. This approach to data sharing may be a necessity in medical research, to protect participant privacy, but contacting authors of papers to obtain copies of datasets is an unreliable method of sharing data (Vanpaemel et al. 2015; Wicherts et al. 2006; Savage and Vickers 2009; Rowhani-Farid and Barnett 2016). Using more formal, data sharing (data use) agreements can improve authors’ willingness to share data on request (Polanin and Terzian 2018), and guidelines on depositing clinical data in controlled-access repositories have been defined by editors and publishers, as a practical alternative to public data sharing (Hrynaszkiewicz et al. 2016). Publishers are also supporting editors to improve policy effectiveness and consistency of implementation (Graf 2018).

4 Improving the Quality, Transparency and Objectivity of the Peer-Review Process

The reporting of research methods, interventions, statistics and data on harms of drugs in healthcare research, and the presentation of results in journal articles, have repeatedly been found to be inadequate (Simera et al. 2010). Increasing the consistency and detail of reporting key information in research papers, with reporting guidelines and checklists, supports more objective assessment of papers in the peer-review process.

4.1 Implementation of Reporting Guidelines

The prevalence and endorsement of reporting guidelines, catalogued by the EQUATOR Network (http://www.equator-network.org), in journals has increased substantially in the last decade. Reporting guidelines usually comprise a checklist of key information that should be included in manuscripts, to enable the research to be understood and the quality of the research to be assessed. Reporting guidelines are available for a wide array of study designs, such as randomised trials (the CONSORT guideline), systematic reviews (the PRISMA guidelines) and animal preclinical studies (the ARRIVE guidelines; discussed in detail in another chapter in this volume).

The positive impact of endorsement of reporting guidelines by journals has, however, been limited (Percie du Sert et al. 2018), in part because reporting guidelines are often implemented by passive endorsement on journal websites (including them in information for authors). Some journals, such as the medical journals BMJ and PLOS Medicine, have mandated the provision of certain completed reporting guidelines, such as CONSORT, as a condition of submitting manuscripts. Endorsement and implementation of reporting guidelines has been more prevalent in journals with higher impact factors (Shamseer et al. 2016). More active interventions to enforce policy in the editorial process are generally more effective, as demonstrated with data sharing policies (Vines et al. 2013), but these interventions are also more costly as they increase demands on authors’ and editors’ time. For larger, multidisciplinary journals publishing many types of research, identifying and enforcing the growing number of relevant reporting guidelines, which can vary from one submitted paper to another, is inherently more complex and time-consuming. These processes of checking manuscripts for adherence to guidelines can, however, be supported with artificial intelligence tools such as https://www.penelope.ai/.
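
As a rough illustration of how such automated screening can work in principle, the sketch below flags a manuscript that never mentions items required by common reporting guidelines (randomisation, blinding, sample size calculation, data availability). It is a deliberately naive keyword check written in Python; the checklist items, regular expressions and file name are illustrative assumptions, and such a script is no substitute for tools like Penelope or for editorial judgement.

  import re

  # Illustrative checklist items, loosely inspired by common reporting
  # guidelines (e.g. CONSORT, ARRIVE); items and patterns are assumptions.
  CHECKLIST = {
      "randomisation": r"\brandomi[sz](?:ed|ation)\b",
      "blinding": r"\bblind(?:ed|ing)\b|\bmasked\b",
      "sample size calculation": r"\bsample size\b|\bpower (?:analysis|calculation)\b",
      "data availability": r"\bdata availab(?:le|ility)\b",
  }

  def screen_manuscript(text: str) -> dict:
      """Return, for each checklist item, whether the manuscript mentions it."""
      return {item: bool(re.search(pattern, text, re.IGNORECASE))
              for item, pattern in CHECKLIST.items()}

  if __name__ == "__main__":
      with open("manuscript.txt", encoding="utf-8") as fh:  # placeholder file name
          report = screen_manuscript(fh.read())
      for item, found in report.items():
          print(f"{item}: {'mentioned' if found else 'NOT FOUND - query authors'}")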

An alternative approach to this problem, taken by the multidisciplinary science journal Nature, was to introduce a standardised editorial checklist to promote transparent reporting that could be applied to many different study designs and research disciplines. The checklist was developed by the journal in collaboration with researchers and funding agencies (Anon 2013) and is implemented by professional editors, who require that all authors complete it. The checklist elements focus on experimental and analytical design elements that are crucial for the interpretation of research results. This includes description of methodological parameters that can introduce bias or influence robustness and characterisation of reagents that may be subject to biological variability, such as cell lines and antibodies. In an independent assessment of the checklist’s effectiveness, the checklist was found to have improved reporting of risks of bias in in vivo research, including reporting of randomisation, blinding, exclusions and sample size calculations; however, compliance for in vitro data did not improve (Macleod and The NPQIP Collaborative Group 2017; The NPQIP Collaborative Group 2019).

In 2017 the Nature checklist evolved into two documents, a “reporting summary” that focuses on experimental design, reagents and analysis and an “editorial policy checklist” that covers issues such as data and code availability and research ethics. The reporting summary document is published alongside the associated paper and, to enable reuse by other journals and institutions, is made available under an open-access licence (Announcement 2017) (Fig. 2).

Fig. 2 The Nature Research Reporting Summary Checklist, available under a Creative Commons attribution licence from https://www.nature.com/authors/policies/ReportingSummary.pdf

4.2 Editorial and Peer-Review Procedures to Support Transparency and Reproducibility

Peer review is important for assessing and improving the quality of published research (even if evidence of its effectiveness is often questioned (Smith 2010)). Most journals have been, understandably, reluctant to give additional mandatory tasks to peer reviewers – who may already be overburdened with the continuing increase in volume of publications (Kovanis et al. 2016) – to ensure journal policy compliance and assessment of research data and code. Journal policies often encourage reviewers to consider authors’ compliance with data sharing policies, but formal peer review of data tends to occur only in a small number of specialist journals, such as data journals (see later in this chapter) and journals with the strictest data sharing policies. The most stringent of Springer Nature’s four data policy types requires peer reviewers to access the supporting data for every publication in a journal and includes guidelines for peer reviewers of data:

Peer reviewers should consider a manuscript’s Data availability statement (DAS), where applicable. They should consider if the authors have complied with the journal’s policy on the availability of research data, and whether reasonable effort has been made to make the data that support the findings of the study available for replication or reuse by other researchers.

For the Data availability statement, reviewers should consider:

  • Has an appropriate DAS been provided?

  • Is it clear how a reader can access the data?

  • Where links are provided in the DAS, are they working/valid?

  • Where data access is restricted, are the access controls warranted and appropriate?

  • Where data are described as being included with the manuscript and/or supplementary information files, is this accurate?

For the data files, where available, reviewers should consider:

  • Are the data in the most appropriate repository?

  • Were the data produced in a rigorous and methodologically sound manner?

  • Are data and any metadata consistent with file format and reporting standards of the research community?

  • Are the data files deposited by the authors complete and do they match the descriptions in the manuscript?

  • Do they contain personally identifiable, sensitive or inappropriate information?

However, as of 2019 fewer than ten journals have implemented this policy of formal data peer review as a mandatory requirement, and journal policies on data sharing and reproducibility tend to focus on transparent reporting, such as including links to data sources. This enables a motivated peer reviewer to assess aspects of a study, such as data and code, more deeply, but this is not routinely expected.
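
One of the checks above – whether links provided in a data availability statement are working and valid – lends itself to simple scripting. The sketch below, using only the Python standard library, extracts URLs and DOIs from a statement and tests whether each resolves; the example statement and DOI are placeholders, and a real editorial workflow would add retries, redirect logging and manual follow-up.

  import re
  import urllib.request

  def extract_links(statement: str) -> list:
      """Pull URLs and bare DOIs out of a data availability statement."""
      urls = re.findall(r"https?://\S+", statement)
      dois = re.findall(r"\b10\.\d{4,9}/\S+\b", statement)
      # Express bare DOIs as resolvable URLs via the DOI proxy
      urls += [f"https://doi.org/{doi}" for doi in dois
               if not any(doi in url for url in urls)]
      return urls

  def link_resolves(url: str, timeout: int = 10) -> bool:
      """Return True if the link returns an HTTP success status."""
      try:
          with urllib.request.urlopen(url, timeout=timeout) as response:
              return 200 <= response.status < 400
      except Exception:
          return False

  if __name__ == "__main__":
      das = ("The datasets generated during the current study are available "
             "in the Zenodo repository, https://doi.org/10.5281/zenodo.1234567")  # placeholder DOI
      for link in extract_links(das):
          print(link, "OK" if link_resolves(link) else "does not resolve")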

In specific disciplines, journals and study designs, additional editorial assessment and statistical review are routinely employed. Some medical journals, such as The Lancet, consistently invite statistical review of clinical trials, and statistical reviewers have been found to increase the quality of reporting of biomedical articles (Cobo et al. 2007). Not all journals, such as those without full-time editorial staff, have sufficient resources to statistically review all research papers. Instead, journals may rely on editors and nonstatistical peer reviewers to identify whether statistical review is warranted and to invite statistical review on a case-by-case basis.

Some journals have taken procedures for assessing reproducibility and transparency even further. The journal Biostatistics employs an Associate Editor for reproducibility, who awards articles “kite marks” for reproducibility, which are determined by the availability of code and data and by whether the Associate Editor for reproducibility is able to reproduce the results in the paper (Peng 2009). Another journal, npj Breast Cancer, has involved an additional editor, a Research Data Editor (a professional data curator), to assess every accepted article and give authors editorial support in describing and sharing links to the datasets that support their articles (Kirk and Norton 2019).

4.3 Image Manipulation and Plagiarism Detection

Plagiarism and self-plagiarism are common forms of misconduct and common reasons for papers being retracted (Fang et al. 2012). In the last decade, many publishers have adopted plagiarism detection software, and some apply this systematically to all submissions. Plagiarism detection software, such as iThenticate, works by comparing manuscripts against a database of billions of web pages and 155 million content items, including 49 million works from 800 scholarly publishers that participate in CrossRef Similarity Check (https://www.crossref.org/services/similarity-check/). Plagiarism detection is an important mechanism for publishers, editors and peer reviewers to maintain quality and integrity in the scholarly record. Although less systematically utilised in the editorial process, software for automated detection of image manipulation – a factor in about 40% of retractions in the biomedical literature and thought to affect 6% of published papers – is also available to journals (Bucci 2018).

5 Better Scholarly Communication Infrastructure and Innovation

Publishers provide and utilise scholarly communication infrastructure, which can be both an enabler and a barrier to reproducibility. In this chapter, “scholarly communication infrastructure” means journals, article types, data repositories, publication platforms and websites, content production and delivery systems and manuscript submission and peer-review systems.

5.1 Tackling Publication (Reporting) Bias

Publication bias, also known as reporting bias, is the phenomenon in which only some of the results of research are published and therefore made available to inform evidence-based decision-making. Papers that report “positive results”, such as positive effects of drugs on a condition or disease, are more likely to be published, are more likely to be published quickly and are likely to be viewed more favourably by peer reviewers (McGauran et al. 2010; Emerson et al. 2010). In healthcare-related research, this is a pernicious problem, and widely used healthcare interventions, such as the antidepressant reboxetine (Eyding et al. 2010), have been found to be ineffective or potentially harmful, when unpublished results and data are obtained and combined with published results in meta-analyses.

5.1.1 Journals

Providing a sufficient range of journals is a means to tackle publication bias. Some journals have dedicated themselves exclusively to the publication of “negative” results, although these have remained niche publications and many have been discontinued (Teixeira da Silva 2015). But there are many journals that encourage publication of all methodologically sound research, regardless of the outcome. The BioMed Central (BMC) journals launched in 2000 with the mission of assessing scientific accuracy rather than impact or importance and of promoting publication of negative results and single experiments (Butler 2000). Many more “sound science” journals – often multidisciplinary “mega journals” including PLOS One, Scientific Reports and PeerJ – have since emerged, almost entirely based on an online-only open-access publishing model (Björk 2015). There is no shortage of journals to publish scientifically sound research, yet publication bias persists. More than half of clinical trial results remain unpublished (Goldacre et al. 2018).

5.1.2 Preregistration of Research

Preregistration of studies and study protocols, in dedicated databases, before data are collected or patients recruited, is another means to reduce publication bias. Registration is well established – and mandatory – for clinical trials, using databases such as ClinicalTrials.gov and the ISRCTN registry. The prospective registration of clinical trials helps ensure the data analysis plans, participant inclusion and exclusion criteria and other details of a planned study are publicly available before publication of results. Where this information is already in the public domain, it reduces the potential for outcome switching or other sources of bias to occur in the reported results of the study (Chan et al. 2017). Clinical trial registration has been common since 2005, when the ICMJE introduced a requirement for prospective registration of trials as a condition for publication in its member journals. Publishers and editors have been important in implementing this requirement in journals.

Preregistration has been adopted by other areas of research, and databases are now available for preregistration of systematic reviews (in the PROSPERO database) and for all other types of research, with the Open Science Framework (OSF) and the Registry for International Development Impact Evaluations (RIDIE).

5.1.2.1 Registered Reports

A more recent development for preregistration is a new type of research article, known as a registered report. Registered reports are a type of journal article in which the research methods and analysis plans are both pre-registered and submitted to a journal for peer review before the results are known (Robertson 2017). Extraordinary results can make referees less critical of experiments, and with registered reports, studies can be given in-principle acceptance decisions by journals before the results are known, avoiding unconscious biases that may occur in the traditional peer-review process. The first stage of the peer-review process used for registered reports assesses a study’s hypothesis, methods and design, and the second stage considers how well the experiments followed the protocol and whether the conclusions are justified by the data (Nosek and Lakens 2014). Since 2017, registered reports have been accepted by a number of journals from multiple publishers including Springer Nature, Elsevier, PLOS and the BMJ Group.

5.1.2.2 Protocol Publication and Preprint Sharing

Predating registered reports, in clinical trials in particular, it has been common since the mid-2000s for researchers to publish their full study protocols as peer-reviewed articles in journals such as Trials (Li et al. 2016). Another form of early sharing of research results, before peer review has taken place or they are submitted to a journal, is preprint sharing. Sharing of preprints has been common in the physical sciences for a quarter of a century or more, but since the 2010s preprint servers for the biosciences (biorxiv.org), and other disciplines, have emerged and are growing rapidly (Lin 2018). Journals and publishers are, increasingly, encouraging their use (Luther 2017).

5.2 Research Data Repositories

There are more than 2,000 data repositories listed in re3data (https://www.re3data.org/), the registry of research data repositories (and more than 1,100 databases in the curated FAIRsharing resource on data standards, policies and databases, https://fairsharing.org/). Publishers’ research data policies generally favour the use of third-party or research community data repositories over journals hosting raw data themselves. Publishers can help make repositories more widely used, more visible and more valued in scholarly communication. This is beneficial for researchers and publishers, as connecting research papers with their underlying data has been associated with increased citations to papers (Dorch et al. 2015; Colavizza et al. 2019).

Publishers often provide lists of recommended or trusted data repositories in their data policies, to guide researchers to appropriate repositories (Callaghan et al. 2014) as well as linking to freely available repository selection tools (such as https://repositoryfinder.datacite.org/). Some publishers – such as Springer Nature via its Research Data Support helpdesk (Astell et al. 2018) – offer free advice to researchers to find appropriate repositories. The journal Scientific Data has defined criteria for trusted data repositories in creating and managing its list of data repositories and makes its recommended repository list available for reuse (Scientific Data 2019).

Where they are available, publishers generally promote the use of community data repositories – discipline-specific, data repositories and databases that are focused on a particular type or format of data such as GenBank for genetic sequence data. However, much research data – sometimes called the “long tail” of research data (Ferguson et al. 2014) – do not have common databases, and, for these data, general-purpose repositories such as figshare, Dryad, Dataverse and Zenodo are important to enable all research data to be shared permanently and persistently.

Publishers, and the content submission and publication platforms they use, can be integrated with research data repositories – in particular these general repositories – to promote sharing of research data that support publications. The Dryad repository is integrated to varying extents with a variety of common manuscript submission systems such as Editorial Manager and ScholarOne. Integration with repositories makes it easier and more efficient for authors to share data supporting their papers. The journal Scientific Data enables authors to deposit data into figshare seamlessly during its submission process, resulting in more than a third of authors depositing data in figshare (data available via http://scientificdata.isa-explorer.org). Many publishers have invested in technology to automatically deposit small datasets shared as supplementary information files with journal articles into figshare, to increase their accessibility and potential for reuse.

5.3 Research Data Tools and Services

Publishers are diversifying the products and services they provide to support researchers in practising reproducible research (Inchcoombe 2017). The largest scholarly publisher, Elsevier (RELX Group), for example, has acquired software used by researchers before they submit work to journals, such as the electronic lab notebook Hivebench. Better connecting scholarly communication infrastructure with researchers’ workflows and research tools is recognised by publishers as a way to promote transparency and reproducibility, and publishers are increasingly working more closely with research workflow tools (Hrynaszkiewicz et al. 2014).

While “organising data in a presentable and useful way” is a key barrier to data sharing (Stuart et al. 2018), data curation as a distinct profession, skill or activity has tended to be undervalued and under-resourced in scholarly research (Leonelli 2016). Springer Nature, in 2018, launched a Research Data Support service (https://www.springernature.com/gp/authors/research-data/research-data-support) that provides data deposition and curation support for researchers who need assistance from professional editors in sharing data supporting their publications. Use of this service has been associated with increased metadata quality (Grant et al. 2019; Smith et al. 2018). Publishers, including Springer Nature and Elsevier, provide academic training courses in research data management for research institutions. Some data repositories, such as Dryad, offer metadata curation, and researchers can also often access training and support from their institutions and other third parties such as the Digital Curation Centre.

5.4 Making Research Data Easier to Find

Publishing platforms can promote reproducibility and provenance tracking by improving the connections between research papers and data and materials in repositories. Ensuring links between journal articles and datasets are present, functional and accurate is technologically simple but can be procedurally challenging to implement when multiple databases are involved. Connecting published articles and research data in a standardised manner across multiple publishing platforms and data repositories, in a dynamic and universally adoptable manner, is highly desirable. This is the aim of a collaborative project between publishers and other scholarly infrastructure providers such as CrossRef, DataCite and OpenAIRE. This Scholarly Link Exchange (or “Scholix”) project enables information on links between articles and data to be shared between all publishers and repositories in a unified manner (Burton et al. 2017). This approach, which publishers are important implementers of, means that readers accessing articles on a publisher platform, literature database or data repository – such as ScienceDirect, Europe PubMed Central or Dryad – will be provided with contemporaneous and dynamic information on datasets that are linked to articles in other journals or databases and vice versa.
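
Under Scholix, each article–data link is exchanged as a small metadata record describing a source object, a target object and the relationship between them. The record below is an illustrative sketch of such a link, expressed in Python; the field names and values approximate the Scholix information model and do not reproduce any particular provider's exact output.

  import json

  # Illustrative Scholix-style link record: an article that references a dataset.
  # Identifiers and provider name are placeholders.
  scholix_link = {
      "LinkPublicationDate": "2019-01-15",
      "LinkProvider": [{"Name": "Example Data Repository"}],
      "RelationshipType": {"Name": "References"},
      "Source": {
          "Identifier": {"ID": "10.1234/example-article", "IDScheme": "DOI"},
          "Type": {"Name": "literature"},
      },
      "Target": {
          "Identifier": {"ID": "10.5678/example-dataset", "IDScheme": "DOI"},
          "Type": {"Name": "dataset"},
      },
  }

  print(json.dumps(scholix_link, indent=2))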

6 Enhancing Incentives

Publications in peer-reviewed journals, and citations, are established mechanisms for assigning credit for scholarly contributions and for researchers and institutions to provide evidence for their research outputs and impact. Publishers can offer incentives to promote transparency by providing opportunities for additional articles and citations and new forms of incentive such as digital badges.

6.1 New Types of Journal and Journal Article

In the last 10 years, more journals, and types of journal article, have emerged that publish articles that describe specific parts of a research project. The print-biased format of traditional research articles does not always provide sufficient space to communicate all aspects of a research project. These new publications include journals that specialise in publishing articles that describe datasets or software (code), methods or protocols. Established journals have also introduced new article types that describe data, software, methods or protocols (Table 5).

Table 5 Examples of data, software, methods and protocol journals

Of these journals and article types, data journals and data papers are the most common. Unlike traditional research papers, data papers do not include Results or Conclusions sections. They generally describe a publicly available dataset in sufficient detail that another researcher can find, understand and reuse the data. Data journals generally do not publish raw data but publish peer-reviewed papers that describe datasets (Hrynaszkiewicz and Shintani 2014). Data papers often include more detailed or technical information that may be excluded from traditional research papers, or which might only appear as supplementary files in traditional research papers. Data papers can both accompany traditional research papers and be independent articles that enable the publication of important datasets and databases that would not be considered as a traditional publication.

Papers published in data journals attract citations, and the number of articles published in data journals is growing steadily; however, they represent a small proportion of the published literature overall (Berghmans et al. 2017).

6.2 Data and Software Citation

Research data, software and other research outputs, when published in digital repositories, can be assigned Digital Object Identifiers (DOIs), like research papers and chapters, enabling these research outputs to be individually discovered and cited and their citations measured in the same way.

Citing data and software promotes reproducibility by enabling linking and provenance tracking of research outputs. Papers can be persistently linked to the version(s) of data and code that were used or generated by the experiments they describe. Data citation can also provide more specific evidence for claims in papers, when those claims are based on published data. Citation of data and software is encouraged, and in some cases required, as part of many journals’ data sharing and reproducible research policies (Hrynaszkiewicz et al. 2017a). Some funding agencies, such as the National Science Foundation in the USA, encourage researchers to list datasets and software (in addition to traditional publications) as part of their biographical sketches (Piwowar 2013).

From the researcher’s (author’s) perspective, citing data and software in reference lists works in the same way as citing journal articles and book chapters. Several datasets are cited in this chapter, such as Smith et al. (2018), and software can, similarly, be cited when it is deposited in repositories that assign DOIs. Zenodo and figshare are commonly recommended for depositing code and software so that they can be cited.
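
Because dataset DOIs are registered with agencies such as DataCite, a formatted reference for a dataset can be retrieved in much the same way as for an article, for example via DOI content negotiation. The sketch below requests an APA-style reference for a dataset DOI; the DOI shown is a placeholder, and the Accept header follows the content-negotiation convention supported by the DOI registration agencies, which should be checked against current documentation.

  import urllib.request

  def dataset_reference(doi: str, style: str = "apa") -> str:
      """Fetch a formatted bibliographic reference for a DOI via content negotiation."""
      request = urllib.request.Request(
          f"https://doi.org/{doi}",
          headers={"Accept": f"text/x-bibliography; style={style}"},
      )
      with urllib.request.urlopen(request, timeout=10) as response:
          return response.read().decode("utf-8").strip()

  if __name__ == "__main__":
      # Placeholder DOI for illustration; substitute the DOI of a real dataset.
      print(dataset_reference("10.5281/zenodo.1234567"))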

To promote data citation and to enable data citations and links to be more visible to readers, publishers have implemented changes to the structure of published content (the XML underlying the digital version of journal articles) (Cousijn et al. 2017, 2018). Publishers and other scholarly infrastructure providers, such as DataCite and CrossRef (member organisations that generate DOIs for digital research outputs), are collaborating to enable data citation to be implemented and practised consistently, regardless of where researchers publish. Data citations, in article reference lists (bibliographies), have historically appeared in a small proportion of the published literature, but data citations have been increasing year-on-year (Garza and Fenner 2018). Researchers have indicated that they value the credit they receive through data citations, in some cases equally to the credit they receive from citations to their papers (Science et al. 2017).

6.3 Digital Badges for Transparency: A New Type of Incentive

The Center for Open Science offers digital badges that are displayed on published articles to highlight, or reward, papers where the data and materials are openly available and for studies that are pre-registered. Badges signal to the reader that the content has been made available and certify its accessibility in a persistent location. More than 40 journals, in 2018, offered or were experimenting with the award of badges to promote transparency (Blohowiak 2013). The use of digital badges is most prolific in psychology and human behavioural research journals, but they are also used in some microbiology, primatology and geoscience journals.

Awarding digital badges to authors has been associated with increased rates of data sharing by authors. When the journal Psychological Science (PSCI) introduced badges for articles with open data, the proportion of articles with open data increased (Fig. 3), compared to previous levels of data sharing in the journal. Data sharing also increased in the journal compared to other psychology journals (Munafò et al. 2017; Kidwell et al. 2016).

Fig. 3 Percentage of articles reporting open data by half year by journal. Darker line indicates Psychological Science, and dotted red line indicates when badges were introduced in Psychological Science and none of the comparison journals. Figure and legend reproduced from Kidwell et al. (2016)

A systematic review has also found digital badges to be an effective incentive for data sharing (Rowhani-Farid et al. 2017). The badges awarded by the journal Biostatistics’ Associate Editor for reproducibility have also been associated with increased data sharing, although, in the same study, badges did not have an impact on the sharing of code (Rowhani-Farid and Barnett 2018).

Badges are usually awarded on the basis of information self-disclosed by authors or as part of the peer-review process. Another method of awarding badges, adopted by BMC Microbiology, involves the data availability statements of each paper being assessed, independently, by the publisher (Springer Nature 2018).

Box 1 Practical Recommendations for Researchers to Support the Publication of Reproducible Research

Before You Carry Out Your Research

  • Check if your institution or employer, funding agency and target journals have policies on sharing and managing research data, materials and code or more broadly on reproducibility and open science.

    • Seek advice on compliance with these policies, and support including formal training, where needed.

    • Note that journal policies on data sharing are generally agnostic of whether research is industry or academically sponsored.

  • Consider how you will store and manage your data and other research outputs and plan accordingly, including whether additional or specific funding is required to cover associated costs.

    • Preparing a Data Management Plan (DMP) is recommended and is often required under funding agency and institutional policies. Free tools such as https://dmponline.dcc.ac.uk/ can assist in creating DMPs.

  • Determine if there are standards and expectations, and existing infrastructure such as data repositories, for sharing data in your discipline.

    • Use resources such as https://fairsharing.org/ to explore standards, policies, databases and reporting guidelines.

    • Establish if there are existing repositories for the type of data you generate. Where they exist, use discipline-specific repositories for your data, and general repositories for other data types.

  • Familiarise yourself with tools that enable reproducibility, and version control, particularly for computational work (Markowetz 2015).

  • Where appropriate databases exist, consider preregistration of your study (for clinical trials, registration in a compliant database is mandatory) as a means to reduce the potential for bias in analyses.

  • For clinical studies in particular, publish your study protocol as a peer-reviewed article, or at minimum be prepared to share it with journal editors and peer reviewers.

  • If your target journal(s) offer them, consider preparing a registered report.

When Preparing to Submit Your Research Results to a Journal

  • Register for an ORCID identifier and encourage your co-authors to do the same.

  • Publish a preprint of your paper in a repository such as bioRxiv, enabling the community to give you feedback on your work and allowing you to assert ownership and claim credit for your work early.

  • Prepare your data and code for deposition in a repository, and make these available to editors and peer reviewers.

    • Use repositories rather than supplementary information files for your datasets and code.

  • Consider publishing data papers, software papers or methods-focused papers to complement your traditional research papers, particularly if detailed information that enables understanding and reuse of your research does not form part of your traditional papers.

  • If the results of your research are inconclusive and show no difference between comparison groups (“negative results”), publish them. Many journals consider such papers.

  • Always include clear statements in your publications about the availability of research data and code generated or utilised by your research.

    • If there are legitimate restrictions on the availability and reuse of your data, explain them in your data availability statements.

    • Wherever possible, include links to supporting datasets in your publications – this supports reproducibility and is associated with increased citations to papers.

  • Be prepared to share with editors and peer reviewers any materials supporting your papers that might be needed to verify, replicate or reproduce the results.

    • Many repositories enable data to be shared privately before publication and in a way that protects peer reviewers’ anonymity (where required).

  • Cite, in your reference lists and bibliographies, any persistent, publicly available datasets that were generated or reused by your research.

After Publication of Your Research

  • Be prepared to respond to reasonable requests from other scientists to reuse your data.

    • Non-compliance with data sharing policies of journals can lead to corrections, expressions of concern or retractions of papers.

  • Try to view the identification of honest errors in published work – yours and others – as a positive part of the self-correcting nature of science.

  • Remember that working transparently and reproducibly is beneficial to your own reputation, productivity and impact as a researcher, as well as being beneficial to science and society (Markowetz 2015).

7 Making Research Publishing More Open and Accessible

The sixth and final area in which publishers can promote transparency relates to how open and accessible publishers are as organisations. This refers firstly to the content publishers distribute and secondly to the accessibility of other information and resources from publishers.

7.1 Open Access and Licencing Research for Reuse

Publishing more research open access, so that papers are freely and immediately available online, is an obvious means to increase transparency. The proportion of the scholarly literature that is published open access is increasing by 10–15% each year. Open access accounted for 17% of published articles in 2015 (Johnson et al. 2017), and the two largest journals in the world – Scientific Reports and PLOS One – are open-access journals.

Open-access publishing means more than access to research; it is also about promoting free reuse and redistribution of research, through permissive copyright licences (Suber 2012). Open-access journals and articles are typically published under Creative Commons attribution licences, such as CC BY, which means that the work can be copied, distributed, modified and adapted freely, by anyone, provided the original authors are attributed (the figures in this chapter are examples of this practice).

Publishing research under CC BY, or equivalent copyright licences, is important for promoting reproducibility of research because it enables published research outputs to be reused efficiently, by humans and machines. With this approach, the pace of research need not be slowed by the need to negotiate reuse rights and agreements with researchers and institutions. Meanwhile scholarly norms of acknowledging previous work (through citation) and legal requirements for attribution in copyright will ensure that researchers are credited for their contributions (Hrynaszkiewicz and Cockerill 2012).

Reuse of the research literature is essential for text and data mining research, and this kind of research can progress more efficiently with unrestricted access to and reuse of the published literature. Publishers can enable the reuse of research content published in subscription and open-access journals with text and data mining policies and agreements. Publishers typically permit academic researchers to programmatically access their publications, such as through secure content application programming interfaces (APIs), for text and data mining research (Text and Data Mining – Springer; Text and Data Mining Policy – Elsevier).
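
As a concrete illustration of programmatic access for text and data mining, open aggregators also expose public APIs: the sketch below queries Europe PMC's REST search service for open-access records mentioning a drug, as a possible starting point for a text-mining workflow. The endpoint, query syntax and field names reflect the public documentation at the time of writing and should be treated as assumptions to verify; publisher-specific TDM APIs will differ.

  import json
  import urllib.parse
  import urllib.request

  def search_open_access(query: str, page_size: int = 25) -> list:
      """Search Europe PMC for open-access records matching a query."""
      params = urllib.parse.urlencode({
          "query": f"({query}) AND OPEN_ACCESS:y",
          "format": "json",
          "pageSize": page_size,
      })
      url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/search?{params}"
      with urllib.request.urlopen(url, timeout=30) as response:
          payload = json.load(response)
      return payload.get("resultList", {}).get("result", [])

  if __name__ == "__main__":
      for record in search_open_access("reboxetine"):
          print(record.get("doi"), "-", record.get("title"))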

7.2 Open Publisher (Meta)Data

For other kinds of content, including research data, publishers can promote ease of access and reuse by applying, and setting standards for, content licences that readily enable reuse. In 2006 multiple publishers signed a joint statement agreeing not to take copyright in research data (STM, ALPSP 2006). Publishers have also promoted the use of liberal, public domain legal tools for research data and metadata. The publisher BMC introduced, in 2013, a default policy whereby any data published in its more than 250 journals would be available in the public domain, under the Creative Commons CC0 waiver (Hrynaszkiewicz et al. 2013). Publishers can also make data about their content catalogues (metadata) openly available. Springer Nature’s SciGraph, for example, is a linked open data platform for the scholarly domain and collates metadata from funders, research projects, conferences, affiliations and publications (SciGraph). Many publishers also make the bibliographies (reference lists) of all their publications, subscription and open access, available openly as “open citations” (Shotton 2013; I4OC).
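
Where a publisher has deposited open references, anyone can retrieve a work's reference list through the Crossref REST API, as in the sketch below. The DOI shown is a placeholder, and the response field names follow Crossref's documented format, which should be checked against the current API documentation.

  import json
  import urllib.request

  def open_references(doi: str) -> list:
      """Return the openly deposited reference list for a DOI, if any."""
      url = f"https://api.crossref.org/works/{doi}"
      with urllib.request.urlopen(url, timeout=30) as response:
          message = json.load(response)["message"]
      return message.get("reference", [])

  if __name__ == "__main__":
      # Placeholder DOI; substitute the DOI of a published article.
      for ref in open_references("10.1234/example-article"):
          print(ref.get("DOI") or ref.get("unstructured"))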

Beyond published articles, journals and associated metadata, publishers can share other information openly. This includes survey findings (Table 1) and the results of projects to improve transparency and reproducibility – such as those around data sharing policies (Hrynaszkiewicz et al. 2017a) and research data curation (Smith et al. 2018). Resources produced and curated by publishers can also be made available to the wider community (such as Scientific Data 2019).

7.3 Open for Collaboration

Publishers can promote transparency through collaboration. The biggest policy and infrastructural challenges to publishing more reproducible research can only be tackled by multiple publishers collaborating as an industry, and by collaboration with other organisations that support the conduct and communication of research – repositories, institutions and persistent identifier providers. Progress resulting from such collaborations has been seen in data citation (Cousijn et al. 2017), data policy standardisation (Hrynaszkiewicz et al. 2017b), reporting standards to enhance reproducibility (McNutt 2014) and provenance tracking of research outputs and researchers, through persistent identification initiatives such as ORCID (https://orcid.org/organizations/publishers/best-practices). Combined, these efforts help publishers and the wider research community to make practical improvements to the communication of research that support improved data quality and reproducibility.

7.3.1 The Future of Scholarly Communication?

In some respects the future of scholarly communication is already here, with dynamic, reproducible papers (Lewis et al. 2018), workflow publication, data integration and interactive data, figures and code all possible, albeit at a relatively small scale. However, these innovations remain highly unevenly distributed, and the majority of published scholarly articles remain largely static objects, with the PDF format remaining popular with many readers. Like most scientific advances, progress in scholarly communication tends not to be made through giant leaps but through slow, steady, incremental improvements. However, numerous major publishers have expressed strong support for open science and are introducing practical measures to introduce and strengthen policies on transparency of all research outputs, as a prerequisite to improving reproducibility. Researchers should expect continued growth in transparency policies of journals and be prepared for demands for more transparency in the reporting of their research (see Box 1 for practical suggestions for researchers). Increasing computerisation and machine readability of papers, with greater integration of data and code and richer metadata, will promote reproducibility and new forms of research quality assessment. This will help the research community assess individual research projects more specifically than the inappropriate journal-based measure of the impact factor. Large publishers will continue to diversify the types of content they publish and diversify their businesses, evolving into service providers for researchers and institutions and including content discovery, research metrics, research tools, training and analytics in their activities alongside publishing services. Technology and services are just part of implementing reproducible research, and cultural and behavioural change – and demonstrating the value and impact of reproducible research – will continue to be incentivised by the policies of all stakeholders in research. Monitoring compliance with transparency and reproducibility policies remains a challenge, but increasing standardisation of policies will enable economies of scale in monitoring compliance.