Data Ownership and Data Sharing Practices
Research-generated data are generally regarded as intellectual property and entitled to legal protection or copyright; however, ownership of those rights may be governed by institutional policies. Implicit in the idea of ownership of data is the concept of stewardship of data, including access, creation, modification, sale, storage, and the ability to assign or license any of these privileges to others. Open data and open science promote sharing research data gathered by scientists to verify scientific findings, reduce the financial burden of conducting research, and improve the reproducibility of scientific studies (Editorial 2018). It is also widely held that open data decrease the cost of research by fostering multiple projects from one data source, although there are costs associated with transforming and maintaining data in a repository where others can have access. In contrast, funding agencies that review, approve, and fund competing research grants can claim ownership of the data generated by a researcher; this is especially common when pharmaceutical or device companies fund a researcher in an institution. Governments, including that of the USA, and philanthropic organizations that fund scientific projects usually award grants to an institution rather than to an individual, thus making the institution the owner of the data and further complicating the issues. Data ownership policies and guarantees of data integrity are necessary for proper allocation of rights and privileges and for data sharing to be of any benefit. The FAIR Data Principles (Wilkinson et al. 2016) are a guide to maximizing the value of shared data by making them Findable, Accessible, Interoperable, and Reusable.
What Are the Issues?
Issues in publication ethics. Questions about published research come from peer reviewers, editors, readers, granting agencies, other authors, journalists, and regulators. An accurate response to some of those questions might require a review of the materials, code, and data underlying the publication. Data sharing is fundamental to allowing external review of a research project as well as replication of the results. Recent attempts to replicate findings from social science research indicate that only half of previously published, classic studies can be replicated based on the described materials and methods. Some scientists have called this a “crisis,” while others see more positive effects, such as increased requirements for preregistration of protocols and data sharing (Open Science Framework: https://osf.io/8cd4r/). The Transparency and Openness Promotion (TOP) Guidelines address standards for data citation; data, materials, and code transparency; design and analysis; preregistration of studies; and replication. At the most stringent level, journals would require verification of all standards by data reviewers prior to acceptance and publication of an article. At a minimum level, journals state their requirements, and authors are expected to comply by disclosing the appropriate information. Journals claiming peer review of data should clarify the level of peer review the data receive. Publication of data as part of the supplementary material to a peer-reviewed article does not always mean that the data have been reviewed, verified, or validated. Peer review of data might mean simply that the data are appropriately formatted for reuse by others.
Challenging issues. Data integrity is essential for meaningful sharing of data. Ownership of the data implies an ethical duty to safeguard the data from corruption through proper storage and maintenance and to guard them against improper use. Some large data repositories include financial and sensitive health information in available data sets; data sets can be purchased and used for health research or to target potential consumers with directed advertising. Accordingly, data repositories must allow and require complete de-identification of individuals in clinical data sets (Figueiredo 2017) to be considered an ethical option. Assurances of data integrity are essential whether the data are used for critical public health research or marketing of new movies. Maintaining and sharing data can be costly and complicated for authors, as evidenced by the data availability policies of the PLOS ONE (https://journals.plos.org/plosone/s/data-availability) and Springer Nature (https://www.springernature.com/gp/authors/research-data-policy/faqs/12327154) publishing groups. When the move toward open science and open data began, publishers did not have infrastructure at the journal level to support data curation, a gap they addressed by seeking collaborative relationships with new and existing data repositories. Costs of depositing data include organizing and curating data sets; assigning persistent identifiers (PIDs) or digital object identifiers (DOIs), which might be required for the publication of the associated research paper; and adding metadata to the data set to increase discoverability. Other requirements with potential charges include access to the deposited data by readers and other researchers. Most or all of these costs are borne by authors and researchers themselves.
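The curation steps described above (assigning a persistent identifier and adding descriptive metadata) can be made concrete with a small sketch. Everything in it is an assumption for illustration: the DatasetRecord class, its field names, and the list of fields treated as necessary for discovery are hypothetical and do not reflect the requirements of any particular repository or publisher.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical minimal description of a deposited data set."""
    title: str
    creators: list
    identifier: str = ""   # persistent identifier, for example a DOI
    license: str = ""      # terms under which others may reuse the data
    keywords: list = field(default_factory=list)
    description: str = ""


# Assumed checklist; real repositories define their own required fields.
REQUIRED_FOR_DISCOVERY = (
    "title", "creators", "identifier", "license", "keywords", "description",
)


def missing_fields(record):
    """Return the names of checklist fields that are still empty."""
    return [name for name in REQUIRED_FOR_DISCOVERY if not getattr(record, name)]


# Placeholder values, for illustration only.
draft = DatasetRecord(
    title="Example survey responses",
    creators=["B. Curator"],
    keywords=["survey", "example"],
)
print(missing_fields(draft))  # ['identifier', 'license', 'description']
```

A check of this kind only shows which descriptive fields remain to be completed before deposit; it says nothing about the integrity or de-identification of the data themselves.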
Issues in academia. Another benefit of data publication, in addition to the sharing of the data for reuse by other scientists, is recognition of all the contributions involved in the conduct of research. Describing those contributions was the aim of the CRediT initiative. The Contributor Roles Taxonomy (CRediT) is used by publishers, institutions, and funders to better “represent the range of contributions that researchers make to published output” (https://casrai.org/credit/). For universities adopting the taxonomy, policies at the institutional level would recognize and reward activities such as data curation, formal analysis of study data, development of software code to run analyses, management and provision of materials and resources such as laboratory samples, and validation of results as significant contributions to successful publications. Currently, the standard for promotion and tenure in academia is authorship, with little recognition for other essential roles in the production of scholarly publications. Academic libraries are becoming repositories for nanopublications (http://nanopub.org/wordpress/) and are essential to the dissemination of uniquely identified and attributed research data, which further contributes to recognition of researcher roles and outputs. Authorship of a nanopublication, such as a research dataset, is another step toward academic recognition and reward for those researchers who are not actual authors of a journal article. For funders, the taxonomy provides some standardization in describing the roles referenced in grant submissions and enables better tracking of funded contributions.
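As a rough illustration of how the taxonomy standardizes role descriptions, the following sketch checks a set of recorded contributions against the 14 CRediT role names and groups contributors by role. The contributor names and the helper functions validate_roles and contributors_by_role are hypothetical, introduced only for this example; they are not part of CRediT or of any publisher's system.

```python
# The 14 role names defined by the Contributor Roles Taxonomy (CRediT).
CREDIT_ROLES = frozenset({
    "Conceptualization",
    "Data curation",
    "Formal analysis",
    "Funding acquisition",
    "Investigation",
    "Methodology",
    "Project administration",
    "Resources",
    "Software",
    "Supervision",
    "Validation",
    "Visualization",
    "Writing – original draft",
    "Writing – review & editing",
})


def validate_roles(contributions):
    """Return, sorted, any role names that are not CRediT roles."""
    used = set()
    for roles in contributions.values():
        used.update(roles)
    return sorted(used - CREDIT_ROLES)


def contributors_by_role(contributions):
    """Invert {person: [roles]} into {role: [people]} for reporting."""
    by_role = {}
    for person, roles in contributions.items():
        for role in roles:
            by_role.setdefault(role, []).append(person)
    return {role: sorted(people) for role, people in by_role.items()}


# Hypothetical contributors, for illustration only.
example = {
    "A. Author": ["Conceptualization", "Writing – original draft"],
    "B. Curator": ["Data curation", "Validation"],
    "C. Developer": ["Software", "Formal analysis"],
}

print(validate_roles(example))                         # []
print(contributors_by_role(example)["Data curation"])  # ['B. Curator']
```

Recording roles against a fixed vocabulary in this way is what allows institutions and funders to count contributions such as data curation or software development consistently, even when the contributor is not a listed author.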