Data sharing

The decision whether and how to share data often rests with researchers.

Data sharing is the practice of making data used for scholarly research available to other investigators. Many funding agencies, institutions, and publication venues have policies regarding data sharing because transparency and openness are considered by many to be part of the scientific method. [1]

A number of funding agencies and science journals require authors of peer-reviewed papers to share any supplemental information (raw data, statistical methods or source code) necessary to understand, develop or reproduce published research. A great deal of scientific research is not subject to data sharing requirements, and many of these policies have liberal exceptions. In the absence of any binding requirement, data sharing is at the discretion of the scientists themselves. In addition, in certain situations governments [2] and institutions prohibit or severely limit data sharing to protect proprietary interests, national security, and subject/patient/victim confidentiality. Data sharing may also be restricted to protect institutions and scientists from use of data for political purposes.

Data and methods may be requested from an author years after publication. To encourage data sharing [3] and prevent the loss or corruption of data, a number of funding agencies and journals have established policies on data archiving. Access to publicly archived data is a recent development in the history of science, made possible by advances in communications and information technology. Taking full advantage of modern rapid communication may require agreement on the criteria by which each party's contributions are recognized. Models recognized for improving the timely sharing of data for more effective response to emergent infectious disease threats include the data sharing mechanism introduced by the GISAID Initiative. [4] [5]

Despite policies on data sharing and archiving, data withholding still happens. Authors may fail to archive data, or may archive only a portion of it. Failure to archive data alone is not data withholding. Withholding occurs when a researcher requests additional information and the author refuses to provide it. [6] Authors who withhold data in this way risk losing the trust of the scientific community. [7] A 2022 study identified about 3500 research papers containing statements that the data was available, but on requesting and further seeking the data found that it was unavailable for 94% of papers. [8]

Data sharing may also refer to the sharing of personal information on a social media platform.

U.S. government policies

Federal law

On August 9, 2007, President Bush signed the America COMPETES Act (or the "America Creating Opportunities to Meaningfully Promote Excellence in Technology, Education, and Science Act"), requiring civilian federal agencies to provide guidelines, policies and procedures to facilitate and optimize the open exchange of data and research between agencies, the public and policymakers. See Section 1009. [9]

NIH data sharing policy

‘The National Institutes of Health (NIH) Grants Policy Statement defines "data" as "recorded information, regardless of the form or medium on which it may be recorded, and includes writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data."’

Council on Governmental Relations [10]

The NIH Final Statement of Sharing of Research Data says:

‘NIH reaffirms its support for the concept of data sharing. We believe that data sharing is essential for expedited translation of research results into knowledge, products, and procedures to improve human health. The NIH endorses the sharing of final research data to serve these and other important scientific goals. The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers.

NIH recognizes that the investigators who collect the data have a legitimate interest in benefiting from their investment of time and effort. We have therefore revised our definition of "the timely release and sharing" to be no later than the acceptance for publication of the main findings from the final data set. NIH continues to expect that the initial investigators may benefit from first and continuing use but not from prolonged exclusive use.’

NSF Policy from Grant General Conditions

36. Sharing of Findings, Data, and Other Research Products

a. NSF …expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work. It also encourages awardees to share software and inventions or otherwise act to make the innovations they embody widely useful and usable.

b. Adjustments and, where essential, exceptions may be allowed to safeguard the rights of individuals and subjects, the validity of results, or the integrity of collections or to accommodate legitimate interests of investigators.

"National Science Foundation: Grant General Conditions (GC-1)", April 1, 2001 (p. 17).

Office of Research Integrity

Allegations of misconduct in medical research carry severe consequences. The United States Department of Health and Human Services established an office to oversee investigations of allegations of misconduct, including data withholding. The office's website defines its mission:

"The Office of Research Integrity (ORI) promotes integrity in biomedical and behavioral research supported by the U.S. Public Health Service (PHS) at about 4,000 institutions worldwide. ORI monitors institutional investigations of research misconduct and facilitates the responsible conduct of research (RCR) through educational, preventive, and regulatory activities."

Ideals in data sharing

Some research organizations feel particularly strongly about data sharing. Stanford University's WaveLab has a philosophy about reproducible research and disclosing all algorithms and source code necessary to reproduce the research. In a paper titled "WaveLab and Reproducible Research," the authors describe some of the problems they encountered in trying to reproduce their own research after a period of time. In many cases, it was so difficult they gave up the effort. These experiences are what convinced them of the importance of disclosing source code. [12] The philosophy is described:

The idea is: An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. [13] [14]

The Data Observation Network for Earth (DataONE) and Data Conservancy [15] are projects supported by the National Science Foundation to encourage and facilitate data sharing among research scientists and to better support meta-analysis. In the environmental sciences, the research community is recognizing that major scientific advances involving integration of knowledge in and across fields will require researchers to overcome not only the technological barriers to data sharing but also the historically entrenched institutional and sociological barriers. [16] Dr. Richard J. Hodes, director of the National Institute on Aging, has stated, "the old model in which researchers jealously guarded their data is no longer applicable". [17]

The Alliance for Taxpayer Access is a group of organizations that support open access to government-sponsored research. The group has issued a "Statement of Principles" explaining why it believes open access is important. [18] It also lists a number of international public access policies. [19] Open access matters nowhere more than in the timely communication of essential information needed to respond effectively to health emergencies. [20] While public domain archives have been embraced for depositing data, mainly after formal publication, they have failed to encourage rapid data sharing during health emergencies, among them the Ebola [21] and Zika [22] [23] outbreaks. More clearly defined principles are required to recognize the interests of those generating the data while permitting free, unencumbered access to and use of the data (pre-publication) for research and practical application, such as those adopted by the GISAID Initiative to counter emergent threats from influenza. [24] [25]

International policies

Data sharing problems in academia

Genetics

Withholding of data has become so commonplace in genetics that researchers at Massachusetts General Hospital published a journal article on the subject. The study found that "Because they were denied access to data, 28% of geneticists reported that they had been unable to confirm published research." [26]

Psychology

A 2006 study observed that, of 141 authors of empirical articles published by the American Psychological Association (APA), 103 (73%) did not respond with their data over a 6-month period. [27] A follow-up study published in 2015 found that 246 of 394 contacted authors of papers in APA journals (62%) did not share their data upon request. [28]
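The percentages quoted above follow directly from the reported counts. As a minimal sketch, the arithmetic can be checked in Python; the helper name `percent` and the dictionary labels are illustrative only and do not come from either study.

```python
def percent(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest whole number."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total)


# (authors who did not share, authors contacted), as reported in the text above
studies = {
    "2006 study": (103, 141),
    "2015 follow-up": (246, 394),
}

for label, (not_shared, contacted) in studies.items():
    rate = percent(not_shared, contacted)
    print(f"{label}: {not_shared}/{contacted} = {rate}% did not share")
```

Rounding to whole percentages matches the precision used in the text; the two studies sampled different sets of authors, so the rates are not a like-for-like comparison.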

Archaeology

A 2018 study examined a random sample of 48 articles published during February–May 2017 in the Journal of Archaeological Science and found openly available raw data for 18 papers (53%), with compositional and dating data being the most frequently shared types. The same study also emailed authors of articles on experiments with stone artifacts that were published during 2009 and 2015 to request data relating to the publications. The researchers contacted the authors of 23 articles and received 15 replies, resulting in a 70% response rate. They received five responses that included data files, giving an overall sharing rate of 20%. [29]

Scientists in training

A study of scientists in training indicated that many had already experienced data withholding. [30] This study has given rise to the fear that the next generation of scientists will not abide by established practices.

Differing approaches in different fields

Requirements for data sharing are more commonly imposed by institutions, funding agencies, and publication venues in the medical and biological sciences than in the physical sciences. Requirements vary widely regarding whether data must be shared at all, with whom the data must be shared, and who must bear the expense of data sharing.

Funding agencies such as the NIH and NSF tend to require greater sharing of data, but even these requirements tend to acknowledge the concerns of patient confidentiality, costs incurred in sharing data, and the legitimacy of the request. [31] Private interests and public agencies with national security interests (defense and law enforcement) often discourage sharing of data and methods through non-disclosure agreements.

Data sharing poses specific challenges in participatory monitoring initiatives, for example where forest communities collect data on local social and environmental conditions. In this case, a rights-based approach to the development of data-sharing protocols can be based on principles of free, prior and informed consent, and prioritise the protection of the rights of those who generated the data, and/or those potentially affected by data-sharing. [32]

See also

References

  1. "A Global Health Epidemic Is A Ticking Time Bomb - But Virus Databases Can And Are Helping To Save Lives". HuffPost UK. 12 January 2017. Retrieved 2017-09-06.
  2. "A shot of transparency". The Economist. 2006-08-10. ISSN 0013-0613. Retrieved 2017-09-06.
  3. "How to encourage the right behaviour". Nature. 416 (6876): 1. 2002. Bibcode:2002Natur.416R...1.. doi:10.1038/416001b. PMID 11882850.
  4. McCauley, John W. (2017-02-23). "Viruses: Model to accelerate epidemic responses". Nature. 542 (7642): 414. Bibcode:2017Natur.542..414M. doi:10.1038/542414b. PMID 28230113.
  5. "No Free Lunch, G20 Health Ministers Find At First Meeting". Intellectual Property Watch. 2017-05-20. Retrieved 2017-09-06.
  6. Savage CJ, Vickers AJ (2009). "Empirical Study of Data Sharing by Authors Publishing in PLoS Journals". PLOS ONE. 4 (9): e7078. Bibcode:2009PLoSO...4.7078S. doi:10.1371/journal.pone.0007078. PMC 2739314. PMID 19763261.
  7. "Publication and Openness," chapter from "On Being A Scientist: Responsible Conduct in Research", National Academy of Sciences.
  8. Gabelica, Mirko; Bojčić, Ružica; Puljak, Livia (May 2022). "Many researchers were not compliant with their published data sharing statement: mixed-methods study". Journal of Clinical Epidemiology. 150: 33–41. doi:10.1016/j.jclinepi.2022.05.019. PMID 35654271. S2CID 249213574.
  9. "America COMPETES Act"
  10. "Access to and retention of research data: Rights and responsibilities", p. 5. Council on Governmental Relations, March 2006. Archived 2007-05-26 at the Wayback Machine.
  11. "NIH Data Sharing Policy."
  12. WaveLab and Reproducible Research by Jonathan B. Buckheit and David L. Donoho
  13. WaveLab850 website
  14. Rimmer, Matthew (2005-09-01). "Japonica Rice: Intellectual Property, Scientific Publishing and Data-sharing". Prometheus. 23 (3): 325–347. doi:10.1080/08109020500235180. ISSN 0810-9028. S2CID 153908749.
  15. "Data Conservancy | Data Conservancy is devoted to developing institutional solutions for the challenges of data collection, preservation and re-use".
  16. Reichman O.J.; Jones M.B.; Schildhauer M.P. (2011). "Challenges and Opportunities of Open Data in Ecology". Science. 331 (6018): 703–705. Bibcode:2011Sci...331..703R. doi:10.1126/science.1197962. PMID 21311007. S2CID 22686503.
  17. Kolata, Gina (3 April 2011). "Vast Gene Study Yields Insights on Alzheimer's (Published 2011)". The New York Times. Archived from the original on 2021-06-09.
  18. The Alliance for Taxpayer Access website
  19. "Worldwide momentum for public access to publicly funded research". Archived from the original on 2007-09-27. Retrieved 2007-09-07.
  20. Shu, Yuelong; McCauley, John (2017). "GISAID: Global initiative on sharing all influenza data – from vision to reality". Eurosurveillance. 22 (13). doi:10.2807/1560-7917.es.2017.22.13.30494. PMC 5388101. PMID 28382917.
  21. Yozwiak, Nathan L.; Schaffner, Stephen F.; Sabeti, Pardis C. (2015-02-26). "Data sharing: Make outbreak research open access". Nature. 518 (7540): 477–479. Bibcode:2015Natur.518..477Y. doi:10.1038/518477a. PMID 25719649.
  22. "When research goes off the rails". The Hindu. Retrieved 2017-09-06.
  23. "Benefits of sharing". Nature. 530 (7589): 129. 2016-02-11. Bibcode:2016Natur.530Q.129.. doi:10.1038/530129a. PMID 26863943.
  24. Elbe, Stefan; Buckland-Merrett, Gemma (2017-01-01). "Data, disease and diplomacy: GISAID's innovative contribution to global health". Global Challenges. 1 (1): 33–46. Bibcode:2017GloCh...1...33E. doi:10.1002/gch2.1018. ISSN 2056-6646. PMC 6607375. PMID 31565258.
  25. "CDC Races to Create a Vaccine for China's Latest Bird Flu Strain". Bloomberg.com. 2013-04-10. Retrieved 2017-09-06.
  26. Campbell EG, Clarridge BR, Gokhale M, et al. (2002). "Data withholding in academic genetics: evidence from a national survey". JAMA. 287 (4): 473–80. doi:10.1001/jama.287.4.473. PMID 11798369.
  27. Wicherts, J. M.; Borsboom, D.; Kats, J.; Molenaar, D. (2006). "The poor availability of psychological research data for reanalysis". American Psychologist. 61 (7): 726–728. doi:10.1037/0003-066X.61.7.726. PMID 17032082.
  28. Vanpaemel, W.; Vermorgen, M.; Deriemaecker, L.; Storms, G. (2015). "Are we wasting a good crisis? The availability of psychological research data after the storm" (PDF). Collabra. 1 (1): 1–5. doi:10.1525/collabra.13.
  29. Marwick, Ben; Birch, Suzanne E. Pilaar (5 April 2018). "A Standard for the Scholarly Citation of Archaeological Data as an Incentive to Data Sharing". Advances in Archaeological Practice. 6 (2): 125–143. doi:10.1017/aap.2018.3.
  30. Vogeli C, Yucel R, Bendavid E, et al. (February 2006). "Data withholding and the next generation of scientists: results of a national survey". Acad Med. 81 (2): 128–36. doi:10.1097/00001888-200602000-00007. PMID 16436573.
  31. "NIH Data Sharing Policy and Implementation Guidance". grants.nih.gov. Retrieved 2021-04-09.
  32. Sabogal, D. (2015). Data sharing in community-based forest monitoring: lessons from Guyana. Global Canopy Programme. http://forestcompass.org/how/resources/data-sharing-community-based-forest-monitoring-lessons-guyana

Literature

Committee on Issues in the Transborder Flow of Scientific Data, National Research Council (1997). Bits of Power: Issues in Global Access to Scientific Data. Washington, D.C.: National Academy Press. doi:10.17226/5504. ISBN   978-0-309-05635-9. — discusses the international exchange of data in the natural sciences.