Dryad (repository)

Dryad
Developer(s) National Evolutionary Synthesis Center, UNC-CH Metadata Research Center, Oxford University, The British Library, California Digital Library
Initial release January 2008
Repository
Written in Ruby
License New BSD license
Website datadryad.org

Dryad is an international open-access repository of research data, especially data underlying scientific and medical publications (mainly in evolutionary biology, genetics, and ecology). Dryad is a curated general-purpose repository that makes data discoverable, freely reusable, and citable. The scientific, educational, and charitable mission of Dryad is to provide the infrastructure for, and promote the re-use of, scholarly research data.


The vision of Dryad is a scholarly communication system in which learned societies, publishers, institutions of research and education, funding bodies and other stakeholders collaboratively sustain and promote the preservation and reuse of research data.

Dryad aims to allow researchers to validate published findings, explore new analysis methodologies, re-purpose data for research questions unanticipated by the original authors, and perform synthetic studies such as formal meta-analyses. For many publications, existing data repositories do not capture the whole data package. As a result, many important datasets are not being preserved and are no longer available, or usable, at the time that they are sought by later investigators. [1]

Dryad serves as a repository for tables, spreadsheets, flat files, and all other kinds of published data for which specialized repositories do not already exist. Optimally, authors submit data to Dryad in conjunction with article publication, so that links to the data can be included in the published article. All data files in Dryad are associated with a published article, and are made available for reuse under the terms of a Creative Commons Zero waiver.

Dryad is also a non-profit membership organization registered in the US, providing a forum for all stakeholders to set priorities for the repository, participate in planning, and share knowledge and coordinate action around data policies.

Dryad is listed in the Registry of Research Data Repositories (re3data.org). [2]

Features

A citation to a Dryad data package on Wikipedia
Embargoes chosen by Dryad data authors

Dryad enables authors, journals, societies and publishers to facilitate data archiving at the time of publication, when the data are readily available. Data in Dryad receive a permanent, unique digital object identifier (DOI), which can be included in the published article so that readers are able to access the data. Authors can archive data in Dryad and be assured of its preservation, while satisfying the mandates of journals and research funding agencies to disseminate their research outputs. [3]
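As a rough illustration of how a DOI lets readers reach the data, the following Python sketch turns a DOI written in a common citation form into a link through the public doi.org resolver. The function name `doi_to_url` and the identifier `10.5061/dryad.example` are made-up placeholders for illustration, not a real Dryad record or part of any Dryad software.

```python
DOI_RESOLVER = "https://doi.org/"

# Prefixes that commonly appear in front of a DOI in citations.
KNOWN_PREFIXES = ("https://doi.org/", "http://dx.doi.org/", "doi:")


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI given in a common citation form."""
    cleaned = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # Every DOI starts with "10." and has a slash between prefix and suffix.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    return DOI_RESOLVER + cleaned


# Placeholder identifier, used only to show the shape of the result.
print(doi_to_url("doi:10.5061/dryad.example"))
```

A link built this way stays valid even if the repository later changes its own web addresses, which is the point of citing the DOI rather than a page URL.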

Authors submit data to Dryad either while the associated article is under review or after it has been accepted for publication. The choice depends on whether the journal includes data within the scope of peer review. Authors may also submit data after an article has been published.

Data submission is facilitated by journals sending notices of new manuscripts to Dryad. This saves authors from having to re-enter the bibliographic details when they upload their data files.

Dryad curators review submitted data files and perform quality control on metadata descriptions before new content is included in the repository. Dryad’s metadata approach emphasizes simplicity and interoperability, supported by a Dublin Core metadata application profile. The Metadata Research Center, now at the College of Computing and Informatics, Drexel University, formerly served as Dryad’s central curation hub; primary curation activity now takes place via the non-profit in North Carolina. [4]
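To give a sense of what a simple Dublin Core description can look like, here is a minimal Python sketch that checks a record for a few fields and serializes it with the standard Dublin Core element namespace. It does not reproduce Dryad's actual application profile: the `REQUIRED` field list, the `metadata` wrapper element, and the sample record are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 core Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical minimum set of fields a curator might check for.
REQUIRED = ("title", "creator", "identifier", "rights")


def missing_fields(record: dict) -> list:
    """List the required Dublin Core elements that are absent or empty."""
    return [name for name in REQUIRED if not record.get(name)]


def to_dublin_core_xml(record: dict) -> str:
    """Serialize a {element: [values]} record as simple Dublin Core XML."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, values in record.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample record with placeholder values, not a real data package.
record = {
    "title": ["Data from: An example study"],
    "creator": ["Doe, Jane"],
    "identifier": ["doi:10.5061/dryad.example"],
    "rights": ["CC0 1.0"],
}

print(missing_fields(record))
print(to_dublin_core_xml(record))
```

Keeping each record to a flat set of repeatable, widely understood elements is what makes this kind of description easy to exchange between repositories.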

Dryad coordinates data submission with specialized repositories in order to (a) lower user burden by streamlining the submission workflow and (b) allow Dryad and the specialized repositories to exchange identifiers and other metadata, enabling cross-referencing of the different data products associated with a given publication. The first two handshaking partners are TreeBASE and GenBank, which Dryad's partner journals have previously identified as required points of deposition for phylogenetic tree data and DNA sequences, respectively.

Governance, history and funding

Dryad is governed by a twelve-member Board of Directors, elected by its Members. Members may be independent journals, societies, publishers, research and educational institutions, libraries, funders, or other organizations that support Dryad's mission. The organization coordinates data sharing policies and promotes the long-term sustainability of the repository.

Dryad began charging submission fees in September 2013. Dryad is a nonprofit organization that provides long-term access to its contents at no cost to researchers, educators or students, irrespective of nationality or institutional affiliation. Dryad is able to provide free access to data due to financial support from members and data submitters. Dryad’s submission fees are designed to sustain its core functions by recovering the basic costs of curating and preserving data.

Dryad emerged from a National Evolutionary Synthesis Center (NESCent) workshop entitled "Digital data preservation, sharing, and discovery: Challenges for Small Science Communities in the Digital Era" in May 2007. Initial funding for Dryad was provided by the National Science Foundation to the National Evolutionary Synthesis Center and other partners in the US.

DryadUK was a Jisc-funded project run from the British Library and the University of Oxford, in partnership with NESCent, the Digital Curation Centre, and Charles Beagrie Ltd. The project led to a UK mirror of the Dryad repository based at the British Library. The project also improved the tools available for the publication and citation of data, expanded the disciplinary range of participating journals, and further developed the business framework for an international organization dedicated to long-term data preservation.

In 2019, Dryad announced a partnership with fellow data repository Zenodo to co-develop new solutions focused on supporting researcher and publisher workflows as well as best practices in software and data curation. [5]

Dryad is a member of the Data Observation Network for Earth (DataONE).

Software

Dryad was originally built upon the open-source DSpace repository software, developed by the Massachusetts Institute of Technology and Hewlett-Packard. In 2019, Dryad migrated to an open-source Ruby on Rails data publication platform called Stash.


References

  1. Vision, T.J. (2010). Open Data and the Social Contract of Scientific Publishing. BioScience, 60(5), 330-330. doi:10.1525/bio.2010.60.5.2
  2. "Dryad Entry in re3data.org". www.re3data.org. Retrieved 21 July 2014.
  3. Whitlock, M.C., McPeek, M.A., Rausher, M.D., Rieseberg, L. and Moore, A.J. (2010). Data Archiving. The American Naturalist, 175(2), 145-146. doi:10.1086/650340
  4. Greenberg, J., White, H.C., Carrier, C. and Scherle, R. (2009). A Metadata Best Practice for a Scientific Data Repository. Journal of Library Metadata, 9(3), 194-212. doi:10.1080/19386380903405090
  5. Scheld, Melissanne (2019-07-17). "Funded Partnership Brings Dryad and Zenodo Closer". Dryad news and views. Retrieved 2019-11-08.