DataONE

Content
Description: DataONE, data on the Earth, Life, and the Environment
Data types captured: Earth science, Ecology, Environmental Science, Social Science
Organisms: all

Access
Data format: Comma-separated values, NetCDF, XML, Satellite imagery
Website: DataONE
Download URL: DataONE Search
Web service URL: API

Tools
Web: DataONE
Standalone: dataone R package

Miscellaneous
License: Various open data licenses
Bookmarkable entities: yes

DataONE [1] is a network of interoperable data repositories facilitating data sharing, data discovery, and open science. [2] It was originally supported by $21.2 million in funding from the US National Science Foundation as one of the initial DataNet programs in 2009; [3] funding was renewed in 2014 through 2020 with an additional $15 million. [4] DataONE supports the preservation, access, use, and reuse of multi-discipline scientific data through the construction of primary cyberinfrastructure and an education and outreach program. It provides archiving for ecological and environmental data produced by scientists, with the goal of preserving and providing access to multi-scale, multi-discipline, and multi-national data. Users include scientists, ecosystem managers, policy makers, students, educators, librarians, and the public.

DataONE links together existing cyberinfrastructure to provide a distributed framework, management, and technologies that enable long-term preservation of multi-scale, multi-discipline, and multi-national observational data. The distributed framework is composed of member nodes and three Coordinating Nodes, located at the Oak Ridge Campus in Tennessee, the University of California, Santa Barbara, and the University of New Mexico. DataONE also provides resources, including tools for accessing and using the framework. [5]

Coordinating nodes

The three coordinating nodes provide network-wide services to member nodes. They are geographically replicated, with mirrored content and full copies of science metadata. The coordinating nodes are the University of New Mexico (UNM), the Oak Ridge Campus (a partnership of Oak Ridge National Laboratory (ORNL) and the University of Tennessee), and the University of California, Santa Barbara. William Michener of UNM directed the project. [6]
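The division of labour between the two kinds of node can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class names, method names, identifiers, and sample records are hypothetical and are not part of any DataONE software; the model simply shows member nodes holding data while every coordinating node keeps a full, mirrored copy of the metadata.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Hypothetical member node: holds data objects and their metadata."""
    name: str
    objects: dict = field(default_factory=dict)   # identifier -> raw data
    metadata: dict = field(default_factory=dict)  # identifier -> description

    def deposit(self, identifier, data, description):
        self.objects[identifier] = data
        self.metadata[identifier] = description


@dataclass
class CoordinatingNode:
    """Hypothetical coordinating node: keeps a copy of all science metadata."""
    name: str
    catalog: dict = field(default_factory=dict)  # identifier -> (member name, description)

    def harvest(self, member):
        # Copy metadata only; the data itself stays on the member node.
        for identifier, description in member.metadata.items():
            self.catalog[identifier] = (member.name, description)

    def search(self, term):
        term = term.lower()
        return sorted(
            identifier
            for identifier, (_, description) in self.catalog.items()
            if term in description.lower()
        )


def synchronize(coordinating_nodes, member_nodes):
    """Mirror every member node's metadata onto every coordinating node."""
    for cn in coordinating_nodes:
        for mn in member_nodes:
            cn.harvest(mn)


# Made-up sample records, for illustration only.
member_a = MemberNode("member-a")
member_a.deposit("example:1", b"site,temp\nA,12.5\n", "Stream temperature observations")
member_b = MemberNode("member-b")
member_b.deposit("example:2", b"plot,biomass\n1,3.2\n", "Forest plot biomass survey")

cns = [CoordinatingNode(name) for name in ("cn-1", "cn-2", "cn-3")]
synchronize(cns, [member_a, member_b])
```

After `synchronize` runs, a call such as `cns[0].search("temperature")` returns the same identifiers on any of the three coordinating nodes, which is the point of mirroring the metadata catalog.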

Member nodes

Member nodes consist of Earth observing institutions, projects, and networks. They provide resources for their own data and replicated data, and focus on serving their specific constituencies. These member nodes are geographically distributed.

Investigator Tool Kit

The Investigator Tool Kit provides tools for researchers to access DataONE. These include both general-purpose and discipline-specific tools, and developers adapt existing tools where possible. The tool kit includes Java and Python libraries, an R programming language plug-in for analysis, extensions for Excel, and support for the VisTrails and Kepler scientific workflow systems.
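As a rough illustration of what programmatic access can look like, the sketch below builds a search URL using only the Python standard library. The endpoint URL, the Solr-style parameter names, and the field names are assumptions that do not come from this article, and the helper `build_search_url` is hypothetical; the DataONE API documentation and the toolkit libraries themselves are the authoritative reference.

```python
from urllib.parse import urlencode

# Assumed query endpoint on a coordinating node; not stated in this article.
ASSUMED_QUERY_ENDPOINT = "https://cn.dataone.org/cn/v2/query/solr/"


def build_search_url(text, fields=("identifier", "title"), rows=10):
    """Return a search URL for the assumed Solr-style query endpoint.

    text   -- query string (assumed Solr syntax, e.g. "title:temperature")
    fields -- metadata fields to return (assumed field names)
    rows   -- maximum number of results to request
    """
    params = {
        "q": text,
        "fl": ",".join(fields),
        "rows": rows,
        "wt": "json",  # ask for a JSON response
    }
    return ASSUMED_QUERY_ENDPOINT + "?" + urlencode(params)


url = build_search_url("title:temperature", rows=5)
print(url)
```

The sketch deliberately stops at building the URL, so it runs without network access; fetching the result (for example with `urllib.request.urlopen`) would be the next step once the endpoint has been confirmed.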

Data management

DataONE provides a place for scientists to store data and the associated metadata. The metadata makes the data searchable and accessible to other scientists.

Additional data management planning resources include a primer on best practices, a database of best practices in data management, educational modules and tutorials, webinars, and an investigator toolkit. [5] These have been used, or adapted for use under a Creative Commons license, by organizations and institutions that seek to educate other communities about data and research management. To understand its different audiences, DataONE developed user personas as models for users such as early-career researchers, science data librarians, citizen scientists, and K-12 educators.

Collaborations

DataONE collaborates with other institutions to bring together tools that help with data management practices. One of those tools, developed in collaboration with other organizations and hosted by the University of California Digital Curation Center, is the DMPTool for data management planning. The DMPTool is used and referenced by many institutions and research data management plans in the US and around the world. Another collaboration in this area is the shared construction of a Data Management Training Clearinghouse for Earth sciences, in partnership with USGS and the Community for Data Integration (CDI). [35]

Community

The DataONE community includes research networks, professional societies, libraries, academic institutions, data centers, data repositories, environmental observatory networks, educators, scientists, policy makers, administrators, citizen scientists, international organizations, NGOs, ecosystem managers, students, private companies and the public. DataONE has a users group that meets yearly to provide feedback. [36]

References

  1. "DataONE". DataONE. Retrieved 2016-04-21.
  2. Michener, William; Vieglais, Dave; Vision, Todd; Kunze, John; Cruse, Patricia; Janée, Greg (January 2011). "DataONE: Data Observation Network for Earth: Preserving Data and Enabling Innovation in the Biological and Environmental Sciences". D-Lib Magazine. 17 (1/2). doi:10.1045/january2011-michener. ISSN 1082-9873. Retrieved 2014-02-12.
  3. "DataNet Full Proposal: DataNetONE (Observation Network for Earth)". Award abstract #0830944. National Science Foundation. August 26, 2014. Retrieved May 7, 2017.
  4. "Award Abstract # 1430508 DataONE (Data Observation Network for Earth)". National Science Foundation. Retrieved 2021-06-24.
  5. "Investigator Toolkit". Web page. DataONE. Retrieved May 7, 2017.
  6. "DataONE (Observation Network for Earth) Project at UNM Receives $20 Million Award". Press release. University of New Mexico. November 18, 2009. Archived from the original on November 27, 2009. Retrieved May 7, 2017.
  7. "Welcome to eBird". eBird.org. Retrieved 2016-04-21.
  8. "Dryad Digital Repository - Dryad". Datadryad.org. Retrieved 2016-04-21.
  9. "Earth Data Analysis Center | Center for Geospatial & Information Technology Services". Edac.unm.edu. Retrieved 2016-04-21.
  10. "Environmental Data for the Oak Ridge Area : Search". Mercury-ops2.ornl.gov. Retrieved 2016-04-21.
  11. "ESA Data Registry". Data.esa.org. Retrieved 2016-04-21.
  12. "Taking Europe's pulse - Research for our continent's future — LTER in Europe". Lter-europe.net. Retrieved 2016-04-21.
  13. "GLEON". GLEON. Retrieved 2016-04-21.
  14. "Gulf of Alaska Data Portal". Portal.aoos.org. Retrieved 2016-04-21.
  15. "The IARC Data Archive at UAF, an AA/EO employer and educational institution". Climate.iarc.uaf.edu. 2007-08-23. Retrieved 2016-04-21.
  16. "Cumulative human impacts data (2008 and 2013) Halpern B, et al. 2015". Knb.ecoinformatics.org. Archived from the original (JSP) on 2013-11-13. Retrieved 2016-04-21.
  17. The Long Term Ecological Research Network. "The Long Term Ecological Research Network | Long-term, broad-scale research to understand our world". Lternet.edu. Retrieved 2016-04-21.
  18. "UC3 Merritt Home". Merritt.cdlib.org. Retrieved 2016-04-21.
  19. "MPC Data Projects". Ipums.org. Retrieved 2016-04-21.
  20. "Current Member Nodes". DataONE. Archived from the original on 2016-04-19. Retrieved 2016-04-21.
  21. "Nevada Research Data Center". Sensor.nevada.edu. Archived from the original on 2016-05-11. Retrieved 2016-04-21.
  22. "Current Member Nodes". DataONE. Archived from the original on 2016-04-19. Retrieved 2016-04-21.
  23. "Dash". Oneshare.cdlib.org. Archived from the original on 2016-03-25. Retrieved 2016-04-21.
  24. "ORNL DAAC for Biogeochemical Dynamics". Daac.ornl.gov. doi:10.1016/j.foreco.2008.11.016. Retrieved 2016-04-21.
  25. "Pisco | Pisco". Data.piscoweb.org. Archived from the original on 2016-05-02. Retrieved 2016-04-21.
  26. "Welcome to the CENBAM Portal and PPBio Western Amazonia! | ppbio.inpa.gov.br/inicio". Ppbio.inpa.gov.br. Retrieved 2022-08-26.
  27. "Regional and Global Data Available Through Mercury". Daac.ornl.gov. 2010-03-18. doi:10.1016/j.foreco.2008.11.016. Archived from the original on 2007-12-10. Retrieved 2016-04-21.
  28. "South African National Parks - SANParks - Official Website - Accommodation, Activities, Prices, Reservations". SANParks.org.za. Archived from the original on 2016-01-11. Retrieved 2016-04-21.
  29. "SEAD | A Knowledge Network for Collaboration, Data Curation, and Discovery". Sead-data.net. Retrieved 2016-04-21.
  30. "TFRI Metacat Data Catalog". Metacat.tfri.gov.tw. Retrieved 2016-04-21.
  31. "Terrestrial Ecosystem Research Network: Home". TERN. Retrieved 2016-04-21.
  32. "KU Biodiversity Institute & Natural History Museum". Biodiversity.ku.edu. Retrieved 2016-04-21.
  33. "USA National Phenology Network | USA National Phenology Network". Usanpn.org. 2016-04-15. Retrieved 2016-04-21.
  34. "U.S. Geological Survey Science Data Catalog". Data.usgs.gov. Retrieved 2016-04-21.
  35. "Data Management Training Clearinghouse - ScienceBase-Catalog". www.sciencebase.gov. Retrieved 3 April 2022.
  36. "Users Group". DataONE. Retrieved 2016-04-21.