DataNet

DataNet, or Sustainable Digital Data Preservation and Access Network Partners, was a research program of the U.S. National Science Foundation Office of Cyberinfrastructure. The office announced a request for proposals with this title on September 28, 2007. [1] The lead paragraph of its synopsis describes the program as:

Science and engineering research and education are increasingly digital and increasingly data-intensive. Digital data are not only the output of research but provide input to new hypotheses, enabling new scientific insights and driving innovation. Therein lies one of the major challenges of this scientific generation: how to develop the new methods, management structures and technologies to manage the diversity, size, and complexity of current and future data sets and data streams. This solicitation addresses that challenge by creating a set of exemplar national and global data research infrastructure organizations (dubbed DataNet Partners) that provide unique opportunities to communities of researchers to advance science and/or engineering research and learning.

The introduction in the solicitation [2] goes on to say:

Chapter 3 (Data, Data Analysis, and Visualization) of NSF’s Cyberinfrastructure Vision for 21st century Discovery presents a vision in which “science and engineering digital data are routinely deposited in well-documented form, are regularly and easily consulted and analyzed by specialists and non-specialists alike, are openly accessible while suitably protected, and are reliably preserved.” The goal of this solicitation is to catalyze the development of a system of science and engineering data collections that is open, extensible and evolvable.

The initial plan called for a $100 million initiative: five awards of $20 million each over five years with the possibility of continuing funding. Awards were given in two rounds. In the first round, for which full proposals were due on March 21, 2008, two DataNet proposals were awarded. DataONE, [3] led by William Michener at the University of New Mexico, covers ecology, evolutionary biology, and earth science. The Data Conservancy, [4] led by Sayeed Choudhury of Johns Hopkins University, focuses on astronomy, earth science, life sciences, and social science.

For the second round, preliminary proposals were due on October 6, 2008, and full proposals on February 16, 2009. Awards from the second round were greatly delayed, and funding was reduced substantially from $20 million per project to $8 million. [5] Funding for three second-round projects began in fall 2011. SEAD: Sustainable Environment through Actionable Data, [6] led by Margaret Hedstrom of the University of Michigan, seeks to provide data curation software and services for the "long tail" of small- and medium-scale data producers in the domain of sustainability science. The DataNet Federation Consortium, [7] led by Reagan Moore of the University of North Carolina, uses the integrated Rule-Oriented Data System (iRODS) to provide data grid infrastructure for science and engineering. Terra Populus, [8] led by Steven Ruggles of the University of Minnesota, focuses on tools for data integration across the domains of social science and environmental data, allowing interoperability of the three major data formats used in these domains: microdata, areal data, and raster data.
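
The kind of interoperability Terra Populus describes can be illustrated with a short Python sketch: a raster layer is summarized over administrative (areal) units and the result is joined to individual-level microdata keyed by area code. The file names, column names, and tooling (pandas, geopandas, rasterstats), as well as the assumption that all layers share one coordinate reference system, are hypothetical; this shows the general approach, not the Terra Populus software itself.

    # Illustrative sketch only: file names, column names, and the shared
    # coordinate reference system are assumptions, not Terra Populus code.
    import pandas as pd
    import geopandas as gpd
    from rasterstats import zonal_stats

    # Microdata: individual-level records keyed by an area code.
    micro = pd.read_csv("census_microdata.csv")      # person_id, area_code, age, ...

    # Areal data: administrative boundaries carrying the same area code.
    areas = gpd.read_file("admin_boundaries.shp")    # area_code, geometry

    # Raster data: an environmental surface, summarized per administrative area.
    stats = zonal_stats("admin_boundaries.shp", "mean_annual_temp.tif", stats=["mean"])
    areas["mean_temp"] = [s["mean"] for s in stats]

    # Unified view: each microdata record gains the raster-derived attribute
    # of the area in which it is located.
    merged = micro.merge(areas[["area_code", "mean_temp"]], on="area_code")
    print(merged.head())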

Related Research Articles

<span class="mw-page-title-main">National Center for Supercomputing Applications</span> Illinois-based applied supercomputing research organization

The National Center for Supercomputing Applications (NCSA) is a state-federal partnership, based in the United States, to develop and deploy national-scale cyberinfrastructure that advances research, science, and engineering. NCSA operates as a unit of the University of Illinois Urbana-Champaign and provides high-performance computing resources to researchers across the country. Support for NCSA comes from the National Science Foundation, the state of Illinois, the University of Illinois, business and industry partners, and other federal agencies.

<span class="mw-page-title-main">San Diego Supercomputer Center</span> Supercomputer at UC San Diego.

The San Diego Supercomputer Center (SDSC) is an organized research unit of the University of California, San Diego (UCSD). SDSC is located at the east end of Eleanor Roosevelt College on the UCSD campus, immediately north of the Hopkins Parking Structure.

E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology, in 1999 and was used to describe a large funding initiative starting in November 2000. E-science has since been interpreted more broadly as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications." In 2014, the IEEE eScience Conference Series condensed the definition, in one of the working definitions used by the organizers, to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle". E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science...that generate large amounts of E-science data include: computational biology, bioinformatics, genomics" and the human digital footprint for the social sciences.

<span class="mw-page-title-main">Geoinformatics</span> Application of information science methods in geography, , and geosciences

Geoinformatics is a scientific field primarily within the domains of computer science and technical geography. It focuses on the programming of applications, spatial data structures, and the analysis of objects and space-time phenomena related to the surface and subsurface of Earth and other celestial bodies. The field develops software and web services to model and analyse spatial data, serving the needs of geosciences and related scientific and engineering disciplines. The term is often used interchangeably with geomatics, although the two have distinct focuses; geomatics emphasizes acquiring spatial knowledge and leveraging information systems, not their development. At least one publication has claimed the discipline is pure computer science outside the realm of geography.

United States federal research funders use the term cyberinfrastructure to describe research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services distributed over the Internet beyond the scope of a single institution. In scientific usage, cyberinfrastructure is a technological and sociological solution to the problem of efficiently connecting laboratories, data, computers, and people with the goal of enabling derivation of novel scientific theories and knowledge.

<span class="mw-page-title-main">TeraGrid</span>

TeraGrid was an e-Science grid computing infrastructure combining resources at eleven partner sites. The project started in 2001 and operated from 2004 through 2011.

Data integration involves combining data residing in different sources and providing users with a unified view of them. This process is significant in a variety of situations, both commercial and scientific. The need for data integration grows as the volume and complexity of data, and the need to share it, increase. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. Data integration encourages collaboration between internal as well as external users. The data being integrated must be drawn from heterogeneous database systems and transformed into a single coherent data store that provides synchronous data across a network of files for clients. A common use of data integration is in data mining, when analyzing and extracting information from existing databases that can be useful for business information.
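
As a minimal illustration of the "unified view" idea, the following Python sketch merges two heterogeneous sources, a CSV export and a relational database table, into one coherent table. All file, table, and column names are invented for the example.

    # Minimal data-integration sketch: two heterogeneous sources combined
    # into a single unified view. All names here are hypothetical.
    import sqlite3
    import pandas as pd

    # Source 1: a flat-file export from one system.
    customers = pd.read_csv("customers.csv")          # customer_id, name, region

    # Source 2: transactional records held in a relational database.
    with sqlite3.connect("sales.db") as conn:
        orders = pd.read_sql_query("SELECT customer_id, order_total FROM orders", conn)

    # Unified view: one schema combining attributes from both sources,
    # aggregated per customer for downstream analysis or mining.
    unified = (
        customers.merge(orders, on="customer_id", how="left")
        .groupby(["customer_id", "name", "region"], as_index=False)["order_total"]
        .sum()
    )
    print(unified)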

The Texas Advanced Computing Center (TACC) at the University of Texas at Austin, United States, is an advanced computing research center that provides comprehensive advanced computing resources and support services to researchers in Texas and across the U.S. The mission of TACC is to enable discoveries that advance science and society through the application of advanced computing technologies. Specializing in high performance computing, scientific visualization, data analysis & storage systems, software, research & development, and portal interfaces, TACC deploys and operates advanced computational infrastructure to enable the research activities of faculty, staff, and students of UT Austin. TACC also provides consulting, technical documentation, and training to support researchers who use these resources. TACC staff members conduct research and development in applications and algorithms, computing systems design/architecture, and programming tools and environments.

Margaret L. Hedstrom is an American archivist who is the Robert M. Warner Collegiate Professor of Information at the University of Michigan School of Information. She has contributed to the field of digital preservation, archives, and electronic records management and holds a doctorate in history from the University of Wisconsin.

<span class="mw-page-title-main">Renaissance Computing Institute</span>

Renaissance Computing Institute (RENCI) was launched in 2004 as a collaboration involving the State of North Carolina, University of North Carolina at Chapel Hill (UNC-CH), Duke University, and North Carolina State University. RENCI is organizationally structured as a research institute within UNC-CH, and its main campus is located in Chapel Hill, NC, a few miles from the UNC-CH campus. RENCI has engagement centers at UNC-CH, Duke University (Durham), and North Carolina State University (Raleigh).

<span class="mw-page-title-main">DataONE</span> International federation of data repositories

DataONE is a network of interoperable data repositories facilitating data sharing, data discovery, and open science. It was originally supported by $21.2 million in funding from the US National Science Foundation as one of the initial DataNet programs in 2009, and funding was renewed in 2014 through 2020 with an additional $15 million. DataONE supports the preservation, access, use, and reuse of multi-discipline scientific data through the construction of primary cyberinfrastructure and an education and outreach program. DataONE provides scientific data archiving for ecological and environmental data produced by scientists. DataONE's goal is to preserve and provide access to multi-scale, multi-discipline, and multi-national data. Users include scientists, ecosystem managers, policy makers, students, educators, librarians, and the public.
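
Data discovery in DataONE is exposed through a search index on its coordinating nodes. The sketch below queries that index with the standard Python requests library; the endpoint URL and index field names follow DataONE's publicly documented Solr query interface but should be treated as assumptions that may change.

    # Hedged sketch of data discovery against DataONE's coordinating-node
    # search index. Endpoint and field names are assumptions based on the
    # documented Solr query interface.
    import requests

    resp = requests.get(
        "https://cn.dataone.org/cn/v2/query/solr/",
        params={
            "q": 'title:"soil carbon"',            # fielded Solr query
            "fl": "identifier,title,datasource",   # fields to return
            "rows": 5,
            "wt": "json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    for doc in resp.json()["response"]["docs"]:
        print(doc["identifier"], doc.get("title"))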

Integrated computational materials engineering (ICME) involves the integration of experimental results, design models, simulations, and other computational data related to a variety of materials used in multiscale engineering and design. Central to the achievement of ICME goals has been the creation of a cyberinfrastructure: a Web-based, collaborative platform that provides the ability to accumulate, organize, and disseminate knowledge pertaining to materials science and engineering so that this information can be broadly used, enhanced, and expanded.

iPlant Collaborative

The iPlant Collaborative, renamed Cyverse in 2017, is a virtual organization formed under a cooperative agreement funded by the US National Science Foundation (NSF) to create cyberinfrastructure for the plant sciences (botany). The NSF compared cyberinfrastructure to physical infrastructure, "... the distributed computer, information and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor". In September 2013 it was announced that the National Science Foundation had renewed iPlant's funding for a second 5-year term with an expansion of scope to all non-human life science research.

<span class="mw-page-title-main">Francine Berman</span> American computer scientist

Francine Berman is an American computer scientist and a leader in digital data preservation and cyberinfrastructure. In 2009, she was the inaugural recipient of the ACM/IEEE-CS Ken Kennedy Award "for her influential leadership in the design, development and deployment of national-scale cyberinfrastructure, her inspiring work as a teacher and mentor, and her exemplary service to the high performance community". In 2004, Business Week called her the "reigning teraflop queen".

Daniel E. Atkins III is the W. K. Kellogg Professor of Community Informatics at University of Michigan.

Data Infrastructure Building Blocks (DIBBs) is a U.S. National Science Foundation program.

iDigBio, Integrated Digitized Biocollections, is the National Resource funded by the National Science Foundation (NSF) for Advancing Digitization of Biodiversity Collections (ADBC). Through iDigBio, data and images for millions of biological specimens are being curated, connected and made available in electronic format for the biological research community, government agencies, students, educators, and the general public.

Cyber manufacturing is a concept derived from cyber-physical systems (CPS) that refers to a modern manufacturing system offering an information-transparent environment to facilitate asset management, provide reconfigurability, and maintain productivity. Compared with conventional experience-based management systems, cyber manufacturing provides an evidence-based environment to keep equipment users aware of networked asset status and to transform raw data into possible risks and actionable information. Driving technologies include the design of cyber-physical systems, the combination of engineering domain knowledge and computer science, and information technologies. Among them, mobile applications for manufacturing are an area of specific interest to industry and academia.

Science gateways provide access to advanced resources for science and engineering researchers, educators, and students. Through streamlined, online, user-friendly interfaces, gateways combine a variety of cyberinfrastructure (CI) components in support of a community-specific set of tools, applications, and data collections. In general, these specialized, shared resources are integrated as a Web portal, mobile app, or a suite of applications. Through science gateways, broad communities of researchers can access diverse resources, which can save both time and money for themselves and their institutions. Functions and resources offered by science gateways include shared equipment and instruments, computational services, advanced software applications, collaboration capabilities, data repositories, and networks.

The Open Knowledgebase of Interatomic Models (OpenKIM) is a cyberinfrastructure funded by the United States National Science Foundation (NSF) focused on improving the reliability and reproducibility of molecular and multi-scale simulations in computational materials science. It includes a repository of interatomic potentials that are exhaustively tested with user-developed integrity tests, tools to help select among existing potentials and develop new ones, extensive metadata on potentials and their developers, and standard integration methods for using interatomic potentials in major simulation codes. OpenKIM is a member of DataCite and provides unique DOIs (digital object identifiers) for all archived content on the site (fitted models, validation tests, etc.) in order to properly document and provide recognition to content contributors. OpenKIM is also an eXtreme Science and Engineering Discovery Environment (XSEDE) Science Gateway, and all content on openkim.org is available under open source licenses in support of the open science initiative.
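
One of the "standard integration methods" mentioned above is a calculator interface for common simulation environments. The sketch below shows how an OpenKIM model could be driven from the Atomic Simulation Environment (ASE), assuming the KIM API and a model are installed locally; the model identifier is a placeholder to be replaced with a real ID from openkim.org.

    # Hedged sketch: evaluating an OpenKIM interatomic model through ASE's
    # KIM calculator. Requires the KIM API plus an installed model; the
    # model identifier below is a placeholder, not a real OpenKIM ID.
    from ase.build import bulk
    from ase.calculators.kim import KIM

    atoms = bulk("Si", "diamond", a=5.43)       # two-atom diamond-cubic silicon cell
    atoms.calc = KIM("EXAMPLE_KIM_MODEL_ID")    # substitute a model ID from openkim.org
    print("Potential energy (eV):", atoms.get_potential_energy())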

References

  1. "Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Summary". National Science Foundation. September 28, 2007. Retrieved October 3, 2007.
  2. "Sustainable Digital Data Preservation and Access Network Partners Program Announcements & Information". National Science Foundation. September 28, 2007. Retrieved October 3, 2007.
  3. William Michener; et al. "DataONE: Observation Network for Earth". www.dataone.org. Retrieved January 19, 2013.
  4. Sayeed Choudhury; et al. "Data Conservancy". dataconservancy.org. Retrieved January 19, 2013.
  5. National Science Foundation. "NSF DataNet Awards". www.nsf.gov. Retrieved January 19, 2013.
  6. Margaret Hedstrom; et al. "SEAD Sustainable Environment - Actionable Data". sead-data.net. Retrieved January 19, 2013.
  7. Reagan Moore; et al. "DataNet Federation Consortium". datafed.org. Retrieved January 19, 2013.
  8. Steven Ruggles; et al. "Terra Populus: Integrated Data on Population and the Environment". terrapop.org. Retrieved January 19, 2013.