Data infrastructure

A data infrastructure is a digital infrastructure promoting data sharing and consumption.

Like other infrastructures, a data infrastructure is a structure needed for the operation of a society, and it provides the services and facilities necessary for an economy (in this case, the data economy) to function.

Background

There is intense discussion at the international level on e-infrastructures and data infrastructures serving scientific work. The European Strategy Forum on Research Infrastructures (ESFRI) presented the first European roadmap for large-scale research infrastructures. [1] These are modeled as layered hardware and software systems that support the sharing of a wide spectrum of resources, ranging from networks, storage, computing resources, and system-level middleware software to structured information within collections, archives, and databases. The e-Infrastructure Reflection Group (e-IRG) has proposed a similar vision. In particular, it envisions e-Infrastructures in which the principles of global collaboration and shared resources encompass the sharing needs of all research activities. [2]

In the framework of the Joint Information Systems Committee (JISC) e-infrastructure programme, e-Infrastructures are defined in terms of integration of networks, grids, data centers and collaborative environments, and are intended to include supporting operation centers, service registries, credential delegation services, certificate authorities, training and help desk services. [3] The Cyberinfrastructure programme launched by the US National Science Foundation (NSF) plans to develop new research environments in which advanced computational, collaborative, data acquisition and management services are made available to researchers connected through high-performance networks. [4]

More recently, a vision for “global research data infrastructures” has been set out in a number of recommendations for developers of future research infrastructures. [5] This vision document highlighted the open issues, both technical and organizational, affecting data infrastructure development and identified future research directions. Besides these initiatives targeting “generic” infrastructures, there are others oriented to specific domains. For example, the European Commission promotes the INSPIRE initiative for an e-Infrastructure oriented to the sharing of content and service resources of European countries in the domain of geospatial datasets. [6]

Related Research Articles

E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology in 1999 and was used to describe a large funding initiative starting in November 2000. E-science has been more broadly interpreted since then, as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications." In 2014, IEEE eScience Conference Series condensed the definition to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle" in one of the working definitions used by the organizers. E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science...that generate large amounts of E-science data include: computational biology, bioinformatics, genomics" and the human digital footprint for the social sciences.

United States federal research funders use the term cyberinfrastructure to describe research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services distributed over the Internet beyond the scope of a single institution. In scientific usage, cyberinfrastructure is a technological and sociological solution to the problem of efficiently connecting laboratories, data, computers, and people with the goal of enabling derivation of novel scientific theories and knowledge.

Jisc is a United Kingdom not-for-profit company that provides network and IT services and digital resources in support of higher education institutions and research.

Digital Earth is the name given to a concept by former US vice president Al Gore in 1998, describing a virtual representation of the Earth that is georeferenced and connected to the world's digital knowledge archives.

Geospatial metadata is a type of metadata applicable to geographic data and information. Such objects may be stored in a geographic information system (GIS) or may simply be documents, data-sets, images or other objects, services, or related items that exist in some other native environment but whose features may be appropriate to describe in a (geographic) metadata catalog.

A spatial data infrastructure (SDI) is a data infrastructure implementing a framework of geographic data, metadata, users and tools that are interactively connected in order to use spatial data in an efficient and flexible way. Another definition is "the technology, policies, standards, human resources, and related activities necessary to acquire, process, distribute, use, maintain, and preserve spatial data".

The George E. Brown, Jr. Network for Earthquake Engineering Simulation (NEES) was created by the National Science Foundation (NSF) to improve infrastructure design and construction practices to prevent or minimize damage during an earthquake or tsunami. Its headquarters were at Purdue University in West Lafayette, Indiana as part of cooperative agreement #CMMI-0927178, and it ran from 2009 till 2014. The mission of NEES is to accelerate improvements in seismic design and performance by serving as a collaboratory for discovery and innovation.

The Group on Earth Observations (GEO) coordinates international efforts to build a Global Earth Observation System of Systems (GEOSS). It links existing and planned Earth observation systems and supports the development of new ones where gaps are perceived in the supply of environment-related information. It aims to construct a global public infrastructure for Earth observations consisting of a flexible and distributed network of systems and content providers.

The German National Library of Economics (ZBW) is the world's largest research infrastructure for economic literature, online as well as offline. The ZBW is a member of the Leibniz Association and has been a foundation under public law since 2007. The ZBW has received the international LIBER Award several times for its innovative work in librarianship. It provides access to millions of documents and research on economics, partnering with over 40 research institutions to create a connective Open Access portal and social web of research. Through its EconStor and EconBiz services, researchers and students have accessed millions of datasets and thousands of articles. The ZBW also edits two journals: Wirtschaftsdienst and Intereconomics.

E-Science librarianship refers to a role for librarians in e-Science.

Integrated computational materials engineering (ICME) involves the integration of experimental results, design models, simulations, and other computational data related to a variety of materials used in multiscale engineering and design. Central to the achievement of ICME goals has been the creation of a cyberinfrastructure, a Web-based, collaborative platform which provides the ability to accumulate, organize and disseminate knowledge pertaining to materials science and engineering to facilitate this information being broadly utilized, enhanced, and expanded.

A virtual research environment (VRE) or virtual laboratory is an online system helping researchers collaborate. Features usually include collaboration support, document hosting, and some discipline-specific tools, such as data analysis, visualisation, or simulation management. In some instances, publication management, and teaching tools such as presentations and slides may be included. VREs have become important in fields where research is primarily carried out in teams which span institutions and even countries: the ability to easily share information and research results is valuable.

The iPlant Collaborative, renamed Cyverse in 2017, is a virtual organization created by a cooperative agreement funded by the US National Science Foundation (NSF) to create cyberinfrastructure for the plant sciences (botany). The NSF compared cyberinfrastructure to physical infrastructure, "... the distributed computer, information and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor". In September 2013 it was announced that the National Science Foundation had renewed iPlant's funding for a second 5-year term with an expansion of scope to all non-human life science research.

Systems geology emphasizes the nature of geology as a system – that is, as a set of interacting parts that function as a whole. The systems approach involves study of the linkages or interfaces between the component objects and processes at all levels of detail in order to gain a more comprehensive understanding of the solid Earth. A long-term objective is to provide computational support throughout the cycles of investigation, integrating observation and experiment with modeling and theory, each reinforcing the other. The overall complexity suggests that systems geology must be based on the wider emerging cyberinfrastructure, and should aim to harmonize geological information with Earth system science within the context of the e-science vision of a comprehensive global knowledge system.

The UK Data Service is the largest digital repository for quantitative and qualitative social science and humanities research data in the UK. This national data service integrates and builds on investments the Economic and Social Research Council (ESRC) has made in UK research infrastructure for over 50 years, including the UK Data Archive, Economic and Social Data Service, the Secure Data Service, Census Programme and Survey Question Bank.

ELIXIR is an initiative that will allow life science laboratories across Europe to share and store their research data as part of an organised network. Its goal is to bring together Europe’s research organisations and data centres to help coordinate the collection, quality control and storage of large amounts of biological data produced by life science experiments. ELIXIR aims to ensure that biological data is integrated into a federated system easily accessible by the scientific community.

Common Language Resources and Technology Infrastructure (CLARIN) is a distributed digital infrastructure, with participating institutes all over Europe, such as universities, research centres, libraries and public archives. The participating organizations have in common that they provide access to digital language data collections, to digital tools, and training material for researchers to work with the resources.

CyberGIS, or cyber geographic information science and systems, is an interdisciplinary field combining cyberinfrastructure, e-science, and geographic information science and systems (GIS). CyberGIS has a particular focus on computational and data-intensive geospatial problem-solving within various research and education domains. The need for GIS has extended beyond traditional forms of geographic analysis and study, and now includes adapting to new sources and kinds of data, high-performance computing resources, and online platforms based on existing and emerging information networks. The name cyberGIS first appeared in Geographic Information Science literature in 2010. CyberGIS is characterized as digital geospatial ecosystems. These systems are developed and have evolved through heterogeneous computing environments, as well as human communication and information environments. CyberGIS can be considered a new generation of geographic information systems (GIS). These systems are based on advanced computing and information infrastructure, which analyze and model geospatial data, providing computationally intensive spatial analysis, modeling, and collaborative geospatial problem-solving at unprecedented scales.

Data preservation is the act of conserving and maintaining both the safety and integrity of data. Preservation is done through formal activities that are governed by policies, regulations and strategies directed towards protecting and prolonging the existence and authenticity of data and its metadata. Data can be described as the elements or units in which knowledge and information are created, and metadata are the summarizing subsets of the elements of data, or the data about the data. The main goal of data preservation is to protect data from being lost or destroyed and to contribute to the reuse and progression of the data.

References

  1. European Strategy Forum on Research Infrastructures. (2010). Strategy Report on Research Infrastructures. Publications Office of the European Union. http://ec.europa.eu/research/infrastructures/index_en.cfm?pg=esfri
  2. e-Infrastructure Reflection Group. (2010). Blue Paper. e-IRG.
  3. Joint Information Systems Committee. (2006). e-Infrastructure Briefing Paper. JISC.
  4. National Science Foundation, Cyberinfrastructure Council. (2007). "Cyberinfrastructure Vision for the 21st Century Discovery" (PDF).
  5. Thanos, C. (2011). "Global Research Data Infrastructures: The GRDI2020 Vision" (PDF).
  6. European Parliament, Council. (2007, March 14). Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE).
  7. D4Science website. http://www.d4science.org/
  8. OpenAIRE website. http://www.openaire.eu/