A data infrastructure is a digital infrastructure promoting data sharing and consumption.
Like other infrastructures, it is a structure needed for the operation of a society, together with the services and facilities necessary for an economy to function, in this case the data economy.
There is intense discussion at the international level on e-infrastructures and data infrastructures serving scientific work. The European Strategy Forum on Research Infrastructures (ESFRI) presented the first European roadmap for large-scale Research Infrastructures. [1] These are modeled as layered hardware and software systems that support the sharing of a wide spectrum of resources, ranging from networks, storage, computing resources, and system-level middleware software to structured information within collections, archives, and databases. The e-Infrastructure Reflection Group (e-IRG) has proposed a similar vision. In particular, it envisions e-Infrastructures in which the principles of global collaboration and shared resources encompass the sharing needs of all research activities. [2]
In the framework of the Joint Information Systems Committee (JISC) e-infrastructure programme, e-Infrastructures are defined in terms of integration of networks, grids, data centers and collaborative environments, and are intended to include supporting operation centers, service registries, credential delegation services, certificate authorities, training and help desk services. [3] The Cyberinfrastructure programme launched by the US National Science Foundation (NSF) plans to develop new research environments in which advanced computational, collaborative, data acquisition and management services are made available to researchers connected through high-performance networks. [4]
More recently, a vision for “global research data infrastructures” has been set out as a number of recommendations for developers of future research infrastructures. [5] This vision document highlighted the open issues, both technical and organizational, affecting the development of data infrastructures and identified future research directions. Besides these initiatives targeting “generic” infrastructures, there are others oriented to specific domains. For example, the European Commission promotes the INSPIRE initiative for an e-Infrastructure for sharing the content and service resources of European countries in the domain of geospatial datasets. [6]
E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was coined in 1999 by John Taylor, the Director General of the United Kingdom's Office of Science and Technology, and was used to describe a large funding initiative starting in November 2000. E-science has since been interpreted more broadly as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications." In 2014, the IEEE eScience Conference Series condensed the definition to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle" in one of the working definitions used by the organizers. E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science...that generate large amounts of E-science data include: computational biology, bioinformatics, genomics" and the human digital footprint for the social sciences.
Geoinformatics is a scientific field primarily within the domains of computer science and technical geography. It focuses on the programming of applications, spatial data structures, and the analysis of objects and space-time phenomena related to the surface and subsurface of Earth and other celestial bodies. The field develops software and web services to model and analyse spatial data, serving the needs of geosciences and related scientific and engineering disciplines. The term is often used interchangeably with Geomatics, although the two have distinct focuses: Geomatics emphasizes acquiring spatial knowledge and leveraging information systems, not their development. At least one publication has claimed the discipline is pure computer science outside the realm of geography.
The Global Earth Observation System of Systems (GEOSS) was built by the Group on Earth Observations (GEO) on the basis of a 10-Year Implementation Plan running from 2005 to 2015. GEOSS seeks to connect the producers of environmental data and decision-support tools with the end users of these products, with the aim of enhancing the relevance of Earth observations to global issues. GEOSS aims to produce a global public infrastructure that generates comprehensive, near-real-time environmental data, information and analyses for a wide range of users. The Secretariat Director of GEOSS is Barbara Ryan.
United States federal research funders use the term cyberinfrastructure to describe research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services distributed over the Internet beyond the scope of a single institution. In scientific usage, cyberinfrastructure is a technological and sociological solution to the problem of efficiently connecting laboratories, data, computers, and people with the goal of enabling derivation of novel scientific theories and knowledge.
GeaBios is a free (non-profit) "Slovene Citizen Oriented Information Service", and the name stands for Geo Enabled And Better Internet Oriented Services.
Jisc is a United Kingdom not-for-profit organisation that provides network and IT services and digital resources in support of further and higher education and research, as well as the public sector. Its head office is based in Bristol with offices in London, Manchester, and Oxford. Its current CEO is Heidi Fraser-Krauss, who joined in September 2021 from the University of Sheffield.
Chris Cobb is a British computer scientist and pro vice-chancellor and chief operating officer at the University of London. He has been pro vice-chancellor at the University of Roehampton, London, England, and prior to that was at the London School of Economics. In 2020, he was appointed as chief executive of the Associated Board of the Royal Schools of Music, despite not having any professional background in music.
Digital Earth is the name given to a concept by former US vice president Al Gore in 1998, describing a virtual representation of the Earth that is georeferenced and connected to the world's digital knowledge archives.
Interreg is a series of programmes to stimulate cooperation between regions inside and outside the European Union (EU), funded by the European Regional Development Fund (ERDF). The first Interreg started in 1989. Interreg IV covered the period 2007–2013. Interreg V (2014–2020) covers all 27 EU member states, the EFTA countries, six accession countries and 18 neighbouring countries. It has a budget of EUR 10.1 billion, which represents 2.8% of the total European Cohesion Policy budget. Because non-EU countries do not pay EU membership fees, they contribute directly to Interreg rather than through the ERDF.
A spatial data infrastructure (SDI), also called geospatial data infrastructure, is a data infrastructure implementing a framework of geographic data, metadata, users and tools that are interactively connected in order to use spatial data in an efficient and flexible way. Another definition is "the technology, policies, standards, human resources, and related activities necessary to acquire, process, distribute, use, maintain, and preserve spatial data". Most commonly, institutions with large repositories of geographic data create SDIs to facilitate the sharing of their data with a broader audience.
A geoportal is a type of web portal used to find and access geographic information and associated geographic services via the Internet. Geoportals are important for effective use of geographic information systems (GIS) and a key element of a spatial data infrastructure (SDI).
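The discovery role described above can be illustrated with a minimal sketch of a metadata catalogue searched by area and keyword. Everything in it is an assumption made for illustration: the `BBox` and `Record` types, the `search` function, the dataset titles and the example.org addresses are hypothetical and do not correspond to any particular geoportal or SDI standard. Real geoportals typically expose such searches through standardised catalogue services rather than in-memory lists, and the simple overlap test below ignores extents that cross the antimeridian.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BBox:
    """Bounding box in decimal degrees (hypothetical helper type)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (
            self.east < other.west
            or other.east < self.west
            or self.north < other.south
            or other.north < self.south
        )


@dataclass(frozen=True)
class Record:
    """A metadata record describing one dataset (hypothetical schema)."""
    title: str
    keywords: Tuple[str, ...]
    extent: BBox
    access_url: str


def search(catalog: List[Record], area: BBox,
           keyword: Optional[str] = None) -> List[Record]:
    """Return records whose extent overlaps `area` and, if a keyword is
    given, whose keyword list contains it."""
    return [
        rec for rec in catalog
        if rec.extent.intersects(area)
        and (keyword is None or keyword in rec.keywords)
    ]


# Illustrative catalogue; titles and URLs are made up.
CATALOG = [
    Record("Hypothetical land cover", ("land cover",),
           BBox(10.0, 40.0, 20.0, 50.0), "https://example.org/landcover"),
    Record("Hypothetical river network", ("hydrography",),
           BBox(-10.0, 50.0, 0.0, 60.0), "https://example.org/rivers"),
]

if __name__ == "__main__":
    for rec in search(CATALOG, BBox(15.0, 45.0, 16.0, 46.0)):
        print(rec.title, rec.access_url)
```

The point of the sketch is that users query metadata (extent, keywords) rather than the data itself, and the record then points to where the data or service can be accessed.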
Renaissance Computing Institute (RENCI) was launched in 2004 as a collaboration involving the State of North Carolina, University of North Carolina at Chapel Hill (UNC-CH), Duke University, and North Carolina State University. RENCI is organizationally structured as a research institute within UNC-CH, and its main campus is located in Chapel Hill, NC, a few miles from the UNC-CH campus. RENCI has engagement centers at UNC-CH, Duke University (Durham), and North Carolina State University (Raleigh).
The Archaeology Data Service (ADS) is an open access digital archive for archaeological research outputs. It is located in The King's Manor, at the University of York. Originally intended to curate digital outputs from archaeological researchers based in the UK's Higher Education sector, the ADS also holds archive material created under the auspices of national and local government as well as in the commercial archaeology sector. The ADS carries out research, most of which focuses on resource discovery, cross-searching and interoperability with other relevant archives in the UK, Europe and the United States of America.
E-Science librarianship refers to a role for librarians in e-Science.
Integrated computational materials engineering (ICME) involves the integration of experimental results, design models, simulations, and other computational data related to a variety of materials used in multiscale engineering and design. Central to the achievement of ICME goals has been the creation of a cyberinfrastructure, a Web-based, collaborative platform that provides the ability to accumulate, organize and disseminate knowledge pertaining to materials science and engineering so that this information can be broadly utilized, enhanced, and expanded.
A virtual research environment (VRE) or virtual laboratory is an online system helping researchers collaborate. Features usually include collaboration support, document hosting, and some discipline-specific tools, such as data analysis, visualisation, or simulation management. In some instances, publication management and teaching tools such as presentations and slides may be included. VREs have become important in fields where research is primarily carried out in teams that span institutions and even countries: the ability to easily share information and research results is valuable.
ELIXIR is an initiative that allows life science laboratories across Europe to share and store their research data as part of an organised network. Its goal is to bring together Europe's research organisations and data centres to help coordinate the collection, quality control and storage of large amounts of biological data produced by life science experiments. ELIXIR aims to ensure that biological data is integrated into a federated system easily accessible by the scientific community.
Data preservation is the act of conserving and maintaining both the safety and integrity of data. Preservation is done through formal activities that are governed by policies, regulations and strategies directed towards protecting and prolonging the existence and authenticity of data and its metadata. Data can be described as the elements or units in which knowledge and information are created, and metadata are the summarizing subsets of the elements of data, or the data about the data. The main goal of data preservation is to protect data from being lost or destroyed and to contribute to the reuse and progression of the data.
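One common way to check the integrity aspect of preservation is a fixity check: a checksum is recorded for each stored object and recomputed later to detect loss or alteration. The following is a minimal sketch under stated assumptions, not a description of any particular archive's procedure. The function names and the in-memory `objects` mapping are hypothetical; only `hashlib.sha256` from the Python standard library is a real API.

```python
from __future__ import annotations

import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(objects: dict[str, bytes]) -> dict[str, str]:
    """Record a digest for every stored object (name -> digest)."""
    return {name: sha256_digest(content) for name, content in objects.items()}


def verify(objects: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the sorted names of objects that are missing or whose
    current digest no longer matches the one in the manifest."""
    damaged = []
    for name, expected in manifest.items():
        content = objects.get(name)
        if content is None or sha256_digest(content) != expected:
            damaged.append(name)
    return sorted(damaged)


if __name__ == "__main__":
    # Hypothetical holdings, kept in memory for illustration only.
    store = {"survey.csv": b"id,value\n1,42\n", "readme.txt": b"field notes"}
    manifest = build_manifest(store)

    store["survey.csv"] = b"id,value\n1,43\n"  # simulate silent corruption
    print(verify(store, manifest))
```

A checksum only detects damage; recovering from it still depends on the policies mentioned above, such as keeping independent copies from which a damaged object can be restored.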
Open Science Infrastructure is an information infrastructure that supports the open sharing of scientific outputs such as publications, datasets, metadata or code. The UNESCO Recommendation on Open Science of November 2021 describes it as "shared research infrastructures that are needed to support open science and serve the needs of different communities".